Join not in Enumerable

Just a few minutes ago I was playing with irb as I am wont to do, and
typed this:

('a'..'z').join(' ')

Lo and behold, it protested at me with a NoMethodError. I said to
myself, "Self, there is no reason that has to be Array-only functionality.
Why isn't it in Enumerable?" So I said:

module Enumerable
  def join(sep = '')
    inject do |a, b|
      "#{a}#{sep}#{b}"
    end
  end
end

And then I said ('a'..'z').join(' ') and got:
=> "a b c d e f g h i j k l m n o p q r s t u v w x y z"
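The seedless inject behaves a little differently from Array#join at the edges. A minimal sketch, assuming a standalone helper so that nothing gets monkey-patched (the name join_seedless is made up for illustration and is not part of Ruby):

```ruby
# Hypothetical helper that mirrors the seedless-inject join above.
def join_seedless(enum, sep = '')
  enum.inject { |a, b| "#{a}#{sep}#{b}" }
end

p join_seedless('a'..'e', ' ')  # "a b c d e"
p join_seedless([], ',')        # nil, where Array#join would give ""
p join_seedless([42], ',')      # 42, an Integer: the block never runs
```

With no seed, inject returns nil for an empty collection and hands back a lone element untouched, so the result is only guaranteed to be a String when there are two or more elements.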

#inject has to be the most dangerously effective method ever. But I digress:

Why are join, and perhaps even pack, in Array and not in Enumerable?

the only reason i can think of is that just because something is countable
(Enumerable) doesn't mean each sub-thing is singular. take a hash for
example. this is no stumbling block (pun intended) for ruby however:

   harp:~ > cat a.rb
   module Enumerable
     def join(sep = '', &b)
       inject(nil){|s,x| "#{ s }#{ s && sep }#{ b ? b[ x ] : x }"}
     end
   end
   class Array; def join(*a, &b); super; end; end

   r = 'a' .. 'z'
   p(r.join(' '))

   h = {:k => :v, :K => :V}
   p(h.join(';'){|kv| kv.join '=>'})

   a = [ [0, 1], [2, 3] ]
   p(a.join(','){|kv| kv.join ':'})

   harp:~ > ruby a.rb
   "a b c d e f g h i j k l m n o p q r s t u v w x y z"
   "k=>v;K=>V"
   "0:1,2:3"

this allows 'nesting' of join calls for arbitrarily deep enumerable structures.

   a3 = [
     [ [:a, :b], [:c, :d] ],
     [ [:e, :f], [:g, :h] ],
   ]

   p( a3.join('___'){|a2| a2.join('__'){|a1| a1.join '_'}} )

   #=> "a_b__c_d___e_f__g_h"
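For comparison, the same strings can be produced in stock Ruby by pairing map with Array#join, without adding anything to Enumerable. This is only a sketch of an equivalent spelling, not a claim about which form is preferable:

```ruby
h  = { :k => :v, :K => :V }
a3 = [
  [[:a, :b], [:c, :d]],
  [[:e, :f], [:g, :h]],
]

# map does the per-element formatting that the block did above.
flat   = h.map { |kv| kv.join('=>') }.join(';')
nested = a3.map { |a2| a2.map { |a1| a1.join('_') }.join('__') }.join('___')

p flat    # "k=>v;K=>V" on Ruby 1.9+, where Hash keeps insertion order
p nested  # "a_b__c_d___e_f__g_h"
```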

it's a nice idea you have there!

cheers.

-a

···

On Sun, 22 May 2005, Logan Capaldo wrote:

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

Hi --

Just a few minutes ago I was playing with irb as I am wont to do, and
typed this:

('a'..'z').join(' ')

Lo and behold it protested at me with a NoMethodError. I said to my
self, self there is no reason that has to be Array only functionality.
Why isn't it in Enumerable? So I said:

module Enumerable
  def join(sep = '')
    inject do |a, b|
      "#{a}#{sep}#{b}"
    end
  end
end

And then I said ('a'..'z').join(' ') and got:
=> "a b c d e f g h i j k l m n o p q r s t u v w x y z"

#inject has to be the most dangerously effective method ever. But I digress:

You can speed it up a lot if you do this:

   module Enumerable
     def join(sep = '')
       to_a.join(sep)
     end
   end

Benchmarking 10 calls to each version, for a dummy class where each
just iterates from 1 to 1000:

            user     system      total        real
inject  2.720000   0.030000   2.750000 (  2.759071)
to_a    0.300000   0.000000   0.300000 (  0.298650)
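A comparison along these lines can be re-run with a small harness. The sketch below is an assumption about the setup, not the script that produced the numbers above: the Dummy class, the helper names and the comma separator are all made up, and timings will differ by machine and Ruby version.

```ruby
# Hypothetical dummy collection: each just iterates from 1 to 1000.
class Dummy
  include Enumerable

  def each(&block)
    (1..1000).each(&block)
  end
end

# Builds a fresh string on every step.
def join_inject(enum, sep = '')
  enum.inject { |a, b| "#{a}#{sep}#{b}" }
end

# Converts once, then lets Array#join do the work.
def join_to_a(enum, sep = '')
  enum.to_a.join(sep)
end

# Wall-clock seconds for `runs` calls of the given block.
def time_it(runs)
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

d = Dummy.new
printf("inject %.6f\n", time_it(10) { join_inject(d, ',') })
printf("to_a   %.6f\n", time_it(10) { join_to_a(d, ',') })
```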

Why is join, and perhaps even pack in Array and not in Enumerable?

I guess to_a makes the conversion pretty easy, and Array tends to
serve as the "normalized" version of Enumerable in a lot of contexts.
I don't know if there's any other reason.

David

···

On Sun, 22 May 2005, Logan Capaldo wrote:

--
David A. Black
dblack@wobblini.net

Logan Capaldo wrote:

Just a few minutes ago I was playing with irb as I am wont to do, and
typed this:

('a'..'z').join(' ')

Lo and behold it protested at me with a NoMethodError. I said to my
self, self there is no reason that has to be Array only functionality.
Why isn't it in Enumerable? So I said:

module Enumerable
  def join(sep = '')
    inject do |a, b|
      "#{a}#{sep}#{b}"
    end
  end
end

And then I said ('a'..'z').join(' ') and got:
=> "a b c d e f g h i j k l m n o p q r s t u v w x y z"

#inject has to be the most dangerously effective method ever. But I digress:

Why is join, and perhaps even pack in Array and not in Enumerable?

Because there is nothing explicitly iterative about join. Also, every
class except Array would have to have a custom definition of join,
since there's no reasonable default behavior for any class outside of
Array. And if every class would have to implement its own version of a
method, that method doesn't belong in a module. Modules are not
interfaces.

I can see that Ara has already found an excuse to give join a block -
lovely. The slide continues....

Regards,

Dan

[... elided version using to_a ...]

The non-to_a version is slow because it creates a series of increasingly
large strings, copying everything joined so far on each step. A faster
version (without resorting to to_a) would build up a single string
gradually. Here is another version:

  def join(sep='')
    inject(nil) { |a, b|
      a ? (a << sep << b.to_s) : "#{b}"
    }
  end

Here are the timings I got ...

                  user     system      total        real
to_a:         0.580000   0.000000   0.580000 (  0.583975)
inject slow: 10.520000   0.210000  10.730000 ( 11.998484)
inject fast:  0.590000   0.020000   0.610000 (  0.651972)
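The difference between the two inject versions can be made visible by keeping every accumulator the block returns and checking whether they are the same object. A small sketch, assuming a hypothetical recording helper (accumulators is not a Ruby method):

```ruby
# Records every accumulator the block returns, so the objects can be compared.
def accumulators(enum)
  seen = []
  enum.inject(nil) do |a, b|
    s = yield(a, b)
    seen << s
    s
  end
  seen
end

# Interpolation allocates a new String on each step.
slow = accumulators(1..5) { |a, b| a ? "#{a},#{b}" : b.to_s }
# << appends to the one String created on the first step.
fast = accumulators(1..5) { |a, b| a ? (a << ',' << b.to_s) : b.to_s }

p slow.last                            # "1,2,3,4,5"
p fast.last                            # "1,2,3,4,5"
p slow.uniq { |s| s.object_id }.size   # 5 distinct strings
p fast.uniq { |s| s.object_id }.size   # 1 string, mutated in place
```

Each new string in the slow version copies everything joined so far, which is where the extra time goes as the collection grows.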

···

On Saturday 21 May 2005 10:05 pm, David A. Black wrote:

--
-- Jim Weirich jim@weirichhouse.org http://onestepback.org
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)

Hi,

At Sun, 22 May 2005 08:56:08 +0900,
Ara.T.Howard wrote in [ruby-talk:143311]:

the only reason i can think of is that just because something is countable
(Enumerable) doesn't mean each sub-thing is singular. take a hash for
example. this is no stumbling block (pun intended) for ruby however:

Feels interesting.

Index: enum.c
===================================================================
RCS file: /cvs/ruby/src/ruby/enum.c,v
retrieving revision 1.54
diff -U2 -p -r1.54 enum.c
--- enum.c 30 Oct 2004 06:56:17 -0000 1.54
+++ enum.c 22 May 2005 09:36:21 -0000
@@ -967,4 +967,52 @@ enum_zip(argc, argv, obj)
}

+static VALUE
+enum_join_s(obj, arg, recur)
+    VALUE obj, *arg;
+    int recur;
+{
+    if (recur) {
+        static const char recursed[] = "[...]";
+        if (!NIL_P(arg[1]) && RSTRING(arg[0])->len != 0) {
+            rb_str_append(arg[0], arg[1]);
+        }
+        rb_str_cat(arg[0], recursed, sizeof(recursed) - 1);
+    }
+    else {
+        if (rb_block_given_p()) {
+            obj = rb_yield(obj);
+        }
+        if (TYPE(obj) != T_STRING) {
+            obj = rb_obj_as_string(obj);
+        }
+        if (!NIL_P(arg[1]) && RSTRING(arg[0])->len != 0) {
+            rb_str_append(arg[0], arg[1]);
+        }
+        rb_str_append(arg[0], obj);
+    }
+    return arg[0];
+}
+
+static VALUE
+enum_join_i(el, arg)
+    VALUE el, arg;
+{
+    return rb_exec_recursive(enum_join_s, el, arg);
+}
+
+static VALUE
+enum_join(argc, argv, obj)
+    int argc;
+    VALUE *argv;
+    VALUE obj;
+{
+    VALUE arg[2];
+
+    rb_scan_args(argc, argv, "01", &arg[1]);
+    arg[0] = rb_str_new(0, 0);
+    rb_iterate(rb_each, obj, enum_join_i, (VALUE)arg);
+    return arg[0];
+}
+
/*
  * The <code>Enumerable</code> mixin provides collection classes with
@@ -998,4 +1046,5 @@ Init_Enumerable()
     rb_define_method(rb_mEnumerable,"inject", enum_inject, -1);
     rb_define_method(rb_mEnumerable,"partition", enum_partition, 0);
+    rb_define_method(rb_mEnumerable,"classify", enum_classify, 0);
     rb_define_method(rb_mEnumerable,"all?", enum_all, 0);
     rb_define_method(rb_mEnumerable,"any?", enum_any, 0);
@@ -1008,4 +1057,5 @@ Init_Enumerable()
     rb_define_method(rb_mEnumerable,"each_with_index", enum_each_with_index, 0);
     rb_define_method(rb_mEnumerable, "zip", enum_zip, -1);
+    rb_define_method(rb_mEnumerable, "join", enum_join, -1);

     id_eqq = rb_intern("===");

--
Nobu Nakada
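
The recursion guard in the patch (rb_exec_recursive together with the "[...]" marker) can be hard to picture from the C alone. Below is a rough Ruby sketch of the same idea. It is not a translation of the patch: the helper name guarded_join and its in_progress argument are made up for illustration, and unlike the patch it descends into nested Arrays itself.

```ruby
# Hypothetical helper: joins nested Arrays, printing "[...]" for any
# Array that is already being joined further up the call chain.
def guarded_join(enum, sep = '', in_progress = [])
  return '[...]' if in_progress.any? { |o| o.equal?(enum) }

  in_progress.push(enum)
  parts = enum.map do |el|
    el.is_a?(Array) ? guarded_join(el, sep, in_progress) : el.to_s
  end
  in_progress.pop
  parts.join(sep)
end

a = [1, 2]
a << a                    # a now contains itself
p guarded_join(a, ',')    # "1,2,[...]"
```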

I believe because join requires an ordered collection, and enumerables
aren't guaranteed to be ordered. For example the order of traversing a
Hash may differ for a different hash with the same elements. For this
reason the output of join for an enumerable is undefined.

Regards,
KB

···

On Sun, 22 May 2005 11:05:10 +0900, David A. Black wrote:

Why is join, and perhaps even pack in Array and not in Enumerable?

I guess to_a makes the conversion pretty easy, and Array tends to
serve as the "normalized" version of Enumerable in a lot of contexts.
I don't know if there's any other reason.

Logan Capaldo wrote:

Just a few minutes ago I was playing with irb as I am wont to do, and
typed this:

('a'..'z').join(' ')

Lo and behold it protested at me with a NoMethodError. I said to my
self, self there is no reason that has to be Array only functionality.
Why isn't it in Enumerable? So I said:

module Enumerable
  def join(sep = '')
    inject do |a, b|
      "#{a}#{sep}#{b}"
    end
  end
end

And then I said ('a'..'z').join(' ') and got:
=> "a b c d e f g h i j k l m n o p q r s t u v w x y z"

#inject has to be the most dangerously effective method ever. But I digress:

Why is join, and perhaps even pack in Array and not in Enumerable?

Because there is nothing explicitly iterative about join.

   Enumerable#join(sep): concatenate each (Enumerable#each) thing onto a string
   followed by sep, unless it is the last (implying iteration) thing.

isn't this definition reasonable and iterative?

Also, every class except Array would have to have a custom definition of
join, since there's no reasonable default behavior for any class outside of
Array.

really?

   set = Set::new
   set.join ','

   ll = LinkedList::new
   ll.join '->'

   dll = DoublyLinkedList::new
   dll.join '<->'

   v = BitVector::new
   v.join '|'

   path = graph.shortest_path from, to
   path.join '=>'

   string = String::new
   string.join "<br>"

   stack = Stack::new
   stack.join '-'

   rope = Rope::new
   rope.join '_'

   priority_queue = PriorityQueue::new
   priority_queue.join(','){|priority_and_obj| priority_and_obj.join ':'}

come to mind :wink:
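
To make one of these concrete, here is a minimal sketch of a linked list that gets join from a single shared definition. Everything in it is hypothetical: the LinkedList class, its Node struct and the ToAJoin mixin are made up for illustration and simply lean on to_a.

```ruby
# Hypothetical mixin: one join for anything that provides to_a.
module ToAJoin
  def join(sep = '')
    to_a.join(sep)
  end
end

# Hypothetical minimal singly linked list; only #each is written by hand.
class LinkedList
  include Enumerable
  include ToAJoin

  Node = Struct.new(:value, :next_node)

  def initialize(*values)
    @head = nil
    values.reverse_each { |v| @head = Node.new(v, @head) }
  end

  def each
    node = @head
    while node
      yield node.value
      node = node.next_node
    end
  end
end

ll = LinkedList.new(1, 2, 3)
p ll.join('->')  # "1->2->3"
```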

And if every class would have to implement its own version of a method, that
method doesn't belong in a module. Modules are not interfaces.

why would every class have to implement its own? with the definition we've
been throwing around we already have things like

   harp:~/build/ruby > ./ruby -e'html = "line1\nline2\nline3".join "<br>"; p html'
   "line1\n<br>line2\n<br>line3"

which is kinda handy and makes good sense no?
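
The one-liner above relies on String being Enumerable, which was the case in Ruby 1.8 but not from Ruby 1.9 on. A sketch of how the same output can be had on later versions, going through each_line and Array#join:

```ruby
text = "line1\nline2\nline3"

# each_line yields "line1\n", "line2\n", "line3"; Array#join does the rest.
html = text.each_line.to_a.join("<br>")

p html                       # "line1\n<br>line2\n<br>line3"
p text.respond_to?(:each)    # false on Ruby 1.9 and later
```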

I can see that Ara has already found an excuse to give join a block -
lovely. The slide continues....

weee. :wink:

cheers.

-a

···

On Mon, 23 May 2005, Daniel Berger wrote:
--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

Jim Weirich schrieb:

def join(sep='')
   inject(nil) { |a, b|
     a ? (a << sep << b.to_s) : "#{b}"
   }
end

It's
p([].join) # ""

so this should be

def join(sep="")
  if sep == ""
    inject('') { |a, b|
      a << b.inspect
    }
  else
    inject('') { |a, b|
      a << sep << b.inspect
    }
  end
end
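
Two details of a seeded inject are worth checking here: appending sep before every element also puts one before the first, and inspect quotes strings where Array#join would use to_s. A minimal sketch of one way to get "" for the empty case without either effect (the name join_seeded is hypothetical, and this is only one possible spelling):

```ruby
# Hypothetical helper: seeded with an empty String, separator only
# between elements, elements converted with to_s.
def join_seeded(enum, sep = '')
  enum.each_with_index.inject(+'') do |acc, (x, i)|
    acc << sep unless i.zero?
    acc << x.to_s
  end
end

p join_seeded([], ',')                                 # ""
p join_seeded([1, 2, 3], ',')                          # "1,2,3"
p [1, 2].inject(+'') { |a, b| a << ',' << b.inspect }  # ",1,2"
p 'a'.inspect                                          # "\"a\""
```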

/Christoph

"Kristof Bastiaensen" <kristof@vleeuwen.org> schrieb im Newsbeitrag news:pan.2005.05.22.09.53.41.174104@vleeuwen.org...

Why is join, and perhaps even pack in Array and not in Enumerable?

I guess to_a makes the conversion pretty easy, and Array tends to
serve as the "normalized" version of Enumerable in a lot of contexts.
I don't know if there's any other reason.

I believe because join requires an ordered collection, and enumerables
aren't guaranteed to be ordered.

That would be my answer, too.

For example the order of traversing a
Hash may differ for a different hash with the same elements. For this
reason the output of join for an enumerable is undefined.

At least it is unpredictable. Even more so: order may change completely with each insertion:

h=(0..5).inject({}){|h,i| h[i.to_s]=i;h}

=> {"0"=>0, "1"=>1, "2"=>2, "3"=>3, "4"=>4, "5"=>5}

h.to_a

=> [["0", 0], ["1", 1], ["2", 2], ["3", 3], ["4", 4], ["5", 5]]

h["6"]=6

=> 6

h.to_a

=> [["6", 6], ["0", 0], ["1", 1], ["2", 2], ["3", 3], ["4", 4], ["5", 5]]
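
The session above reflects the Hash implementation of the time. From Ruby 1.9 on, Hash enumerates its entries in insertion order, so the same experiment gives a stable result. A short sketch:

```ruby
h = (0..5).each_with_object({}) { |i, acc| acc[i.to_s] = i }
h["6"] = 6

p h.keys                                   # ["0", "1", "2", "3", "4", "5", "6"]
p h.map { |k, v| "#{k}=#{v}" }.join(',')   # "0=0,1=1,2=2,3=3,4=4,5=5,6=6"
```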

Kind regards

    robert

···

On Sun, 22 May 2005 11:05:10 +0900, David A. Black wrote:

Hi --

···

On Sun, 22 May 2005, Kristof Bastiaensen wrote:

On Sun, 22 May 2005 11:05:10 +0900, David A. Black wrote:

Why is join, and perhaps even pack in Array and not in Enumerable?

I guess to_a makes the conversion pretty easy, and Array tends to
serve as the "normalized" version of Enumerable in a lot of contexts.
I don't know if there's any other reason.

I believe because join requires an ordered collection, and enumerables
aren't guaranteed to be ordered. For example the order of traversing a
Hash may differ for a different hash with the same elements. For this
reason the output of join for an enumerable is undefined.

I don't think the unorderedness would matter; consider, for example,
Hash#to_s.

David

--
David A. Black
dblack@wobblini.net

only you would crank that out in C nobu :wink:

looks good for enumerable:

   harp:~/build/ruby > ./ruby -e' p( {:k => :v, :K => :V }.join(","){|kv| kv.join "=>"} ) '
   "k=>v,K=>V"

but doesn't override Array's current behaviour:

   harp:~/build/ruby > ./ruby -e'a3 = [ [ [4], [2] ], [ ["forty"], ["two"] ] ]; p a3.join("___"){|a2| a2.join("__"){|a1| a1.join "_"}}'
   "4___2___forty___two"

i'm not sure how to do this in C:

   module Enumerable
     def join(sep = '', &b)
       inject(nil){|s,x| "#{ s }#{ s && sep }#{ b ? b[ x ] : x }"}
     end
   end
   class Array
     def join(*a, &b); super; end
   end

so Array's join is clobbered...
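
The reason a join defined only in Enumerable leaves Array untouched is ordinary method lookup: a method defined in the class itself is found before one from an included module. A small sketch with stand-in names (Joinable and Box are hypothetical):

```ruby
module Joinable            # hypothetical stand-in for Enumerable
  def describe
    'from module'
  end
end

class Box                  # hypothetical stand-in for Array
  include Joinable

  def describe
    'from class'
  end
end

p Box.new.describe                     # "from class"
p Box.ancestors.first(2)               # [Box, Joinable]
p Array.instance_method(:join).owner   # Array
```

The `def join(*a, &b); super; end` trick above works the other way round: it replaces the class's own method with one that hands off to the next definition in the ancestor chain.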

kind regards.

-a

···

On Sun, 22 May 2005 nobu.nokada@softhome.net wrote:

--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

Ara.T.Howard wrote:

> Also, every class except Array would have to have a custom definition of
> join, since there's no reasonable default behavior for any class outside of
> Array.

really?

   set = Set::new
   set.join ','

   ll = LinkedList::new
   ll.join '->'

   dll = DoublyLinkedList::new
   dll.join '<->'

   v = BitVector::new
   v.join '|'

   path = graph.shortest_path from, to
   path.join '=>'

   string = String::new
   string.join "<br>"

   stack = Stack::new
   stack.join '-'

   rope = Rope::new
   rope.join '_'

   priority_queue = PriorityQueue::new
   priority_queue.join(','){|priority_and_obj| priority_and_obj.join ':'}

come to mind :wink:

Fine, replace "Array" with "most lists" and my point still stands.

> And if every class would have to implement its own version of a method, that
> method doesn't belong in a module. Modules are not interfaces.

why would every class have to implement its own? with the definition we've
been throwing around we already have things like

   harp:~/build/ruby > ./ruby -e'html = "line1\nline2\nline3".join "<br>"; p html'
   "line1\n<br>line2\n<br>line3"

which is kinda handy and makes good sense no?

No, it doesn't make sense.

Regards,

Dan

Hi --

"Kristof Bastiaensen" <kristof@vleeuwen.org> schrieb im Newsbeitrag news:pan.2005.05.22.09.53.41.174104@vleeuwen.org...

Why is join, and perhaps even pack in Array and not in Enumerable?

I guess to_a makes the conversion pretty easy, and Array tends to
serve as the "normalized" version of Enumerable in a lot of contexts.
I don't know if there's any other reason.

I believe because join requires an ordered collection, and enumerables
aren't guaranteed to be ordered.

That would be my answer, too.

As per my previous post, I don't think that matters for join, which is
just a "dumb" string representation facility and won't care about
order.

Another related thought: Enumerables have this underlying numerical
index, as reflected in Enumerable#each_with_index. Even hashes are,
in that sense, "ordered": their elements are "indexed" from 0 up.
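
A quick sketch of what that looks like for a hash: each_with_index hands the block a [key, value] pair together with a running index starting at 0.

```ruby
h = { "x" => 10, "y" => 20 }

# The pair is destructured as (k, v); i is the position in iteration order.
indexed = h.each_with_index.map { |(k, v), i| "#{i}:#{k}=#{v}" }

p indexed  # ["0:x=10", "1:y=20"] on Ruby 1.9+, where Hash keeps insertion order
```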

I have to say, though, that I think #each_with_index should be removed
from Enumerable and pushed down to the classes that mix it in
(similarly to #each_index). But I suppose as long as they are called
"enumerable" they are in some sense associated with a numerical index.

That's probably only tangentially related to #join, though. Mainly I
think that #join is just a fancy #to_s, and orderedness isn't an
issue.

David

···

On Sun, 22 May 2005, Robert Klemme wrote:

On Sun, 22 May 2005 11:05:10 +0900, David A. Black wrote:

--
David A. Black
dblack@wobblini.net