Map and join or inject?

hello,

given an array of strings A, i need to map a given function F to each
element of A and concatenate the results.

which way is more memory efficient:

1. A.inject("") { |t,x| t + F(x) }
2. (A.map { |x| F(x) }).join

thanks
konstantin

Hmm I cannot really give you an exact answer but it seems that map/join is
consuming only a third of the objects (but see below for a small
improvement)

Counting objects does not even give you the memory consumption, we do not
know their size.
One could monitor the interpreter of course (sorry lazy day).

However as map/join runs more than 4 times faster I would go for it.
I prefer map/join for it's "functional" elegance too.

Another problem in inject is the "+" if you use "<<" you get down to the
double of objects and run only 3 times slower

Hope that helps
Cheers
Robert

···

On 8/26/06, ako... <akonsu@gmail.com> wrote:

hello,

given an array of strings A, i need to map a given function F to each
element of A and concatenate the results.

which way is more memory efficient:

1. A.inject("") { |t,x| t + F(x) }
2. (A.map { |x| F(x) }).join

thanks
konstantin

--
Deux choses sont infinies : l'univers et la bêtise humaine ; en ce qui
concerne l'univers, je n'en ai pas acquis la certitude absolue.

- Albert Einstein

I just went through some tests with monitoring from Linux.
I guess that these tests still are not very conclusive.
The only thing I am almost sure about is that inject with "+" is a
performance killer.
Inject with "<<" works pretty well and map.join uses lots of memory.
That one I got completely wrong in my tests (and as I am biased against
inject, was happy to accept ).

I could not even test inject+ in a reasonable time, it was also they only
one where GC kicked in.

Here are the memory and time consumptions for what it is worth at least one
sees the tradeoff between time and memory.

map/join 37.6 MU 7TU
each<< 8.1 MU 29TU
inject<< 8.4 MU 84TU ( I might have screwed up again here)

I guess this setup (constructing very long strings does not suit inject)

Robert

···

On 8/26/06, Robert Dober <robert.dober@gmail.com> wrote:

Sorry for replying 2 myself

--
Deux choses sont infinies : l'univers et la bêtise humaine ; en ce qui
concerne l'univers, je n'en ai pas acquis la certitude absolue.

- Albert Einstein

thanks, i did not know that one can count objects using ObjectSpace.

it looks like map+join use fewer objects than inject. even though map
creates a whole new array. i would expect inject to use fewer objects.
strange...

konstantin

Erik Veenstra wrote:

···

----------------------------------------------------------------

$ vi test.rb ; cat test.rb
GC.disable

1000.times do
   (1..1000).map{|s| s.to_s}.join("")
   #(1..1000).inject(""){|r, s| s.to_s; r}
end

count = 0
ObjectSpace.each_object do
   count += 1
end
p count

$ ruby test.rb
1003392

----------------------------------------------------------------

$ vi test.rb ; cat test.rb
GC.disable

1000.times do
   #(1..1000).map{|s| s.to_s}.join("")
   (1..1000).inject(""){|r, s| s.to_s; r}
end

count = 0
ObjectSpace.each_object do
   count += 1
end
p count

$ ruby test.rb
2001392

----------------------------------------------------------------

gegroet,
Erik V. - http://www.erikveen.dds.nl/

ako... wrote:

thanks, i did not know that one can count objects using ObjectSpace.

it looks like map+join use fewer objects than inject. even though map
creates a whole new array. i would expect inject to use fewer objects.
strange...

konstantin

Erik Veenstra wrote:
> ----------------------------------------------------------------
>
> $ vi test.rb ; cat test.rb
> GC.disable
>
> 1000.times do
> (1..1000).map{|s| s.to_s}.join("")
> #(1..1000).inject(""){|r, s| s.to_s; r}
> end
>
> count = 0
> ObjectSpace.each_object do
> count += 1
> end
> p count
>
> $ ruby test.rb
> 1003392
>
> ----------------------------------------------------------------
>
> $ vi test.rb ; cat test.rb
> GC.disable
>
> 1000.times do
> #(1..1000).map{|s| s.to_s}.join("")
> (1..1000).inject(""){|r, s| s.to_s; r}
> end
>
> count = 0
> ObjectSpace.each_object do
> count += 1
> end
> p count
>
> $ ruby test.rb
> 2001392
>
> ----------------------------------------------------------------
>
> gegroet,
> Erik V. - http://www.erikveen.dds.nl/

It's true that map appears to create fewer objects, which may have some
advantages, but the situation isn't entirely one-sided. Now consider
the case where the result of each function consumes a whole megabyte of
memory. Using map and join, you get 1000 megabyte-long strings in an
array, so none of them can be freed. You're already a gig into your
memory before you join them, resulting in another gig used.

If, as in the example, you have garbage collection turned off, you'll
be 2 gigs (plus overhead) into your memory+swap in either scenario.
However, with the garbage collector enabled and using inject, the
results of each iteration could be freed, keeping your maximum memory
demand lower.

Of course, the complete answer would have to factor in the
implementations of the string concatenations and of the join method,
the number of memory reallocations needed by each, and the degree of
heap fragmentation incurred.

Personally, I would use some kind of memory or file stream with
inject--so that not only can the intermediate result objects be freed
but also so memory reallocations can be minimized.

In the end, Erik's approach, using metrics rather than speculation,
will be most effective.

Brent Rowland

thanks, i did not know that one can count objects using ObjectSpace.

No ObjectSpace counts them for you
ObjectSpace.each_object{}

it looks like map+join use fewer objects than inject. even though map

creates a whole new array. i would expect inject to use fewer objects.
strange...

No I do not think so, I believe that "+" uses the objects.
What good does it do disable GC and count objects afterwards?

This is very tricky to test, and I completely screwed up the first time.
Anway avoid "+" in inject use "<<" if memory is a concern.

I still think that maybe a simple each with some programming around is the
best approach
in our case
s=""
each |e| s << e
s

or

injcect "" |s,e| s << e

map/join is more elegant but if memory is a concern forget it.

konstantin

<SNIP>

Cheers
Robert

···

On 8/27/06, ako... <akonsu@gmail.com> wrote:

it looks like map+join use fewer objects than inject. even though map
creates a whole new array. i would expect inject to use fewer objects.
strange...

It has to do with the block you gave inject() and the definition of String.+(). Watch:

>> ("a".."j").inject(String.new) do |res, let|
?> new_str = res + let
>> puts "#{new_str.object_id}: #{new_str.inspect}"
>> new_str
>> end
1669064: "a"
1668834: "ab"
1668624: "abc"
1668424: "abcd"
1668304: "abcde"
1668144: "abcdef"
1668064: "abcdefg"
1667854: "abcdefgh"
1667764: "abcdefghi"
1667534: "abcdefghij"
=> "abcdefghij"

Each of those intermediate Strings is a new object, which is right. That's what String.+() does:

$ ri -T String#+
--------------------------------------------------------------- String#+
      str + other_str => new_str

···

On Aug 26, 2006, at 10:05 PM, ako... wrote:
------------------------------------------------------------------------
      Concatenation---Returns a new String containing other_str
      concatenated to str.

         "Hello from " + self.to_s #=> "Hello from main"

The keywords there being "new String." String does have an append method though:

$ ri -T 'String#<<'
-------------------------------------------------------------- String#<<
      str << fixnum => str
      str.concat(fixnum) => str
      str << obj => str
      str.concat(obj) => str
------------------------------------------------------------------------
      Append---Concatenates the given object to str. If the object is a
      Fixnum between 0 and 255, it is converted to a character before
      concatenation.

         a = "hello "
         a << "world" #=> "hello world"
         a.concat(33) #=> "hello world!"

And if we use that, we can get down to less objects than map(), because we don't need the Array:

>> ("a".."j").inject(String.new) do |res, let|
?> same_str = res << let
>> puts "#{same_str.object_id}: #{same_str.inspect}"
>> same_str
>> end
938158: "a"
938158: "ab"
938158: "abc"
938158: "abcd"
938158: "abcde"
938158: "abcdef"
938158: "abcdefg"
938158: "abcdefgh"
938158: "abcdefghi"
938158: "abcdefghij"
=> "abcdefghij"

Hope that helps.

James Edward Gray II