"map" a deeply nested structure: Object#deep_map

Hi,

most of you probably know/use Ruby Facets

http://rubyworks.github.com/facets/

I've recently posted a question on Facets discussion group:

http://groups.google.com/group/facets-universal/browse_thread/thread/683215a35f06af36

and I also tried my implementation (which apparently works fine):

http://snurl.com/19n6qh

    class Object

      def deep_map(&block)
        if self.respond_to? :each_pair
          out = {}
          self.each_pair do |k, v|
            out[k] = v.deep_map(&block)
          end
          return out
        elsif self.respond_to? :each
          out = []
          self.each do |x|
            out << x.deep_map(&block)
          end
          return out
        else
          return block.call(self)
        end
      end

    end

Is there room for improvements? Facets author suggested to start a
discussion here too.

So: what is, in your opinion, the best way to map a deeply nested
structure made up of Arrays, Hashes, Array of Hashes etc. etc. ?

Thanks,
Guido

···

--
Posted via http://www.ruby-forum.com/.

Hi,

most of you probably know/use Ruby Facets

http://rubyworks.github.com/facets/

I've recently posted a question on Facets discussion group:

http://groups.google.com/group/facets-universal/browse_thread/thread/683215a35f06af36

and I also tried my implementation (which apparently works fine):

http://snurl.com/19n6qh

class Object

 def deep\_map\(&amp;block\)
   if self\.respond\_to? :each\_pair
     out = \{\}
     self\.each\_pair do |k, v|
       out\[k\] = v\.deep\_map\(&amp;block\)

What about keys? Hash#map allows to map keys and values

irb(main):001:0> h={1=>2,3=>4}
=> {1=>2, 3=>4}
irb(main):002:0> h.map {|k,v| [k*10,v*100]}
=> [[10, 200], [30, 400]]

What, if the key is an Array or Hash? Wouldn't this be more appropriate?

           out[k.deep_map(&block)] = v.deep_map(&block)

OTOH then you cannot map key and value together.

     end
     return out
   elsif self\.respond\_to? :each
     out = \[\]
     self\.each do |x|
       out &lt;&lt; x\.deep\_map\(&amp;block\)
     end
     return out
   else
     return block\.call\(self\)
   end
 end

end

Is there room for improvements? Facets author suggested to start a
discussion here too.

You could remove lines "out = ..." and replace them by

out = self.class.new

So: what is, in your opinion, the best way to map a deeply nested
structure made up of Arrays, Hashes, Array of Hashes etc. etc. ?

I wonder how many use cases for this there are actually. In your
example you can uniformly treat each object since you want the
object_id. That would work for a few other methods as well (e.g.
#inspect, #to_s). In other cases you would have to discriminate
treatment of leave values in your block, e.g.

x.deep_map |y|
  case y
  when String
    ...
  when Fixnum
   ...
  else
   ...
end

That list could become lengthy. Basically you implemented tree
traversal with double dispatch - only that the double dispatch must be
done manually in the block. :slight_smile:

Kind regards

robert

···

On Wed, Oct 6, 2010 at 4:43 PM, Guido De Rosa <guidoderosa@gmail.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

What about keys? Hash#map allows to map keys and values

irb(main):001:0> h={1=>2,3=>4}
=> {1=>2, 3=>4}
irb(main):002:0> h.map {|k,v| [k*10,v*100]}
=> [[10, 200], [30, 400]]

What, if the key is an Array or Hash? Wouldn't this be more
appropriate?

           out[k.deep_map(&block)] = v.deep_map(&block)

I was not interested in mapping keys, but this would be a reasonable
extra feature.

So deep_map would be used like that:

    object.deep_map{|k, v| ... [result_key, result_value]}

resembling Hash#map behavior

and when object is not Hash-like, result_key would be simply ignored.

It should not be hard to allow client code to use also "the previous
API":

    object.deep_map{|x| ... }

by checking block.arity in the implementation code.

You could remove lines "out = ..." and replace them by

out = self.class.new

Good point... apparently!

What if self is a Range?

Forcing to return an Array (or a Hash) is not so bad: even the standard
method Enumerable#map does so! And so does Enumerable#sort and many
others: it's just to avoid nonsensical situations :slight_smile:

I wonder how many use cases for this there are actually. In your
example you can uniformly treat each object since you want the
object_id. That would work for a few other methods as well (e.g.
#inspect, #to_s). In other cases you would have to discriminate
treatment of leave values in your block, e.g.

x.deep_map |y|
  case y
  when String
    ...
  when Fixnum
   ...
  else
   ...
end

That list could become lengthy. Basically you implemented tree
traversal with double dispatch - only that the double dispatch must be
done manually in the block. :slight_smile:

The same holds for standard Enumerable#map and Hash#map, so...

I just want to write the recursive version of some Ruby core methods,
resembling by many aspects pretty much the same behavior, including the
fact that some checks are up to the user :wink:

G.

···

--
Posted via http://www.ruby-forum.com/\.

What about keys? Hash#map allows to map keys and values

irb(main):001:0> h={1=>2,3=>4}
=> {1=>2, 3=>4}
irb(main):002:0> h.map {|k,v| [k*10,v*100]}
=> [[10, 200], [30, 400]]

What, if the key is an Array or Hash? Wouldn't this be more
appropriate?

       out\[k\.deep\_map\(&amp;block\)\] = v\.deep\_map\(&amp;block\)

I was not interested in mapping keys, but this would be a reasonable
extra feature.

OTOH you say you want to mimic #map behavior and since this is part of
the standard behavior people might expect to be able to map keys as
well if this goes into a library.

So deep_map would be used like that:

object.deep_map{|k, v| ... [result_key, result_value]}

resembling Hash#map behavior

and when object is not Hash-like, result_key would be simply ignored.

It should not be hard to allow client code to use also "the previous
API":

object.deep_map{|x| ... }

by checking block.arity in the implementation code.

I think that's not necessary. You can just pass an Array to get the
same behavior as Hash#each:

irb(main):002:0> def f1; yield [1,2] end
=> nil
irb(main):003:0> f1 {|a| p a}
[1, 2]
=> [1, 2]
irb(main):004:0> f1 {|a,b| p a}
1
=> 1
irb(main):005:0> {1=>2}.each {|a,b| p a}
1
=> {1=>2}
irb(main):006:0> {1=>2}.each {|a| p a}
[1, 2]
=> {1=>2}
irb(main):007:0>

You could remove lines "out = ..." and replace them by

out = self.class.new

Good point... apparently!

What if self is a Range?

Good point.

Forcing to return an Array (or a Hash) is not so bad: even the standard
method Enumerable#map does so! And so does Enumerable#sort and many
others: it's just to avoid nonsensical situations :slight_smile:

But according to that logic *all* collections returned should be
Arrays. So out = {} would become out = .

I wonder how many use cases for this there are actually. In your
example you can uniformly treat each object since you want the
object_id. That would work for a few other methods as well (e.g.
#inspect, #to_s). In other cases you would have to discriminate
treatment of leave values in your block, e.g.

x.deep_map |y|
case y
when String
...
when Fixnum
...
else
...
end

That list could become lengthy. Basically you implemented tree
traversal with double dispatch - only that the double dispatch must be
done manually in the block. :slight_smile:

The same holds for standard Enumerable#map and Hash#map, so...

I just want to write the recursive version of some Ruby core methods,
resembling by many aspects pretty much the same behavior, including the
fact that some checks are up to the user :wink:

The difference is that in a Hash or Array values are usually uniform
while with a recursive structure it is much more likely that they are
not. So in the case of Enumerable#map you typically know what objects
you map while in the recursive case you rather need checks.

I still wonder about the usability of this.

Kind regards

robert

···

On Wed, Oct 6, 2010 at 6:05 PM, Guido De Rosa <guidoderosa@gmail.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert Klemme wrote:

OTOH you say you want to mimic #map behavior and since this is part of
the standard behavior people might expect to be able to map keys as
well if this goes into a library.

Right.

� �object.deep_map{|x| ... }

by checking block.arity in the implementation code.

I think that's not necessary. You can just pass an Array to get the
same behavior as Hash#each:

irb(main):002:0> def f1; yield [1,2] end
=> nil
irb(main):003:0> f1 {|a| p a}
[1, 2]
=> [1, 2]
irb(main):004:0> f1 {|a,b| p a}
1
=> 1
irb(main):005:0> {1=>2}.each {|a,b| p a}
1
=> {1=>2}
irb(main):006:0> {1=>2}.each {|a| p a}
[1, 2]
=> {1=>2}
irb(main):007:0>

Ok. Thanks.

But according to that logic *all* collections returned should be
Arrays. So out = {} would become out = .

Enumerable#map returns an Array on any object except wen the object is a
Hash: in such case it returns another Hash. And so should do #deep_map.

The difference is that in a Hash or Array values are usually uniform
while with a recursive structure it is much more likely that they are
not. So in the case of Enumerable#map you typically know what objects
you map while in the recursive case you rather need checks.

I still wonder about the usability of this.

Of course it depends by the usage scenario, and I actually found such a
need in one of my production projects. I may tell you longer detail if
you wish...

Of course there's no need that the values belong to the same class or to
derived class, they just need to be uniform in the duck-typing sense.

Consider something like:

    my_deep_structure.deep_map{|v| v.to_s}

any of this value may belong to a *different* class but still in
Your::Application::NameSpace::

This doesn't appear so uncommon or unrealistic.

You may have designed your application so that any of your classes
respond to to_s or some other method which "exports" data.

The result of deep_map will not contain any reference to your
application logic and may easily "inter-operate" with others: I fond
this more convenient that overwriting to_json, to_yaml, marshal_dump and
so on...

G.

···

--
Posted via http://www.ruby-forum.com/\.

Guido De Rosa wrote:

Robert Klemme wrote:

But according to that logic *all* collections returned should be
Arrays. So out = {} would become out = .

Enumerable#map returns an Array on any object except wen the object is a
Hash: in such case it returns another Hash. And so should do #deep_map.

Ouch! It actually returns an Array of key-value pairs. Sorry for the
mistake. I will consider you point, then :wink: .

G.

···

--
Posted via http://www.ruby-forum.com/\.

11:04:25 ~$ allruby -e 'p({1=>2}.map {|a,b| [b,a]}.class)'
CYGWIN_NT-5.1 padrklemme2 1.7.7(0.230/5/3) 2010-08-31 09:58 i686 Cygwin

···

On Thu, Oct 7, 2010 at 10:22 AM, Guido De Rosa <guidoderosa@gmail.com> wrote:

Robert Klemme wrote:

But according to that logic *all* collections returned should be
Arrays. So out = {} would become out = .

Enumerable#map returns an Array on any object except wen the object is a
Hash: in such case it returns another Hash. And so should do #deep_map.

========================================
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
Array

ruby 1.9.1p429 (2010-07-02 revision 28523) [i386-cygwin]
Array

jruby 1.4.0 (ruby 1.8.7 patchlevel 174) (2009-11-02 69fbfa3) (Java
HotSpot(TM) Client VM 1.6.0_21) [x86-java]
Array

I see you noticed already.

The difference is that in a Hash or Array values are usually uniform
while with a recursive structure it is much more likely that they are
not. So in the case of Enumerable#map you typically know what objects
you map while in the recursive case you rather need checks.

I still wonder about the usability of this.

Of course it depends by the usage scenario, and I actually found such a
need in one of my production projects. I may tell you longer detail if
you wish...

Yes, please.

Of course there's no need that the values belong to the same class or to
derived class, they just need to be uniform in the duck-typing sense.

Consider something like:

my_deep_structure.deep_map{|v| v.to_s}

What do you need that for? If it is for printing or logging then you
can use my_deep_structure.to_s or my_deep_structure.inspect.

any of this value may belong to a *different* class but still in
Your::Application::NameSpace::

This doesn't appear so uncommon or unrealistic.

You may have designed your application so that any of your classes
respond to to_s or some other method which "exports" data.

The result of deep_map will not contain any reference to your
application logic and may easily "inter-operate" with others: I fond
this more convenient that overwriting to_json, to_yaml, marshal_dump and
so on...

You really make me curious about your use case.

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert Klemme wrote:

Guido De Rosa wrote:

Of course it depends by the usage scenario, and I actually found such a
need in one of my production projects. I may tell you longer detail if
you wish...

Yes, please.

Of course there's no need that the values belong to the same class or to
derived class, they just need to be uniform in the duck-typing sense.

Consider something like:

   my_deep_structure.deep_map{|v| v.to_s}

What do you need that for? If it is for printing or logging then you
can use my_deep_structure.to_s or my_deep_structure.inspect.

this more convenient that overwriting to_json, to_yaml, marshal_dump and
so on...

You really make me curious about your use case.

This is embarrassing but I'm thinking - right now - that I made some
other mistakes in "my use case". Thanks for the head-ups, this
discussion has proven useful.

What I will do, if you're still curious, is just provide a link to my
fixes (as git commits, when they are done).

Now, don't mind my use case. Let's return to my first post.

My conclusion is that the method I wrote should be renamed
deep_map_values. And another method may be called deep_rekey with
obvious behavior.

Hash#map is totally another beast:

  1. it mixes key and values in the transformation (which may lead to
     unwanted results in a hypothetical recursive version IMHO)

  2. it turns an Hash into an Array of pairs (which may be what you
     want, or not)

Regards,
Guido

···

--
Posted via http://www.ruby-forum.com/\.

Perhaps what is being sought here is a form of search and replace.

  class Array
    def deep_replace(replacement, &match)
      map do |e|
        if e.respond_to?(:deep_replace)
          e.deep_replace(replacement, &match)
        else
          match[e] ? replacement : e
        end
    end
  end

  require 'facets/hash/mash' #[1]

  class Hash
    def deep_replace(replacement, &match)
      mash do |k,v|
        nk = if k.respond_to?(:deep_replace)
          k.deep_replace(replacement, &match)
        else
          match[k] ? replacement : k
        end
        nv = if v.respond_to?(:deep_replace)
          v.deep_replace(replacement, &match)
        else
          match[v] ? replacement : v
        end
        [nk, nv]
    end
  end

Not tested but you get the idea.

I'm not 100% sure about handling the key for Hash either, but perhaps
an option could toggle that usage on or off.

[1]http://blog.thepete.net/2010/02/ruby-facets-mash-method.html

Thomas Sawyer wrote:

Perhaps what is being sought here is a form of search and replace.

  class Array
    def deep_replace(replacement, &match)
      map do |e|
        if e.respond_to?(:deep_replace)
          e.deep_replace(replacement, &match)
        else
          match[e] ? replacement : e
        end
    end
  end

  require 'facets/hash/mash' #[1]

  class Hash
    def deep_replace(replacement, &match)
      mash do |k,v|
        nk = if k.respond_to?(:deep_replace)
          k.deep_replace(replacement, &match)
        else
          match[k] ? replacement : k
        end
        nv = if v.respond_to?(:deep_replace)
          v.deep_replace(replacement, &match)
        else
          match[v] ? replacement : v
        end
        [nk, nv]
    end
  end

Not tested but you get the idea.

I'm not 100% sure about handling the key for Hash either, but perhaps
an option could toggle that usage on or off.

[1]http://blog.thepete.net/2010/02/ruby-facets-mash-method.html

That way you can say for example:

    Here's a deeply nested object; replace any "content" n which is a
    Fixnum with the number 123

But you cannot say:

    Here's a deeply nested object; replace any "content" n which is a
    Fixnum with the number n + 1

#deep_replace would be a lot more flexible if replacement could be a
lambda.

The drawback is that client code would look ugly:

    o.deep_replace(lambda {|x| x+1}) {|y| ...}

Is there an elegant way o pass *two* blocks of code?

Maybe a more consistent API would look like:

    o.deep_replace(
      lambda {|x| ... }, # matching conditions
      lambda {|x| ... } # replacement code
    )

or like:

    o.deep_replace(
      lambda {|k, v| ... }, # matching conditions
      lambda {|k, v| ... [new_k, new_v] } # replacement code
    )

But more simply the "matching" may be done by the client code:

   o.deep_replace do |k, v|
     if matching_conditions(k, v)
       # do the replecement
       ...
       [new_k, new_v]
     else
       [k, v] # leave the same
     end
   end

so, again, we have to pass one block of code, but for the replacement,
not the match.

G.

···

--
Posted via http://www.ruby-forum.com/\.

That way you can say for example:

Here&#39;s a deeply nested object; replace any &quot;content&quot; n which is a
Fixnum with the number 123

But you cannot say:

Here&#39;s a deeply nested object; replace any &quot;content&quot; n which is a
Fixnum with the number n \+ 1

#deep_replace would be a lot more flexible if replacement could be a
lambda.

The drawback is that client code would look ugly:

o\.deep\_replace\(lambda \{|x| x\+1\}\) \{|y| \.\.\.\}

Is there an elegant way o pass *two* blocks of code?

Indeed there is. In fact that is something akin to how the new Facets
#recursively method works.

  arr = ["a", ["b", "c"]]
  arr.recursively{ |a| a.reverse }.map{ |v| v.to_sym }
  #=> [:a, [:c, :b]]

The first block handles enumerables and the second handles non-
enumerable elements.

(See http://github.com/rubyworks/facets/blob/master/lib/core/facets/array/recursively.rb\)

But I am not sure #recursively can help in the search-and-replace
case.

This leads me to wonder about a more general API, e.g.

  def match(&match)
    Enumerable::Matcher.new(&match)
  end

And then Enumerable::Matcher could have different methods for what to
do with the various matches, such as #replace or #delete.

Maybe a more consistent API would look like:

o\.deep\_replace\(
  lambda \{|x| \.\.\. \}, \# matching conditions
  lambda \{|x| \.\.\. \}  \# replacement code
\)

or like:

o\.deep\_replace\(
  lambda \{|k, v| \.\.\.                \}, \# matching conditions
  lambda \{|k, v| \.\.\. \[new\_k, new\_v\] \}  \# replacement code
\)

But more simply the "matching" may be done by the client code:

o.deep_replace do |k, v|
if matching_conditions(k, v)
# do the replecement
...
[new_k, new_v]
else
[k, v] # leave the same
end
end

so, again, we have to pass one block of code, but for the replacement,
not the match.

But we still have the problem of matching |k,v| for Hashes but |e| for
Arrays.

···

On Oct 7, 1:18 pm, Guido De Rosa <guidoder...@gmail.com> wrote: