Set intersection with a specific criteria: a better idiom?

Hi,

I have been trying to intersect 2 sets of objects that where I want the "intersected" objects to share a common property.

This is what I came up with:

Here, I want to get all the Errors having the same file name:

def intersection_set
  Set.new(event_errors).&(Set.new(socket_channel_errors)) {|x| file_name(x)}
end

To allow that I extended the Set class with:

def &(enum, &property)
    n = self.class.new
    if block_given?
      mapped = self.map{|e| property.call(e)}
      enum.each { |o| n.add(o) if mapped.include?(property.call(o)) }
    else
      enum.each { |o| n.add(o) if include?(o) }
    end
    n
end

I have met this requirement quite some times while scripting and I may miss something here.
Do you know a more idiomatic, Ruby-way of doing that?

Thanks,

Eric.

···

---------------------------------------------------------------------------
Eric TORREBORRE
tel: +81 (0)90 5580 3280
e-mail: etorreborre@yahoo.com / etorreborre@docomo.ne.jp
blog: http://etorreborre.blogspot.com
---------------------------------------------------------------------------

      _____________________________________________________________________________
Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo! Mail

Eric Torreborre wrote:

Hi,

I have been trying to intersect 2 sets of objects that where I want the "intersected" objects to share a common property.

This is what I came up with:

Here, I want to get all the Errors having the same file name:

def intersection_set
  Set.new(event_errors).&(Set.new(socket_channel_errors)) {|x| file_name(x)}
end

To allow that I extended the Set class with:

def &(enum, &property)
    n = self.class.new
    if block_given?
      mapped = self.map{|e| property.call(e)}
      enum.each { |o| n.add(o) if mapped.include?(property.call(o)) }
    else
      enum.each { |o| n.add(o) if include?(o) }
    end
    n
end
I have met this requirement quite some times while scripting and I may miss something here.
Do you know a more idiomatic, Ruby-way of doing that?

Hm, it's like Schwartzian transform: intersect_by instead of sort_by.

Your implementation looks good, but might be more efficient with:

       mapped = self.class.new(map{|e| property.call(e)})

which would make mapped a set, for faster lookups.

Also, using yield will probably be more efficient than instantiating a proc and calling it:

   def &(enum)
     n = self.class.new
     if block_given?
       mapped = self.class.new(map{|e| yield e})
       enum.each { |o| n.add(o) if mapped.include?(yield o) }
     else
       enum.each { |o| n.add(o) if include?(o) }
     end
     n
   end

It would be more elegant to use select than to iterate through enum using #each and #add to the set, but Set#select returns an Array. At the cost of using two iterations instead of once, you could do this:

require 'set'

class Set
   def intersect_by(other)
     props = self.class.new(other.map {|t| yield t})
     self.class.new(select {|t| props.include? t.prop})
   end

# Actually the following is closer to the origial:
# def intersect_by(other)
# props = self.class.new(map {|t| yield t})
# self.class.new(other.select {|t| props.include? t.prop})
# end

   alias old_intersect &

   def &(other)
     if block_given?
       intersect_by(other) {|t| yield t}
     else
       old_intersect(other) {|t| yield t}
     end
   end
end

class Test
   attr_accessor :prop
   def initialize prop
     @prop = prop
   end
end

s1 = Set.new([
   Test.new(1),
   Test.new(2),
   Test.new(3)
])

s2 = Set.new([
   Test.new(2),
   Test.new(3),
   Test.new(4)
])

p(s1.intersect_by(s2) {|t| t.prop})
p(Set[1,2,3]&Set[2,3,4])

__END__

#<Set: {#<Test:0xb79f73e4 @prop=2>, #<Test:0xb79f73d0 @prop=3>}>
#<Set: {2, 3}>

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Eric Torreborre wrote:

Hi,

I have been trying to intersect 2 sets of objects that where I want the "intersected" objects to share a common property.

This is what I came up with:

Here, I want to get all the Errors having the same file name:

def intersection_set
  Set.new(event_errors).&(Set.new(socket_channel_errors)) {|x| file_name(x)}
end

To allow that I extended the Set class with:

def &(enum, &property)
    n = self.class.new
    if block_given?
      mapped = self.map{|e| property.call(e)}
      enum.each { |o| n.add(o) if mapped.include?(property.call(o)) }
    else
      enum.each { |o| n.add(o) if include?(o) }
    end
    n
end
I have met this requirement quite some times while scripting and I may miss something here.
Do you know a more idiomatic, Ruby-way of doing that?

Hm, it's like Schwartzian transform: intersect_by instead of sort_by.

Your implementation looks good, but might be more efficient with:

       mapped = self.class.new(map{|e| property.call(e)})

which would make mapped a set, for faster lookups.

Also, using yield will probably be more efficient than instantiating a proc and calling it:

   def &(enum)
     n = self.class.new
     if block_given?
       mapped = self.class.new(map{|e| yield e})
       enum.each { |o| n.add(o) if mapped.include?(yield o) }
     else
       enum.each { |o| n.add(o) if include?(o) }
     end
     n
   end

It would be more elegant to use select than to iterate through enum using #each and #add to the set, but Set#select returns an Array. At the cost of using two iterations instead of once, you could do this:

require 'set'

class Set
   def intersect_by(other)
     props = self.class.new(other.map {|t| yield t})
     self.class.new(select {|t| props.include? t.prop})
   end

# Actually the following is closer to the origial:
# def intersect_by(other)
# props = self.class.new(map {|t| yield t})
# self.class.new(other.select {|t| props.include? t.prop})
# end

   alias old_intersect &

   def &(other)
     if block_given?
       intersect_by(other) {|t| yield t}
     else
       old_intersect(other) {|t| yield t}
     end
   end
end

class Test
   attr_accessor :prop
   def initialize prop
     @prop = prop
   end
end

s1 = Set.new([
   Test.new(1),
   Test.new(2),
   Test.new(3)
])

s2 = Set.new([
   Test.new(2),
   Test.new(3),
   Test.new(4)
])

p(s1.intersect_by(s2) {|t| t.prop})
p(Set[1,2,3]&Set[2,3,4])

__END__

Output:

#<Set: {#<Test:0xb79f73e4 @prop=2>, #<Test:0xb79f73d0 @prop=3>}>
#<Set: {2, 3}>

···

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Basically you want to do the intersection based on a specific criterion. My first choice in such a scenario would be a Hash based solution, i.e. you create an index based on what you consider the key field. See attachment for a sample implementation.

Regarding your question whether to include your code in std lib or facets lib: I vote "no". Here's why: a set intersection will leave you with a single uniform set of items, namely all items present in all sets that take part in the intersection. Your solution is not directly an intersection but it creates a copy of one set with all elements from the copy removed that do not fit a certain criterion. More specifically, you kind of reversed the logic because the resulting set has only elements from the /parameter/ (i.e. the right hand side of the operator) and not from the left hand side. So while for intersection order of arguments does not matter, it does for substraction:

irb(main):001:0> require 'set'
=> true
irb(main):002:0> s1 = (1..10).to_set
=> #<Set: {5, 6, 1, 7, 2, 8, 3, 9, 4, 10}>
irb(main):003:0> s2 = (5..15).to_set
=> #<Set: {5, 11, 6, 12, 7, 13, 8, 14, 9, 15, 10}>
irb(main):004:0> s1 - s2
=> #<Set: {1, 2, 3, 4}>
irb(main):005:0> s2 - s1
=> #<Set: {11, 12, 13, 14, 15}>

I would rewrite your original code like this:

require 'set'
...
# use a set for efficiency reasons
keys = event_errors.map {|x| file_name x}.to_set
# or, more efficient:
keys = event_errors.inject(Set.new) {|s,x| s << file_name(x)}
# create the result
result = socket_channel_errors.select {|x| keys.include? file_name(x)}

Since this is basically a two liner I don't see any necessity to change std libs.

Kind regards

  robert

···

On 31.05.2007 19:21, Joel VanderWerf wrote:

Eric Torreborre wrote:

Hi,

I have been trying to intersect 2 sets of objects that where I want the "intersected" objects to share a common property.

This is what I came up with:

Here, I want to get all the Errors having the same file name:

def intersection_set
  Set.new(event_errors).&(Set.new(socket_channel_errors)) {|x| file_name(x)}
end

To allow that I extended the Set class with:

def &(enum, &property)
    n = self.class.new
    if block_given?
      mapped = self.map{|e| property.call(e)}
      enum.each { |o| n.add(o) if mapped.include?(property.call(o)) }
    else
      enum.each { |o| n.add(o) if include?(o) }
    end
    n
end
I have met this requirement quite some times while scripting and I may miss something here.
Do you know a more idiomatic, Ruby-way of doing that?