Removing Duplicate Objects from Object List

Greetings all.

Does anyone have a good idea of how to write a loop that checks whether two
objects are equal? By "equal" here I mean the 'eql?' method, which tests
whether the objects have the same value.

I have a set of Rule objects that will be stored in a RuleList object. I know
how to cycle through the RuleList. I'm just doing this:

$ruleList.selection.each { |rule|
  ...
}

The problem is that I need to go through each rule and check if it is equal
to *any* of the other rules that are in the list. If a duplicate is found,
one of the duplicate rules should be removed.

Every solution I've tried has ended up either removing objects incorrectly
or not finding the duplicates in the first place.
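
One possible cause of both symptoms is deleting from the list while iterating over it. A minimal sketch of the loop described above, building a new list instead of deleting in place. The `Rule` class here is only a stand-in with the three attributes shown in the dump below, and `same_rule?` and `remove_duplicates` are hypothetical helpers, not part of the original code:

```ruby
# Stand-in for the real Rule class (assumed: three readable attributes).
class Rule
  attr_reader :point, :filter, :value

  def initialize(point, filter, value)
    @point, @filter, @value = point, filter, value
  end
end

# Hypothetical helper: two rules count as duplicates when all attributes match.
def same_rule?(a, b)
  a.point == b.point && a.filter == b.filter && a.value == b.value
end

# Keep the first occurrence of each rule. Building a new array avoids
# deleting from a list while iterating over it.
def remove_duplicates(rules)
  kept = []
  rules.each do |rule|
    kept << rule unless kept.any? { |seen| same_rule?(seen, rule) }
  end
  kept
end

rules = [
  Rule.new("203G_OrdUpdateFirst", "first after 202G_OrdAdd", "203G_OrdUpdateFirst"),
  Rule.new("203G_OrdUpdateLast", "last", "203G_OrdUpdateLast"),
  Rule.new("203G_OrdUpdateFirst", "first after 202G_OrdAdd", "203G_OrdUpdateFirst"),
  Rule.new("203G_OrdUpdateLast", "last", "203G_OrdUpdateLast")
]

unique_rules = remove_duplicates(rules)
puts unique_rules.size # 2
```

This compares each rule against every rule kept so far, so it is quadratic in the number of rules; for short lists that is unlikely to matter.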

Here's an example of some Rule objects:

<Tendent::Rule:0x2d5f7a8 @filter="first after 202G_OrdAdd",
@value="203G_OrdUpdateFirst", @point="203G_OrdUpdateFirst">
<Tendent::Rule:0x2d5f71c @filter="last", @value="203G_OrdUpdateLast",
@point="203G_OrdUpdateLast">
<Tendent::Rule:0x2d5f6a4 @filter="first after 202G_OrdAdd",
@value="203G_OrdUpdateFirst", @point="203G_OrdUpdateFirst">
<Tendent::Rule:0x2d5f62c @filter="last", @value="203G_OrdUpdateLast",
@point="203G_OrdUpdateLast">

Here is what they look like as strings:

203G_OrdUpdateFirst, first after 202G_OrdAdd, 203G_OrdUpdateFirst
203G_OrdUpdateLast, last, 203G_OrdUpdateLast
203G_OrdUpdateFirst, first after 202G_OrdAdd, 203G_OrdUpdateFirst
203G_OrdUpdateLast, last, 203G_OrdUpdateLast

So as you can see, I have four rules, but actually only two are unique.
(That just happens to be the case here. In other cases, perhaps there will
be six rules, and two will be unique.)

Can anyone see an efficient way to do this?

Is it better to just convert these into an array? I know the Array class has
the 'uniq' method. The problem is that I would still need the rules to then
be objects as well. In other words, even if I put all the objects in an
array and modify the array, I would need to reflect the changes in the
object list itself, such that the duplicate objects no longer exist.

- Jeff

Jeff,

How are you storing the Rules in your RuleSet at the moment? Personally
I'd use an Array (or simply subclass Array) and then you get to use
Array.uniq without shifting objects back and forth.

Stephen


Thank you to all of you!

With everything said here, I definitely have this working now. Not only
that, but I learned a lot more about hash and Set.

(Just when you think you have a grasp of Ruby, you find you were only at the
tip of the iceberg ...)

- Jeff

"gaspode" <stephendl@gmail.com> wrote in message
news:1160393869.844030.8080@m7g2000cwm.googlegroups.com...

> How are you storing the Rules in your RuleSet at the moment? Personally
> I'd use an Array (or simply subclass Array) and then you get to use
> Array.uniq without shifting objects back and forth.

Essentially, I have a RuleList class like this:

<code>
class RuleList
  def initialize
    @rules = Array.new
  end

  def append(this_rule)
    @rules.push(this_rule)
  end

  def selection
    @rules.find_all { |rule| rule }
  end
end
</code>

Then I have a Rule class like this:

<code>
class Rule
  attr_accessor :point, :filter, :value

  def initialize(point, filter, value)
    @point = point
    @filter = filter
    @value = value
  end

  def to_s
    "#@point, #@filter, #@value"
  end
end
</code>

When a rule object needs to be added to the list, I do this:

$ruleList.append(Rule.new(step.point2, rule, value))

Does that give enough detail?

In playing around a bit more, I tried this:

rules_array = $ruleList.selection.collect { |rule| rule }

Then I tried:

rules_array.uniq!

The problem is that this finds nothing as a duplicate. But that makes sense
(I think) because the object ID is probably being considered as part of the
test and those will, of course, not be duplicates.
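
That guess is close. A small sketch of the default behaviour, using a hypothetical stand-in class named `Plain`: unless a class overrides them, `eql?` and `hash` are inherited from Object and are based on object identity, so two instances with the same attribute values are still treated as distinct by `uniq`.

```ruby
# Hypothetical stand-in: no eql? or hash defined, so Object's versions apply.
class Plain
  def initialize(value)
    @value = value
  end
end

a = Plain.new("last")
b = Plain.new("last")

puts a.eql?(b)         # false: different objects, even with equal contents
puts a.eql?(a)         # true: the very same object
puts [a, b].uniq.size  # 2: nothing is removed
puts [a, a].uniq.size  # 1: the same object listed twice is a duplicate
```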

It sounds like you're saying it would be better to not use a Rule class in
the first place. Is that accurate?

- Jeff

If instead of declaring your @rules as an array you declare it as a set, you
will get no duplicates for free (I think).

However, you need to incorporate the <=> operator in your Rule class to
tell Ruby how your objects relate to each other,
i.e. whether they are <, >, or ==.


You should implement #eql? and #hash methods on your class, and store all instances in a [Set](http://ruby-doc.org/stdlib/libdoc/set/rdoc/classes/Set.html).

require "set"

class Rule
  attr_accessor :point, :filter, :value

  def initialize(point, filter, value)
    @point = point
    @filter = filter
    @value = value
  end

  # Two rules are eql? when all three attributes match.
  def eql?(rule)
    rule.point.eql?(@point) &&
      rule.filter.eql?(@filter) &&
      rule.value.eql?(@value)
  end

  # Rules that are eql? must return the same hash.
  def hash
    [@point, @filter, @value].hash
  end
end

rules_set = Set.new
rules_set << Rule.new(1, 1, 1)
rules_set << Rule.new(1, 1, 1) # duplicate rule
rules_set << Rule.new(1, 1, 2)
rules_set.size # => 2
rules_set # => #<Set: {#<Rule:0x89d78 @value=2, @filter=1, @point=1>, #<Rule:0x89db4 @value=1, @filter=1, @point=1>}>

-- Daniel


> "gaspode" <stephe...@gmail.com> wrote in message news:1160393869.844030.8080@m7g2000cwm.googlegroups.com...
>
> > How are you storing the Rules in your RuleSet at the moment? Personally
> > I'd use an Array (or simply subclass Array) and then you get to use
> > Array.uniq without shifting objects back and forth.
>
> Essentially, I have a RuleList class like this:
>
> <snip>
>
> When a rule object needs to be added to the list, I do this:
>
> $ruleList.append(Rule.new(step.point2, rule, value))
>
> Does that give enough detail?

Plenty.

> In playing around a bit more, I tried this:
>
> rules_array = $ruleList.selection.collect { |rule| rule }
>
> Then I tried:
>
> rules_array.uniq!
>
> The problem is that this finds nothing as a duplicate. But that makes sense
> (I think) because the object ID is probably being considered as part of the
> test and those will, of course, not be duplicates.

The reason that it isn't working as you expect is that the uniq method
relies on the hash method together with eql? (I think, somebody
correct me if I'm full of it). If you implement hash (to return the
same value for identical Rules) and eql? in your Rule class, this
should work fine.

> It sounds like you're saying it would be better to not use a Rule class in
> the first place. Is that accurate?

No, your current Rule class is good. Just implement hash and eql?!

> - Jeff

Rather than doing:

rules_array = $ruleList.selection.collect { |rule| rule }
rules_array.uniq!

you could add a uniq and uniq! method to your RuleList that just
delegates the work to the underlying Array

<code>
def uniq
    @rules.uniq
end

def uniq!
    @rules.uniq!
end
</code>

If it is the case that you NEVER want the same Rule in there twice,
just do the check in the append method (also after implementing the
hash method; include? compares with ==, so Rule needs that too)

<code>
def append(this_rule)
    @rules.push(this_rule) unless @rules.include?(this_rule)
end
</code>

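Putting these suggestions together, a sketch of how the two classes might look. One detail worth noting: Array#include? compares elements with ==, while Array#uniq relies on hash and eql?, so the Rule below defines all three. The `state` helper is hypothetical and not part of the original classes, and `selection` returning a copy is an assumption about what callers need.

```ruby
class Rule
  attr_reader :point, :filter, :value

  def initialize(point, filter, value)
    @point, @filter, @value = point, filter, value
  end

  # Used by Array#include?.
  def ==(other)
    other.is_a?(Rule) && state == other.state
  end

  # Used, together with hash, by Array#uniq.
  alias_method :eql?, :==

  def hash
    state.hash
  end

  protected

  # Hypothetical helper: the attributes that define a rule's value.
  def state
    [point, filter, value]
  end
end

class RuleList
  def initialize
    @rules = []
  end

  # Skip rules that are already present.
  def append(rule)
    @rules.push(rule) unless @rules.include?(rule)
  end

  # Delegates to the underlying Array; returns nil when nothing was removed.
  def uniq!
    @rules.uniq!
  end

  # Returns a copy so callers cannot modify the internal array.
  def selection
    @rules.dup
  end
end

list = RuleList.new
list.append(Rule.new("203G_OrdUpdateLast", "last", "203G_OrdUpdateLast"))
list.append(Rule.new("203G_OrdUpdateLast", "last", "203G_OrdUpdateLast"))
puts list.selection.size # 1
```

With the guard in `append`, `uniq!` is only needed for lists that were filled some other way.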

uniq is failing because, even though the attributes of each instance of
the rule are 'eq' to the other's, the compared instances are different
objects.

class Foo
  def initialize(a, b)
    @a = a
    @b = b
  end
end

x = [Foo.new(:a, :b), Foo.new(:c, :d), Foo.new(:a, :b)]
p x
p x.uniq

ruby tst.rb
[#<Foo:0x3b6128 @a=:a, @b=:b>, #<Foo:0x3b6114 @a=:c, @b=:d>,
#<Foo:0x3b6100 @a=:a, @b=:b>]
[#<Foo:0x3b6128 @a=:a, @b=:b>, #<Foo:0x3b6114 @a=:c, @b=:d>,
#<Foo:0x3b6100 @a=:a, @b=:b>]

I tried defining eq? and hash and uniq still fails. hash returns
identical values for objects with identical content and eq? returns
true in this case, but uniq does not remove them.

Probably the right thing to do is to write a couple of loops.


Daniel N wrote:

> If instead of declaring your @rules as an array, you declare it as a set
> you will get no duplicates for free (I think)

Yes, but you will lose order (if that is important).

> However, you need to incorporate the <=> operator in your Rule class to
> tell ruby how your objects relate to each other.
> ie are they <, >, or =

That won't do the trick for uniq, as uniq uses a hash internally to
find duplicates. You have to define #hash and #eql? for this to work.
(or was it #hash and #== ?)
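
One way to settle the question is a small experiment. The sketch below uses a hypothetical `Probe` class that records which of its comparison methods get called. With the standard (MRI) Array#uniq, the expectation is that hash and eql? are consulted and == is not:

```ruby
# Hypothetical class that logs calls to its comparison methods.
class Probe
  CALLS = []

  attr_reader :key

  def initialize(key)
    @key = key
  end

  def hash
    CALLS << :hash
    key.hash
  end

  def eql?(other)
    CALLS << :eql?
    key.eql?(other.key)
  end

  def ==(other)
    CALLS << :==
    key == other.key
  end
end

result = [Probe.new(:a), Probe.new(:a), Probe.new(:b)].uniq
puts result.size      # 2
p Probe::CALLS.uniq   # the methods uniq actually relied on
```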

cheers

Simon

harp:~ > cat a.rb
class Foo
   ATTRIBUTES = %w( a b )
   ATTRIBUTES.each{|at| attr at}

   def initialize(a,b) @a, @b = a, b end
   def parts() ATTRIBUTES.map{|at| send at} end
   def eql?(other) parts == other.parts end
   def hash() parts.hash end
end

p [ Foo.new(:a, :b), Foo.new(:c, :d), Foo.new(:a, :b) ]
p [ Foo.new(:a, :b), Foo.new(:c, :d), Foo.new(:a, :b) ].uniq

harp:~ > ruby a.rb
[#<Foo:0xb75d137c @a=:a, @b=:b>, #<Foo:0xb75d1368 @a=:c, @b=:d>, #<Foo:0xb75d1340 @a=:a, @b=:b>]
[#<Foo:0xb75d1214 @a=:a, @b=:b>, #<Foo:0xb75d1200 @a=:c, @b=:d>]
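
If the class is little more than a bundle of attributes, Struct may be an alternative worth considering, since Struct instances already compare by value: ==, eql? and hash are all derived from the members. A sketch, assuming Rule needs nothing beyond its three fields and a to_s:

```ruby
# Assumption: Rule carries only these three fields plus a string form.
Rule = Struct.new(:point, :filter, :value) do
  def to_s
    "#{point}, #{filter}, #{value}"
  end
end

rules = [
  Rule.new("203G_OrdUpdateFirst", "first after 202G_OrdAdd", "203G_OrdUpdateFirst"),
  Rule.new("203G_OrdUpdateLast", "last", "203G_OrdUpdateLast"),
  Rule.new("203G_OrdUpdateFirst", "first after 202G_OrdAdd", "203G_OrdUpdateFirst")
]

puts rules.uniq.size        # 2
puts rules.uniq.first.to_s  # 203G_OrdUpdateFirst, first after 202G_OrdAdd, 203G_OrdUpdateFirst
```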

-a


--
my religion is very simple. my religion is kindness. -- the dalai lama