Sets, uniqueness not unique

I have been splitting a comma separated values file, and putting
some of the values into a Student class, simply a collection of strings,
so that I can build a database table from them:

require 'set'

class Student
   attr_accessor :forename, :surname, :birth_dt,
     :picture, :coll_status
   def initialize(forename0, surname0, birth_dt0,
                  picture0, coll_status0)
     @forename = forename0
     @surname = surname0
     @birth_dt = birth_dt0
     @picture = picture0
     puts "in student.new() picture is #{picture0.inspect}, @picture is #{@picture.inspect} " if $debug
     @coll_status = coll_status0
   end

   def eql?(other)
     # if self.forename == "John" and other.forename == "John"
       debug = true
     # end
     res = [:forename, :surname, :birth_dt, :picture, :coll_status].all? do |msg|
       print "#{self.send(msg)} == #{(other.send(msg))} gives #{self.send(msg) == (other.send(msg))}" if debug
       self.send(msg) == (other.send(msg))
     end
     return res
   end

   def to_s
     "#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}"
   end
end

And in the body of my program I read the records in from the csv and
add the students if they are new. They tend to be clustered in the
input, hence the last_student test.

class TableMaker
   INPUT = "hugh.csv"

   ACCEPTED_MODULES = /^\"TECH(100[1-7]|200\d|201[01]|300\d|301[0-2])/

   # Read in the database and populate the tables.
   def initialize(input=INPUT)

     @students = Set.new()
     # [...]
     open(input, 'r') do |infp|
       while record = infp.gets
         record.chomp!
         puts "record is #{record}"
         forename, surname, birth_dt, institution_id, aos_code,
           various, other, fields,
           picture, coll_status, full_desc = record.split(/\s*\|\s*/)

         next unless aos_code =~ ACCEPTED_MODULES

         puts "from record, picture is [#{picture.inspect}]." if $debug
         # Structures for student
         student = Student.new(forename, surname, birth_dt, picture, coll_status)
         if student == last_student
           student = last_student
         else
           student.freeze

           # Avoid duplicates
           unless @students.include? student
             @students.add student
           end
           last_student = student
         end
         # [...]
       end
     end
   end

   # [...]

end

This being a Set I don't really need the call to include? now, but
it's there (from when I was using a hash for this).

I find two things that seem odd to me:

1. eql? is never getting called, despite include?.

2. I end up with duplicate students.

Sets *can't* hold duplicates, and include depends on eql? for Sets.
So what's going on? I have checked, and the duplicate students seem to
have identical strings, so I wrote the eql? to be sure.

I bet this will be a self.kick(self) reason, but I can't see it yet.

         Thank you,
         Hugh

Hi --

I have been splitting a comma separated values file, and putting
some of the values into a Student class, simply a collection of strings,
so that I can build a database table from them:

[...]

         picture, coll_status, full_desc = record.split(/\s*\|\s*/)

I notice you mentioned comma separation but you're splitting on a
pipe. I don't know if this is related to the problem, but I thought
I'd flag it just in case.

Can you provide a couple of sample lines of data?

David


On Wed, 14 Sep 2005, Hugh Sasse wrote:

--
David A. Black
dblack@wobblini.net

require 'set'

class Student
attr_accessor :forename, :surname, :birth_dt,
   :picture, :coll_status
def initialize(forename0, surname0, birth_dt0,
                picture0, coll_status0)
   @forename = forename0
   @surname = surname0
   @birth_dt = birth_dt0
   @picture = picture0
   puts "in student.new() picture is #{picture0.inspect}, @picture is #{@picture.inspect} " if $debug
   @coll_status = coll_status0
end

def eql?(other)
   # if self.forename == "John" and other.forename == "John"
     debug = true
   # end
   res = [:forename, :surname, :birth_dt, :picture, :coll_status].all? do |msg|
     print "#{self.send(msg)} == #{(other.send(msg))} gives #{self.send(msg) == (other.send(msg))}" if debug
     self.send(msg) == (other.send(msg))
   end
   return res
end

def to_s
   "#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}"
end
end

well this works:

   s0 = Student::new 'a', 'b', 'c', 'd', 'e'
   s1 = Student::new 'a', 'b', 'c', 'd', 'e'
   p(s0.eql?(s1)) #=> true

but this doesn't

   p s0 == s1 #=> false

And in the body of my program I read the records in from the csv and
add the students if they are new. They tend to be clustered in the
input, hence the last_student test.

class TableMaker
INPUT = "hugh.csv"

ACCEPTED_MODULES = /^\"TECH(100[1-7]|200\d|201[01]|300\d|301[0-2])/

# Read in the database and populate the tables.
def initialize(input=INPUT)

   @students = Set.new()
   # [...]
   open(input, 'r') do |infp|
     while record = infp.gets
       record.chomp!

try : record.strip!

       puts "record is #{record}"
       forename, surname, birth_dt, institution_id, aos_code,
         various, other, fields,
         picture, coll_status, full_desc = record.split(/\s*\|\s*/)

or
       fields = record.split(%r/\|/).map{|field| field.strip}
       forename, surname, birth_dt, institution_id, aos_code,
       various, other, fields,
       picture, coll_status, full_desc = fields

if you don't do one of these two things then either

   - forename may have leading space
   - full_desc may have trailing space

that's because chomp! only blows away trailing newline - not extraneous
spaces and leading space on record is never dealt with.
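To see Ara's point about chomp! versus strip! on a single line, here is a sketch that uses a made-up pipe-separated record, since the real data can't be shared. It compares chomp, strip, and a per-field strip on the same input. A student whose name differs only by a stray leading or trailing space will not compare equal to one without it.

```ruby
# A made-up record: note the spaces before the first field and after the last.
record = "  John | Smith | 1980-01-01 | pic.jpg | active  \n"

# chomp only removes the trailing newline, so the outer spaces survive.
chomped = record.chomp.split(/\s*\|\s*/)

# strip removes leading and trailing whitespace (newline included) first.
stripped = record.strip.split(/\s*\|\s*/)

# Splitting on the bare pipe and stripping every field gives the same result.
mapped = record.split('|').map(&:strip)

p chomped.first       # => "  John"
p chomped.last        # => "active  "
p stripped            # => ["John", "Smith", "1980-01-01", "pic.jpg", "active"]
p mapped == stripped  # => true
```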

       next unless aos_code =~ ACCEPTED_MODULES

       puts "from record, picture is [#{picture.inspect}]." if $debug
       # Structures for student
       student = Student.new(forename, surname, birth_dt, picture, coll_status)
       if student == last_student

so, as shown above, this (==) does not work

         student = last_student
       else
         student.freeze

         # Avoid duplicates
         unless @students.include? student
           @students.add student
         end
         last_student = student
       end
       # [...]
     end
   end
end

# [...]

end

This being a Set I don't really need the call to include? now, but
it's there (from when I was using a hash for this).

I find two things that seem odd to me:

1. eql? is never getting called, despite include?.

set uses Object#hash - so maybe something like (untested)

   class Student
     def hash
       %w( forename surname birth_dt picture coll_status).inject(0){|n,m| n += send(m).hash}
     end
   end

i dunno if this will wrap and cause issues though...

if so maybe something like

   class Student
     def hash
       %w( forename surname birth_dt picture coll_status).map{|m| send(m)}.join.hash
     end
   end
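Both hash definitions above are legitimate. A hash only has to give equal objects equal values; it does not have to give different objects different values. The sketch below uses plain arrays with made-up values to show the collisions each definition allows: joining loses the field boundaries, and summing loses the field order. A collision only costs an extra comparison, provided eql? still compares the fields themselves.

```ruby
# Two field lists that differ only in where the boundary falls.
fields_a = ["ab", "c"]
fields_b = ["a", "bc"]

# Joining before hashing erases the boundary, so these collide.
join_hash_a = fields_a.join.hash
join_hash_b = fields_b.join.hash
puts join_hash_a == join_hash_b   # => true

# Summing element hashes ignores order, so swapped fields collide too.
sum_hash = ->(fields) { fields.inject(0) { |n, f| n + f.hash } }
puts sum_hash.(["John", "Smith"]) == sum_hash.(["Smith", "John"])  # => true

# A collision is allowed: Hash then falls back to eql?, which must
# still compare the fields and tell the two lists apart.
puts fields_a.eql?(fields_b)      # => false
```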

or, perhaps something simpler like:

   class Student < ::Hash
     FIELDS = %w( forename surname birth_dt picture coll_status )
     def initialize(*fs)
       FIELDS.each do |f|
         self[f] = (fs.shift || raise(ArgumentError, "no #{ f }!"))
       end
     end
     def eql? other
       values == other.values
     end
     alias == eql?
     def keys
       FIELDS
     end
     def values
       values_at(*FIELDS)
     end
     def hash
       FIELDS.map{|m| self[m]}.join.hash
     end
   end

   s0 = Student::new 'a', 'b', 'c', 'd', 'e'
   s1 = Student::new 'a', 'b', 'c', 'd', 'e'

   require 'set'
   set = Set::new
   set.add s0
   set.add s1
   p set #=> #<Set: {{"forename"=>"a", "coll_status"=>"e", "birth_dt"=>"c", "picture"=>"d", "surname"=>"b"}}>

the FIELDS const can be used to do ordered prints, etc.

it sure seems odd that set doesn't use 'eql?' or '==' up front though doesn't
it?

-a


On Wed, 14 Sep 2005, Hugh Sasse wrote:
--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
Your life dwells among the causes of death
Like a lamp standing in a strong breeze. --Nagarjuna

===============================================================================

Hi --

This being a Set I don't really need the call to include? now, but
it's there (from when I was using a hash for this).

I find two things that seem odd to me:

1. eql? is never getting called, despite include?.

2. I end up with duplicate students.

Sets *can't* hold duplicates, and include depends on eql? for Sets.

Are you sure about that latter point? In set.rb:

   def include?(o)
     @hash.include?(o)
   end

and in hash.c:

     if (st_lookup(RHASH(hash)->tbl, key, 0)) {
         return Qtrue;
     ... }

I haven't followed the trail beyond that... but I think any two
student objects will count as different hash keys, even if they have
similar string data.
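David's reading can be checked with a cut-down class. This is a sketch for a current Ruby. EqlOnlyStudent is a made-up two-field stand-in for Student that defines eql? but keeps the default, identity-based Object#hash.

```ruby
require 'set'

# Hypothetical stand-in for Student: eql? compares the fields, but hash
# is still the default Object#hash, which follows object identity.
class EqlOnlyStudent
  attr_reader :forename, :surname

  def initialize(forename, surname)
    @forename = forename
    @surname = surname
  end

  def eql?(other)
    other.is_a?(EqlOnlyStudent) &&
      forename == other.forename && surname == other.surname
  end
end

a = EqlOnlyStudent.new("John", "Smith")
b = EqlOnlyStudent.new("John", "Smith")

puts a.eql?(b)          # => true
puts a.hash == b.hash   # false in practice: the hashes come from identity

# Ten students with identical data. Set looks at hash values first, so
# they are not merged into a single member.
students = Set.new(Array.new(10) { EqlOnlyStudent.new("John", "Smith") })
puts students.size      # more than 1
```

The exact size is not guaranteed. With identity-based hashes, however, Set normally never gets as far as asking eql?.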

David


On Wed, 14 Sep 2005, Hugh Sasse wrote:


Hi --

I have been splitting a comma separated values file, and putting
some of the values into a Student class, simply a collection of strings,
so that I can build a database table from them:

[...]

         picture, coll_status, full_desc = record.split(/\s*\|\s*/)

I notice you mentioned comma separation but you're splitting on a
pipe. I don't know if this is related to the problem, but I thought
I'd flag it just in case.

Yes, sorry, I was using the generic term for this, to facilitate
explaining the concept of what I was doing. The data I
get is pipe(|) separated.

Can you provide a couple of sample lines of data?

Not really, this is data about real people, and data protection law
means I can't. But I can tell you that the splitting works fine, the
selection of fields for the student works correctly, the students
don't end up with data from other fields, and the nature of the
split command means that we can be sure they are all Strings.

Therefore I think it boils down to:

How can two collections of strings appear to be the same and yet
both of them end up in the Set structure? Whitespace is always
white, be it tab or space, so that's one way, but I still think that
should look different to == or to eql?

David

         Thank you,
         Hugh


On Wed, 14 Sep 2005, David A. Black wrote:

On Wed, 14 Sep 2005, Hugh Sasse wrote:

require 'set'

class Student
attr_accessor :forename, :surname, :birth_dt,
   :picture, :coll_status
def initialize(forename0, surname0, birth_dt0,

         [...]

end

def eql?(other)

         [...]

end

def to_s
   "#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}"
end
end

well this works:

s0 = Student::new 'a', 'b', 'c', 'd', 'e'
s1 = Student::new 'a', 'b', 'c', 'd', 'e'
p(s0.eql?(s1)) #=> true

but this doesn't

p s0 == s1 #=> false

Hmmm. Yes, I should have more unit tests!

And in the body of my program I read the records in from the csv and

(well. pipe separated -- see other reply :-))

add the students if they are new. They tend to be clustered in the
input, hence the last_student test.

class TableMaker

         [...]

def initialize(input=INPUT)

         [...]

   open(input, 'r') do |infp|
     while record = infp.gets
       record.chomp!

try : record.strip!

       puts "record is #{record}"
       forename, surname, birth_dt, institution_id, aos_code,
         various, other, fields,
         picture, coll_status, full_desc = record.split(/\s*\|\s*/)

or
     fields = record.split(%r/\|/).map{|field| field.strip}
     forename, surname, birth_dt, institution_id, aos_code,
     various, other, fields,
     picture, coll_status, full_desc = fields

I think the former may be faster, but I'll look into these, thanks.

if you don't do one of these two things then either

- forename may have leading space
- full_desc may have trailing space

Yes, I'd missed that.

that's because chomp! only blows away trailing newline - not extraneous
spaces and leading space on record is never dealt with.

       next unless aos_code =~ ACCEPTED_MODULES

       puts "from record, picture is [#{picture.inspect}]." if $debug
       # Structures for student
       student = Student.new(forename, surname, birth_dt, picture, coll_status)
       if student == last_student

so, as shown above, this (==) does not work

OK, I'll just lose optimisation, but thanks.

         student = last_student
       else
         student.freeze

         # Avoid duplicates
         unless @students.include? student
           @students.add student
         end
         last_student = student

         [...]

This being a Set I don't really need the call to include? now, but
it's there (from when I was using a hash for this).

I find two things that seem odd to me:

1. eql? is never getting called, despite include?.

set uses Object#hash - so maybe something like (untested)

class Student
   def hash
     %w( forename surname birth_dt picture coll_status).inject(0){|n,m| n += send(m).hash}
   end
end

i dunno if this will wrap and cause issues though...

Nor me.

if so maybe something like

class Student
   def hash
     %w( forename surname birth_dt picture coll_status).map{|m| send(m)}.join.hash
   end
end

Yes, that seems safer

or, perhaps something simpler like:

class Student < ::Hash
   FIELDS = %w( forename surname birth_dt picture coll_status )

         [...]

end

s0 = Student::new 'a', 'b', 'c', 'd', 'e'
s1 = Student::new 'a', 'b', 'c', 'd', 'e'

require 'set'
set = Set::new
set.add s0
set.add s1
p set #=> #<Set: {{"forename"=>"a", "coll_status"=>"e", "birth_dt"=>"c", "picture"=>"d", "surname"=>"b"}}>

the FIELDS const can be used to do ordered prints, etc.

Yes, I might factor that in to my current solution. I didn't want
to allow just any keys, so that's why I didn't subclass Hash, but
it's an interesting approach.

it sure seems odd that set doesn't use 'eql?' or '==' up front though doesn't
it?

Probably a reason I don't know about. The Pickaxe II says it uses
eql? and hash (p731) but doesn't say where.

-a
--

         Thank you for such a full response,
         Hugh.


On Wed, 14 Sep 2005, Ara.T.Howard wrote:

On Wed, 14 Sep 2005, Hugh Sasse wrote:

Hi --

Sets *can't* hold duplicates, and include depends on eql? for Sets.

Are you sure about that latter point? In set.rb:

Yes I was, but it turns out that it was with the certainty that comes
before falling flat on one's face. I remembered seeing it in the ri
docs, and sure enough, it isn't there!

[What was that fidonet .sig? "Open mouth, insert foot, echo
internationally"? :-)]

def include?(o)
   @hash.include?(o)
end

and in hash.c:

   if (st_lookup(RHASH(hash)->tbl, key, 0)) {
       return Qtrue;
   ... }

I haven't followed the trail beyond that... but I think any two
student objects will count as different hash keys, even if they have
similar string data.

Which would explain a lot. Thank you. Ara's hash function should
fix this for me.

David

         Thank you,
         Hugh


On Wed, 14 Sep 2005, David A. Black wrote:

On Wed, 14 Sep 2005, Hugh Sasse wrote:

Right, there is some definite weirdness going on here. I removed
the definition of eql? and set the hash to use MD5 sums. I still
didn't get unique entries in my set. Now I have

require 'md5'

class Student
         # [...]
   FIELDS = [:forename, :surname, :birth_dt, :picture, :coll_status]
   def initialize(forename0, surname0, birth_dt0,
                  picture0, coll_status0)
         # [...]
     @hash = FIELDS.inject(MD5.new()) do |d,m|
       d << send(m)
     end.hexdigest.hex
   end

   def hash
     @hash
   end

   def eql?(other)
     self.hash == other.hash
   end

end

And this works. Remove the definition of eql? and include? always
gives untrue (I've not checked to see if it is nil or false).
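The opposite half-fix behaves exactly as Hugh describes. In this sketch, HashOnlyStudent is a made-up stand-in whose hash is derived from the fields, while eql? is inherited from Object and is true only for the very same object. Matching hash values get a lookup as far as eql?, and eql? then says no.

```ruby
require 'set'

# Hypothetical stand-in: a field-based hash, but the default eql?.
class HashOnlyStudent
  attr_reader :forename, :surname

  def initialize(forename, surname)
    @forename = forename
    @surname = surname
  end

  # Equal fields give equal hash values...
  def hash
    [forename, surname].hash
  end
  # ...but eql? is still Object's identity comparison.
end

a = HashOnlyStudent.new("John", "Smith")
b = HashOnlyStudent.new("John", "Smith")

puts a.hash == b.hash        # => true
students = Set.new([a])
puts students.include?(b)    # => false: the hashes match, eql? says no
students << b
puts students.size           # => 2
```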

This is in accordance with the entry in Pickaxe2 (page 570,
Object#hash) and ri, that:
------------------------------------------------------------ Object#hash
      obj.hash => fixnum

------------------------------------------------------------------------
      Generates a +Fixnum+ hash value for this object. This function must
      have the property that +a.eql?(b)+ implies +a.hash == b.hash+. The
      hash value is used by class +Hash+. Any hash value that exceeds the
      capacity of a +Fixnum+ will be truncated before being used.

(I'm not sure if my digests are too big)

What I don't really know is what the sufficient conditions are for
this. Is it *necessary* to change hash and eql? together? What are the
defaults for Set?

I suspect that my eql? ought to be

   def eql?(other)
     FIELDS.inject(true) do |b,m|
       b && (self.send(m) == other.send(m))
     end
   end

for that matter

         Hugh

>>This being a Set I don't really need the call to include? now, but
>>it's there (from when I was using a hash for this).
>>
>>I find two things that seem odd to me:
>>1. eql? is never getting called, despite include?.
>>2. I end up with duplicate students.
>>
>>Sets *can't* hold duplicates, and include depends on eql? for Sets.
>
>Are you sure about that latter point? In set.rb:
>
> def include?(o)
> @hash.include?(o)
> end
>
>and in hash.c:
>
> if (st_lookup(RHASH(hash)->tbl, key, 0)) {
> return Qtrue;
> ... }
>
>I haven't followed the trail beyond that... but I think any two
>student objects will count as different hash keys, even if they have
>similar string data.

object.c:

VALUE
rb_obj_id(VALUE obj)
{
    if (SPECIAL_CONST_P(obj)) {
        return LONG2NUM((long)obj);
    }
    return (VALUE)((long)obj|FIXNUM_FLAG);
}
[...]
rb_define_method(rb_mKernel, "hash", rb_obj_id, 0);

[...]

What I don't really know is what the sufficient conditions are for
this. Is it *necessary* to change hash and eql? together? What are the
defaults for Set?

The defaults are actually those of Hash. You can follow the call chain
starting from

static struct st_hash_type objhash = {
    rb_any_cmp,
    rb_any_hash,
};

in hash.c. For user-defined classes, it will end up using #hash and #eql?
defined in Kernel. [rb_any_cmp and rb_any_hash have some extra logic for
Symbol, Fixnum and String values, and some core classes redefine the
associated methods].

Given the above definition of Kernel#hash, if you redefine it, you'll
most probably want to change #eql? too (see below). As far as Hash
objects (and hence Sets) are concerned, modifying #eql? while keeping
#hash unchanged would have no effect (unless you restrict it further so
that obj.eql?(obj) is false, which doesn't seem quite right).
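Mauricio's description, in which Hash compares hash values first and only then calls eql?, also explains Hugh's first observation that eql? never seemed to run. The sketch below counts eql? calls on a made-up Probe class under a current Ruby. The value_hash flag is invented purely for the comparison.

```ruby
require 'set'

# Hypothetical class that counts how often eql? is called on it.
class Probe
  class << self
    attr_accessor :eql_calls
  end
  self.eql_calls = 0

  attr_reader :name

  # value_hash: true  -> hash derived from name (equal names, equal hashes)
  # value_hash: false -> keep the default, identity-based Object#hash
  def initialize(name, value_hash)
    @name = name
    @value_hash = value_hash
  end

  def eql?(other)
    self.class.eql_calls += 1
    other.is_a?(Probe) && name == other.name
  end

  def hash
    @value_hash ? name.hash : super
  end
end

# Default hash: the probes' hashes differ, so eql? is normally skipped.
Probe.eql_calls = 0
found_default = Set.new([Probe.new("x", false)]).include?(Probe.new("x", false))
calls_default = Probe.eql_calls

# Value-based hash: the hashes match, so Hash asks eql? to decide.
Probe.eql_calls = 0
found_value = Set.new([Probe.new("x", true)]).include?(Probe.new("x", true))
calls_value = Probe.eql_calls

puts calls_default          # normally 0
puts found_value            # => true
puts calls_value.positive?  # => true
```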

static VALUE
rb_obj_equal(VALUE obj1, VALUE obj2)
{
    if (obj1 == obj2) return Qtrue;
    return Qfalse;
}

[...]
rb_define_method(rb_mKernel, "eql?", rb_obj_equal, 1);


On Wed, Sep 14, 2005 at 09:14:10PM +0900, Hugh Sasse wrote:

On Wed, 14 Sep 2005, David A. Black wrote:
>On Wed, 14 Sep 2005, Hugh Sasse wrote:

--
Mauricio Fernandez

Hugh Sasse <hgs@dmu.ac.uk> writes:

if so maybe something like

class Student
   def hash
     %w( forename surname birth_dt picture coll_status).map{|m| send(m)}.join.hash
   end
end

Yes, that seems safer

This seems to be the canonical way to define compound hashes:

class Student
  def hash
    [@forename, @surname, @birth_dt, @picture, @coll_status].hash
  end
end
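Combining Christian's Array#hash with a matching eql? gives a Student that Set collapses to a single entry. Aliasing == to eql? also makes the `student == last_student` shortcut work. This is a sketch for a current Ruby. It assumes that all five fields are plain Strings and that a Student is never equal to an object of another class.

```ruby
require 'set'

class Student
  FIELDS = [:forename, :surname, :birth_dt, :picture, :coll_status]
  attr_reader(*FIELDS)

  def initialize(forename, surname, birth_dt, picture, coll_status)
    @forename = forename
    @surname = surname
    @birth_dt = birth_dt
    @picture = picture
    @coll_status = coll_status
  end

  # The field values, always in the same order.
  def fields
    FIELDS.map { |f| send(f) }
  end

  # Array#hash is built from the elements' hashes, so students with
  # equal fields get equal hash values.
  def hash
    fields.hash
  end

  # Array#eql? compares element by element with eql?.
  def eql?(other)
    other.is_a?(Student) && fields.eql?(other.fields)
  end

  # Lets `student == last_student` work too (false when last_student is nil).
  alias == eql?
end

s0 = Student.new('a', 'b', 'c', 'd', 'e')
s1 = Student.new('a', 'b', 'c', 'd', 'e')

students = Set.new
students << s0 << s1
puts s0 == s1        # => true
puts students.size   # => 1
```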

it sure seems odd that set doesn't use 'eql?' or '==' up front though doesn't
it?

Probably a reason I don't know about. The Pickaxe II says it uses
eql? and hash (p731) but doesn't say where.

Set uses a Hash to store the objects.

That said, I think it would be nice to have something along these lines in
the stdlib:

class Student
  equal_compares :@forename, :@surname, :@birth_dt, :@picture, :@coll_status
end

Above call should result in appropriate definitions of ==, eql? and
hash. (Something like "ordered_by" would be pretty useful too.)
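Nothing like equal_compares exists in Ruby's standard library, but a class macro along those lines is short to write. The sketch below is hypothetical. The module name EqualCompares, the helper equality_key, and the exact-class check are all assumptions, not an existing API.

```ruby
require 'set'

# Hypothetical class macro: defines eql?, == and hash from a list of
# instance variables. Not part of Ruby's standard library.
module EqualCompares
  def equal_compares(*ivars)
    # The values that define an object's identity, in a fixed order.
    define_method(:equality_key) do
      ivars.map { |iv| instance_variable_get(iv) }
    end

    define_method(:eql?) do |other|
      other.instance_of?(self.class) && equality_key.eql?(other.equality_key)
    end
    alias_method :==, :eql?

    define_method(:hash) do
      equality_key.hash
    end
  end
end

class Student
  extend EqualCompares

  def initialize(forename, surname)
    @forename = forename
    @surname = surname
  end

  equal_compares :@forename, :@surname
end

a = Student.new("John", "Smith")
b = Student.new("John", "Smith")
puts a == b                  # => true
puts Set.new([a, b]).size    # => 1
```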


Hugh.

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

On Wed, 14 Sep 2005, Mauricio Fernández wrote:

The defaults are actually those of Hash. You can follow the call chain
starting from

static struct st_hash_type objhash = {
   rb_any_cmp,
   rb_any_hash,
};

in hash.c. For user-defined classes, it will end up using #hash and #eql?
defined in Kernel. [rb_any_cmp and rb_any_hash have some extra logic for
Symbol, Fixnum and String values, and some core classes redefine the
associated methods].

OK, it seems I'm thinking along the right lines now. Here is what I
did in the end:

--- /tmp/T0oTa4V2 Wed Sep 14 15:05:35 2005
+++ populate_tables.rb Wed Sep 14 15:01:24 2005
@@ -9,10 +9,31 @@

  $debug = true

+module StringCollection
+
+ def hash
+ (self.class)::FIELDS.inject(MD5.new()) do |d,m|
+ d << send(m)
+ end.hexdigest.hex
+ end
+
+ def eql?(other)
+ (self.class)::FIELDS.inject(true) do |b,v|
+ begin
+ b && (self.send(v) == other.send(v))
+ rescue
+ b = false
+ end
+ end
+ end
+
+end
+
  class Student
- attr_accessor :forename, :surname, :birth_dt,
- :picture, :coll_status
+ include StringCollection
+
    FIELDS = [:forename, :surname, :birth_dt, :picture, :coll_status]
+ FIELDS.each{|f| attr_accessor f }

    def initialize(forename0, surname0, birth_dt0,
                   picture0, coll_status0)
@@ -22,28 +43,22 @@
      @picture = picture0
      puts "in student.new() picture is #{picture0.inspect}, @picture is #{@picture.inspect} " if $debug
      @coll_status = coll_status0
- @hash = FIELDS.inject(MD5.new()) do |d,m|
- d << send(m)
- end.hexdigest.hex
    end

- def hash
- @hash
- end

- def eql?(other)
- self.hash == other.hash
- end
-
    def to_s
- "#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}, #{@hash}"
+ "#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}, #{hash}"
    end

  end

  class CourseModule
- attr_accessor :aos_code, :dept_code, :aos_type, :full_desc

+ include StringCollection
+
+ FIELDS = [:aos_code, :dept_code, :aos_type, :full_desc]
+ FIELDS.each{|f| attr_accessor f }
+
    def initialize( aos_code, dept_code, aos_type, full_desc)
      @aos_code = aos_code
      @dept_code = dept_code

I was particularly pleased to be able not to repeat the FIELDS, by
means of attr_accessor, and that the idea of doing
(self.class)::FIELDS
actually worked.
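The same mixin can be written for a current Ruby, where `require 'md5'` no longer exists (Digest::MD5 replaced it). This sketch substitutes Array#hash for the MD5 digest, as suggested elsewhere in the thread, and checks the class in eql? instead of rescuing. The initialize that assigns FIELDS in order is an invented convenience, and the sample values are made up.

```ruby
require 'set'

# Each including class lists its own FIELDS; the shared methods reach
# them through self.class::FIELDS.
module StringCollection
  def field_values
    self.class::FIELDS.map { |f| send(f) }
  end

  def hash
    field_values.hash
  end

  def eql?(other)
    other.instance_of?(self.class) && field_values.eql?(other.field_values)
  end
end

class Student
  include StringCollection
  FIELDS = [:forename, :surname, :birth_dt, :picture, :coll_status]
  FIELDS.each { |f| attr_accessor f }

  # Hypothetical helper: assign the values to FIELDS in order.
  def initialize(*values)
    FIELDS.zip(values) { |f, v| send("#{f}=", v) }
  end
end

class CourseModule
  include StringCollection
  FIELDS = [:aos_code, :dept_code, :aos_type, :full_desc]
  FIELDS.each { |f| attr_accessor f }

  def initialize(*values)
    FIELDS.zip(values) { |f, v| send("#{f}=", v) }
  end
end

students = Set.new
students << Student.new("a", "b", "c", "d", "e") << Student.new("a", "b", "c", "d", "e")

courses = Set.new
2.times { courses << CourseModule.new("TECH1001", "D1", "core", "Intro") }

puts students.size   # => 1
puts courses.size    # => 1
```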

In the hope that this helps someone else, and thank you,
         Hugh

This seems to be the canonical way to define compound hashes:

class Student
def hash
   [@forename, @surname, @birth_dt, @picture, @coll_status].hash
end
end

That does seem to preserve the properties I need for strings, and is
probably cheaper than MD5sums.

         [...]

Set uses a Hash to store the objects.

That said, I think it would be nice to have something along these lines in
the stdlib:

class Student
equal_compares :@forename, :@surname, :@birth_dt, :@picture, :@coll_status
end

Above call should result in appropriate definitions of ==, eql? and

I don't know how it could know how to create the different
definitions correctly given a completely open spec as to what the
vars are.

hash. (Something like "ordered_by" would be pretty useful too.)

I think that could be tricky too.

Thank you.
         Hugh


On Thu, 15 Sep 2005, Christian Neukirchen wrote:

OK, it seems I'm thinking along the right lines now. Here is what I did in
the end:

< snip code >

I was particularly pleased to be able not to repeat the FIELDS, by means of
attr_accessor, and that the idea of doing (self.class)::FIELDS actually
worked.

i do a lot of that type of thing and use my traits lib a lot for it - it can
make it pretty compact. for instance:

   harp:~ > cat a.rb
   require 'md5'
   require 'traits'

   module TraitCollection
     def initialize(*list)
       list = [ list ].flatten
       wt.each_with_index do |t,i|
         v = list[i] or
           raise ArgumentError, "no <#{ t }> given in <#{ list.inspect }>!"
         send t, v
       end
     end
     def to_s
       (rt.map{|t| [t, send(t)].join '='}. << "hash=#{ hash }").inspect
     end
     alias inspect to_s
     def hash
       rt.inject(::MD5::new()){|d,m| d << send(m)}.hexdigest.hex
     end
     def eql?(other)
       rt.inject(true){|b,v| b && (send(v) == other.send(v)) rescue false}
     end
     def wt; self::class::writer_traits; end
     def rt; self::class::reader_traits; end
     def self::included other
       super; class << other; alias [] new; end
     end
   end

   class Student
     include TraitCollection
     traits *%w( forename surname birth_dt picture coll_status )
   end
   class Course
     include TraitCollection
     traits *%w( aos_code dept_code aos_type full_desc )
   end

   require 'set'

   sset = Set::new
   s0, s1 = Student[%w( a b c d e )], Student[%w( f g h i j )]
   sset.add s0
   42.times{ sset.add s1 }
   p sset

   cset = Set::new
   c0, c1 = Course[%w( a b c d )], Course[%w( e f g h )]
   cset.add c0
   42.times{ cset.add c1 }
   p cset

   harp:~ > ruby a.rb
   #<Set: {["forename=a", "coll_status=b", "birth_dt=c", "picture=d", "surname=e", "hash=227748192848680293725464448333830731654"], ["forename=f", "coll_status=g", "birth_dt=h", "picture=i", "surname=j", "hash=116663401890982171087417074910604104991"]}>

   #<Set: {["dept_code=a", "full_desc=b", "aos_code=c", "aos_type=d", "hash=301716283811389038011477436469853762335"], ["dept_code=e", "full_desc=f", "aos_code=g", "aos_type=h", "hash=41821698252824551223787888325781077799"]}>

cheers.

-a


===============================================================================

Just one minor comment:

batsman@tux-chan:~$ cat /tmp/fdsfdsdsd.rb
class Foo
  FIELDS = %w[name stuff foo bar]
  attr_reader(*FIELDS)

  def initialize(name, stuff, foo, bar)
    @name, @stuff, @foo, @bar = name, stuff, foo, bar
  end

  def eql1?(other)
    (self.class)::FIELDS.inject(true) do |b,v|
      begin
        b && (self.send(v) == other.send(v))
      rescue
        b = false
      end
    end
  end

  def eql2?(other)
    # maybe add self.class::FIELDS == other.class::FIELDS test plus rescue NameError ?
    self.class::FIELDS.each{|m| break false if self.send(m) != other.send(m) } && true
  rescue NoMethodError
    false
  end
end

require 'benchmark'
a = Foo.new("a", "b", "c", "d")
b = Foo.new("e", "b", "c", "d")
c = Foo.new("a", "b", "c", "e")

TIMES = 100000
%w[a b c].each{|x| puts "#{x} = #{eval(x).inspect}"}
Benchmark.bmbm do |x|
  %w[a b c].each do |o|
    %w[eql1? eql2?].each do |m|
      s = "a.#{m}(#{o})"
      x.report("#{s}: #{eval(s)}") { eval("TIMES.times{#{s}}") }
    end
  end
end
batsman@tux-chan:~$ ruby -v /tmp/fdsfdsdsd.rb
ruby 1.8.3 (2005-05-22) [i686-linux]
a = #<Foo:0xb7dc9c98 @name="a", @bar="d", @foo="c", @stuff="b">
b = #<Foo:0xb7dc9c20 @name="e", @bar="d", @foo="c", @stuff="b">
c = #<Foo:0xb7dc9ba8 @name="a", @bar="e", @foo="c", @stuff="b">
Rehearsal -----------------------------------------------------
a.eql1?(a): true   1.520000   0.000000   1.520000 (  1.658224)
a.eql2?(a): true   0.880000   0.000000   0.880000 (  0.970675)
a.eql1?(b): false  1.070000   0.000000   1.070000 (  1.156081)
a.eql2?(b): false  0.360000   0.010000   0.370000 (  0.410011)
a.eql1?(c): false  1.570000   0.000000   1.570000 (  1.734145)
a.eql2?(c): false  0.910000   0.000000   0.910000 (  1.003833)
-------------------------------------------- total: 6.320000sec

                       user     system      total        real
a.eql1?(a): true   1.510000   0.010000   1.520000 (  1.679369)
a.eql2?(a): true   0.890000   0.000000   0.890000 (  0.950153)
a.eql1?(b): false  1.100000   0.010000   1.110000 (  1.200057)
a.eql2?(b): false  0.360000   0.000000   0.360000 (  0.383755)
a.eql1?(c): false  1.560000   0.010000   1.570000 (  1.739114)
a.eql2?(c): false  0.920000   0.000000   0.920000 (  0.978109)


On Wed, Sep 14, 2005 at 11:12:16PM +0900, Hugh Sasse wrote:

+ def eql?(other)
+ (self.class)::FIELDS.inject(true) do |b,v|
+ begin
+ b && (self.send(v) == other.send(v))
+ rescue
+ b = false
+ end
+ end
+ end

--
Mauricio Fernandez

Hugh Sasse <hgs@dmu.ac.uk> writes:

This seems to be the canonical way to define compound hashes:

class Student
def hash
   [@forename, @surname, @birth_dt, @picture, @coll_status].hash
end
end

That does seem to preserve the properties I need for strings, and is
probably cheaper than MD5sums.

         [...]

Set uses a Hash to store the objects.

That said, I think it would be nice to have something along these lines in
the stdlib:

class Student
equal_compares :@forename, :@surname, :@birth_dt, :@picture, :@coll_status
end

Above call should result in appropriate definitions of ==, eql? and

I don't know how it could know how to create the different
definitions correctly given a completely open spec as to what the
vars are.

Well, you just list all instance variables that define the
object... if they are the same, the objects are eql?.

hash. (Something like "ordered_by" would be pretty useful too.)

I think that could be tricky too.

In the end, [*fields] <=> [*other.fields] does the job.
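Christian's one-liner becomes a full ordering once Comparable is mixed in. This is a sketch for a current Ruby. The two-field Student, sorted by surname and then forename, is hypothetical.

```ruby
# Hypothetical two-field Student, ordered by surname, then forename.
class Student
  include Comparable

  FIELDS = [:surname, :forename]
  attr_reader(*FIELDS)

  def initialize(forename, surname)
    @forename = forename
    @surname = surname
  end

  def fields
    FIELDS.map { |f| send(f) }
  end

  # Array#<=> compares element by element; nil means "not comparable".
  def <=>(other)
    other.is_a?(Student) ? fields <=> other.fields : nil
  end
end

roster = [
  Student.new("Ann", "Smith"),
  Student.new("Bob", "Jones"),
  Student.new("Al", "Smith")
]
p roster.sort.map { |s| "#{s.surname}, #{s.forename}" }
# => ["Jones, Bob", "Smith, Al", "Smith, Ann"]
```

Comparable derives == from <=>, but a Set still needs eql? and hash to be defined separately.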


On Thu, 15 Sep 2005, Christian Neukirchen wrote:

Thank you.
         Hugh

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Just one minor comment:

batsman@tux-chan:~$ cat /tmp/fdsfdsdsd.rb
class Foo
FIELDS = %w[name stuff foo bar]
attr_reader(*FIELDS)

That's rather nice :-)
         [...]

def eql2?(other)
   # maybe add self.class::FIELDS == other.class::FIELDS test plus rescue NameError ?

Good point.

   self.class::FIELDS.each{|m| break false if self.send(m) != other.send(m) } && true

Nice optimisation! I was having enough of a job keeping my head
around inject to think of that!
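Hugh's first eql? already used all?, which also stops at the first mismatching field, so the break and `&& true` trick is not needed for short-circuiting. This sketch reuses the Foo class from the benchmark. The name eql3? is made up, and no timings are claimed for it.

```ruby
class Foo
  FIELDS = %w[name stuff foo bar]
  attr_reader(*FIELDS)

  def initialize(name, stuff, foo, bar)
    @name, @stuff, @foo, @bar = name, stuff, foo, bar
  end

  # all? stops at the first false block result; checking the class up
  # front avoids the NoMethodError that eql2? rescues.
  def eql3?(other)
    other.instance_of?(self.class) &&
      FIELDS.all? { |m| send(m) == other.send(m) }
  end
end

a = Foo.new("a", "b", "c", "d")
b = Foo.new("e", "b", "c", "d")
c = Foo.new("a", "b", "c", "e")
p a.eql3?(a)   # => true
p a.eql3?(b)   # => false (stops after the first field)
p a.eql3?(c)   # => false (checks all four fields)
p a.eql3?(42)  # => false
```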
         [...]

Rehearsal -----------------------------------------------------
a.eql1?(a): true 1.520000 0.000000 1.520000 ( 1.658224)
a.eql2?(a): true 0.880000 0.000000 0.880000 ( 0.970675)

         [and similar]

That makes quite a difference. Thank you.


         Hugh


On Thu, 15 Sep 2005, Mauricio Fernández wrote:

Christian Neukirchen wrote:

Hugh Sasse <hgs@dmu.ac.uk> writes:

This seems to be the canonical way to define compound hashes:

class Student
def hash
   [@forename, @surname, @birth_dt, @picture, @coll_status].hash
end
end

That does seem to preserve the properties I need for strings, and is
probably cheaper than MD5sums.

         [...]

Set uses a Hash to store the objects.

That said, I think it would be nice to have something along these lines in
the stdlib:

class Student
equal_compares :@forename, :@surname, :@birth_dt, :@picture, :@coll_status
end

Above call should result in appropriate definitions of ==, eql? and

I don't know how it could know how to create the different
definitions correctly given a completely open spec as to what the
vars are.

Well, you just list all instance variables that define the
object... if they are the same, the objects are eql?.

hash. (Something like "ordered_by" would be pretty useful too.)

I think that could be tricky too.

In the end, [*fields] <=> [*other.fields] does the job.

You can also steal the code from RCR 293 for a general solution:
http://rcrchive.net/rcr/show/293

Kind regards

    robert


On Thu, 15 Sep 2005, Christian Neukirchen wrote:

On Fri, 16 Sep 2005, Robert Klemme wrote:

You can also steal the code from RCR 293 for a general solution:
RCR 293: Easy definition of key and sort attributes of a class

Hmm, that's interesting, but I don't get:

code << "def hash() " << fields.map {|f| "self.#{f}.hash" }.join(" ^ ") << " end\n"

Shouldn't hash return a Fixnum?

------------------------------------------------------------ Object#hash
      obj.hash => fixnum
------------------------------------------------------------------------
      Generates a +Fixnum+ hash value for this object. This function must
      have the property that +a.eql?(b)+ implies +a.hash == b.hash+. The
      hash value is used by class +Hash+. Any hash value that exceeds the
      capacity of a +Fixnum+ will be truncated before being used.

The function above appears to return a string with numbers separated
by " ^ ".

Kind regards

   robert

         Thank you,
         Hugh

Hugh Sasse wrote:

You can also steal the code from RCR 293 for a general solution:
RCR 293: Easy definition of key and sort attributes of a class

Hmm, that's interesting, but I don't get:

code << "def hash() " << fields.map {|f| "self.#{f}.hash" }.join(" ^ ") << " end\n"

Shouldn't hash return a Fixnum?

Definitely!

------------------------------------------------------------ Object#hash
      obj.hash => fixnum
------------------------------------------------------------------------
      Generates a +Fixnum+ hash value for this object. This function must
      have the property that +a.eql?(b)+ implies +a.hash == b.hash+. The
      hash value is used by class +Hash+. Any hash value that exceeds the
      capacity of a +Fixnum+ will be truncated before being used.

The function above appears to return a string with numbers separated
by " ^ ".

Nope. The join appears during code generation and not during evaluation
of the method. You can easily verify this by printing code after it's
completed. :-)

Kind regards

    robert
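Robert's point, that the join runs while the method source is being built, can be checked by printing the generated code and then evaluating it. This sketch uses a made-up two-field Struct named Person in place of the class the RCR would generate.

```ruby
# Hypothetical two-field class standing in for the RCR's target class.
Person = Struct.new(:forename, :surname)
fields = Person.members   # [:forename, :surname]

# Build the method source the way the RCR 293 snippet does: the join
# runs here, once, and produces Ruby code containing the ^ operators.
code = +"def hash() "
code << fields.map { |f| "self.#{f}.hash" }.join(" ^ ") << " end\n"
puts code
# def hash() self.forename.hash ^ self.surname.hash end

# Evaluating that source defines a hash method that returns an Integer.
Person.class_eval(code)
h = Person.new("John", "Smith").hash
p h.is_a?(Integer)                    # => true
p h == ("John".hash ^ "Smith".hash)   # => true
```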


On Fri, 16 Sep 2005, Robert Klemme wrote:

Hugh Sasse wrote:

Hmm, that's interesting, but I don't get:

code << "def hash() " << fields.map {|f| "self.#{f}.hash" }.join(" ^ ") << " end\n"

Shouldn't hash return a Fixnum?

Definitely!

         [...]

The function above appears to return a string with numbers separated
by " ^ ".

Nope. The join appears during code generation and not during evaluation
of the method. You can easily verify this by printing code after it's
completed. :-)

Oh, then it's exclusive or. I'm clearly being as sharp as a sponge
today.

While my brain is behaving like cottage cheese, it's probably not
the time to ask how one might guarantee that you don't stomp on the
hashes of other objects in the system. If you have an even number of
elements, all the same Fixnum, like [1,1,1,1] then they would hash
to 0, as would [2,2], I "think".
irb(main):004:0> [1,1].inject(0) { |a,b| a ^= b.hash}
=> 0
irb(main):005:0> [2,1,1,2].inject(0) { |a,b| a ^= b.hash}
=> 0
irb(main):006:0>
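Hugh's worry holds for XOR combination in general: equal element hashes cancel in pairs, and the result ignores field order. This sketch uses a made-up helper, xor_hash, that mirrors the generated method.

```ruby
# Hypothetical helper: XOR of element hashes, like the generated method.
xor_hash = ->(values) { values.inject(0) { |acc, v| acc ^ v.hash } }

# Equal hashes cancel in pairs, so each of these comes out as 0.
p xor_hash.([1, 1])        # => 0
p xor_hash.([2, 1, 1, 2])  # => 0
p xor_hash.(%w[a a])       # => 0

# XOR is commutative, so swapping two fields never changes the result.
p xor_hash.(%w[John Smith]) == xor_hash.(%w[Smith John])  # => true
```

Such collisions don't break a Hash, because eql? still separates the objects; they only make lookups slower. Array#hash, suggested earlier in the thread, takes element positions into account.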

Kind regards

   robert

         Hugh


On Fri, 16 Sep 2005, Robert Klemme wrote: