Code too hashy, I think

Giles_Bowkett · 28 March 2007 17:31

I came to Ruby from a scattering of Java, a tiny bit of Python, and a
ton of Perl. So the other day I wrote a simple utility script and
afterwards realized it may have been very influenced by all those
years of Perl. Specifically, it seemed to use hashes too much.

How can I cure myself of my hash addiction?

Seriously -- I had a CSV file from an Excel spreadsheet. Each row
represented a thing, so I made a Struct for that thing. Those things
were in groups, so I hashed them by their common feature, a unique id
from the spreadsheet. But the last character in the unique id on the
spreadsheet was non-unique, and had to be preserved. Each group had to
have a dash-delimited list of those characters, relevant to that
particular group (since different groups would have different terminal
characters in their unique ids). So I hashed them again -- and that's
what seemed like overkill.

What's the most idiomatic Ruby data structure? Would another Struct
have been more maintainable? The code was perfectly functional, and
it's pretty readable, but you can read it and go, oh, this is a
recovering Perl hacker. I picked it up from another programmer I work
with, and looking over what he had initially, I was able to go, oh,
this is a recovering PHP hacker. I want my code to look like, oh, this
is a Ruby hacker. How do I hide my shameful past?

--
Giles Bowkett
http://www.gilesgoatboy.org

Andrew_Libby · 28 March 2007 18:47

Hi Giles,

I'm a new ruby convert (from primarily Perl, but also Java and PHP
myself). I'd like to offer my observation and see if other more
experienced Ruby programmers have input.

To risk sounding trite, my take on the most idiomatic Ruby data
structure would be the class. When I deal with CSVs I typically
use the CSV object in the standard library (though I've read there
may be better ways) and do something like the code below.

At the end of the day, it's a hash cloaked in object clothing. I
use the first line of the CSV to determine the names of the
keys in the hash (and hence the instance method names for the
object instance for each row).

The reason I take this approach is that it avoids locking you into
the hash syntax, using [] or fetch/store. Additionally, it would
allow you to subclass and have instance methods like:

    def full_name
        first + " " + last
    end

I'm still learning Ruby, so feedback on the approach is welcome.

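#!/usr/bin/ruby

require 'csv'

class ObjectCsvParser
    attr :attributes

    # Pair each field name with the value in the same column.
    def initialize(fields,values)
        @attributes = Hash[
            *[
                (0..fields.length-1).map {
                    |i| [fields[i], values[i]]
                }
            ].flatten
        ]
    end

    # Answer any message that names a known field; reject the rest.
    def method_missing(method_name)
        if(@attributes[method_name])
            @attributes[method_name]
        else
            raise NoMethodError.new(method_name)
        end
    end

    # Yield one record per data row; the first row supplies the field names.
    def self.each_record(fn,&block)
        fields = nil
        # Ruby 1.8 CSV API: the block receives each row in turn.
        # On Ruby 1.9 and later, CSV.foreach(fn) plays this role.
        CSV.open(fn,'r') do |line|
            if(fields == nil)
                fields = line.map { |l| l.downcase.to_sym }
                next
            end

            yield ObjectCsvParser.new(fields,line)
        end
    end
end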

# Assuming your CSV has last, first, and email as fields
ObjectCsvParser.each_record(ARGV.shift) do |record|
    puts "Last: " + record.last
    puts "First: " + record.first
    puts "Email: " + record.email
end

Cheers!

Andy

--
Andrew Libby
Tangeis, LLC
Innovative IT Management Solutions
alibby@tangeis.com

Brian_Candler · 28 March 2007 19:14

On Thu, Mar 29, 2007 at 03:47:04AM +0900, Andrew Libby wrote:

> At the end of the day, it's a hash cloaked in object clothing.

Perhaps you want Struct or OpenStruct (both in the standard install; for the
latter you will need to require 'ostruct.rb')

From /usr/lib/ruby/1.8/ostruct.rb:

# OpenStruct allows you to create data objects and set arbitrary attributes.
# For example:
#
# require 'ostruct'
#
# record = OpenStruct.new
# record.name = "John Smith"
# record.age = 70
# record.pension = 300
#
# puts record.name # -> "John Smith"
# puts record.address # -> nil
#
# It is like a hash with a different way to access the data. In fact, it is
# implemented with a hash, and you can initialize it with one.
#
# hash = { "country" => "Australia", :population => 20_000_000 }
# data = OpenStruct.new(hash)
#
# p data # -> <OpenStruct country="Australia" population=20000000>
Andrew_Libby · 28 March 2007 19:45

Nice! That's great Brian, I love it. A quick
fix to the code I posted previously makes use of
this class. My class extends OpenStruct, and no longer
has an @attributes attribute, or a method_missing
method.

Thanks!

#!/usr/bin/ruby

require 'csv'
require 'ostruct'

class ObjectCsvParser < OpenStruct
    def initialize(fields,values)
        super Hash[
            *[
                (0..fields.length-1).map {
                    |i| [fields[i], values[i]]
                }
            ].flatten
        ]
    end

    def self.each_record(fn,&block)
        fields = nil
        CSV.open(fn,'r') do |line|
            if(fields == nil)
                fields = line.map { |l| l.downcase.to_sym }
                next
            end

            yield ObjectCsvParser.new(fields,line)
        end
    end
end

# Assuming your CSV has last, first, and email as fields
ObjectCsvParser.each_record(ARGV.shift) do |record|
    puts "Last: " + record.last
    puts "First: " + record.first
    puts "Email: " + record.email
end
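
The index loop that pairs fields with values could also be written with Array#zip. This is only a sketch of that one step, not a replacement for the script: the header and row below are made up to stand in for two parsed CSV lines, and passing an array of pairs to Hash[] assumes Ruby 1.8.7 or later.

```ruby
require 'ostruct'

# Hypothetical header and data row, standing in for two parsed CSV lines.
fields = %w[Last First Email].map { |name| name.downcase.to_sym }
values = ["Smith", "John", "john@example.com"]

# zip pairs fields[i] with values[i]; Hash[] turns the pairs into a hash.
record = OpenStruct.new(Hash[fields.zip(values)])

puts "Last: "  + record.last
puts "First: " + record.first
puts "Email: " + record.email
```

Unlike the flatten version, zip keeps each value as a single element even when the value is itself an array.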

Brian_Candler · 28 March 2007 20:02

NP. Depending on how dynamic you want to be, it's smaller with Struct:

#!/usr/bin/ruby

require 'csv'

class ObjectCsvParser
    def self.each_record(fn,&block)
        klass = nil
        CSV.open(fn,'r') do |line|
            if klass.nil?
                klass = Struct.new( *line.map { |l| l.downcase.to_sym } )
                next
            end

            yield klass.new(*line)
        end
    end
end

# Assuming your CSV has last, first, and email as fields
ObjectCsvParser.each_record(ARGV.shift) do |record|
    puts "Last: " + record.last
    puts "First: " + record.first
    puts "Email: " + record.email
end
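
A Struct can also carry derived methods such as the full_name idea mentioned earlier in the thread. The sketch below assumes fixed, hypothetical field names (last, first, email) rather than names read from a CSV header, and it relies on Struct.new accepting a block, which may not be available on the oldest Ruby versions.

```ruby
# Hypothetical field names; in the script above they come from the CSV header.
Person = Struct.new(:last, :first, :email) do
  # Derived attribute, computed from the stored fields.
  def full_name
    first + " " + last
  end
end

person = Person.new("Smith", "John", "john@example.com")
puts person.full_name
```

The trade-off is the one already noted: Struct fixes the field list up front, while OpenStruct accepts any attribute at any time.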

Giles_Bowkett · 28 March 2007 22:01

Both good solutions. My own approach mostly revolved around Struct.
I'd heard of both CSV and OpenStruct, but I was coding in a hurry, so I
didn't think to use them.

Thing is, there were elements on each line which affected not the
lines themselves but the groups those lines were part of, and to
collect those, I resorted to an additional hash. So I ended up with a
hash of Structs, indexed by group ID, and a hash of collected
elements, also indexed by group ID. I should have probably just made a
multidimensional hash, but even that seems Perl-y. I think the real
thing to do would have been to create another Struct for the groups,
containing both an array of Structs -- one for each line in the group
-- and the collected-data string as well. That might have been better.
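
A rough sketch of that last idea, with one Struct per line and one Struct per group. Everything here is assumed for illustration: the Row and Group names, the uid and description fields, the sample ids, and the rule that the id minus its final character names the group. Dropping repeated characters with uniq is also a guess at the requirement.

```ruby
# Hypothetical row and group shapes; the real field names are not in the thread.
Row   = Struct.new(:uid, :description)
Group = Struct.new(:id, :rows) do
  # Dash-delimited list of the terminal characters seen in this group.
  def suffixes
    rows.map { |row| row.uid[-1, 1] }.uniq.join("-")
  end
end

# Made-up sample data: the id minus its last character names the group.
rows = [
  Row.new("1001A", "red shirt"),
  Row.new("1001B", "blue shirt"),
  Row.new("2002A", "green hat"),
]

# One Hash remains as the index, but each value is a Group, not a second Hash.
groups = Hash.new { |hash, id| hash[id] = Group.new(id, []) }
rows.each { |row| groups[row.uid[0..-2]].rows << row }

groups.each_value { |group| puts "#{group.id}: #{group.suffixes}" }
```

Because suffixes is computed from the rows, the collected string cannot drift out of step with the lines it describes, which is the risk with two parallel hashes.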

--
Giles Bowkett
http://www.gilesgoatboy.org

http://giles.tumblr.com/

James_Edward_Gray_II · 28 March 2007 22:09

Any chance you could show us some trivial example data (just ten lines or so is fine, and we only need the key fields) and how you want to access it? We might have better ideas when we see the specifics...

James Edward Gray II

Giles_Bowkett · 29 March 2007 00:54

> > Thing is, there were elements on each line which affected not the
> > lines themselves but the groups those lines were part of, and to
> > collect those, I resorted to an additional hash. So I ended up with a
> > hash of Structs, indexed by group ID, and a hash of collected
> > elements, also indexed by group ID.
>
> Any chance you could show us some trivial example data (just ten
> lines or so is fine and we only need the key fields) and how you want
> to access it. We might have better ideas when we see the specifics...

I think there's no harm in that. Should be able to this evening.

Since starting this thread, I've looked at another developer's code
and seen very plainly that they were writing Python before they
started writing Ruby. It'd be nice to make my Ruby so idiomatic that
people didn't believe me when I told them I knew other languages. Kind
of like learning to speak a foreign language with a perfect accent.

Giles_Bowkett · 29 March 2007 21:44

> Any chance you could show us some trivial example data (just ten
> lines or so is fine and we only need the key fields) and how you want
> to access it. We might have better ideas when we see the specifics...

OK, here's sample data and the script. I changed unique ID numbers and
made some subtle text changes as well to prevent the data from being
an NDA violation.

http://gilesbowkett.com/blog_code_samples/muppet.csv
http://gilesbowkett.com/blog_code_samples/muppetimport.rb

You can probably run this code with only a few modifications.
Namely, there's stuff which expects Rails models to be defined; you
can comment that out and uncomment the "# reporting!" code, and you're
good to go.

(This was a quick script, done in a hurry, so it's not the best thing
I've ever done.)

Pat_Maddox1 · 29 March 2007 06:36

My Ruby is super sexy. Just thought I'd throw that in there. If you
want to form a shrine for me, I won't object.

Pat

James_Edward_Gray_II · 30 March 2007 00:58

Usually we approach these problems with iterators.

First I show some ways you might select subsets of data. This isn't as fast as Hash based access, but can be useful when you need to be able to view the data several different ways.

If you still need the Hash indexes, I show how I would go about building those next.

My hope is that something in here will give you some fresh ideas:

#!/usr/bin/env ruby -w

require "pp"

$/ = "\r" # switch to the unusual line endings

# read in the data
Stuff = Struct.new(:family, :description, :color, :size_code, :sku)
all_stuff = ARGF.inject(Array.new) do |rows, row|
  next rows unless row =~ /\S/
  rows.push(Stuff.new(*row.split(/\s*,\s*/)[0..4]))
end

# find all by an sku when needed
pp all_stuff.select { |s| s.sku.include? "35860466" }
puts

# find all entries of a certain size
pp all_stuff.select { |s| s.size_code == "M" }
puts

# or group the data by size
by_size = all_stuff.inject(Hash.new { |size, a| size[a] = [] }) do |grouped, s|
  grouped[s.size_code] << s
  grouped
end
pp by_size

__END__

Hope that helps.

James Edward Gray II

Giles_Bowkett · 29 March 2007 07:30

> > Since starting this thread, I've looked at another developer's code
> > and seen very plainly that they were writing Python before they
> > started writing Ruby. It'd be nice to make my Ruby so idiomatic that
> > people didn't believe me when I told them I knew other languages. Kind
> > of like learning to speak a foreign language with a perfect accent.
>
> My Ruby is super sexy. Just thought I'd throw that in there. If you
> want to form a shrine for me, I won't object.

Thank you, Pat. I'll keep that in mind.

Gary_Wright · 30 March 2007 01:27

Set has a nice method, classify, that can help with this if you aren't
concerned with duplicates.

require 'set'

by_size = all_stuff.to_set.classify { |s| s.size_code }

I'm surprised classify isn't part of Enumerable. It is a generalization of
Enumerable#partition.
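
For readers on later Ruby versions (1.8.7 and 1.9 onward), Enumerable#group_by covers the same ground without converting to a Set. A small sketch, reusing the Stuff Struct from the script earlier in the thread; the rows are made up and are not the linked sample data.

```ruby
# Same Struct as in the script quoted in this thread; the rows are made up.
Stuff = Struct.new(:family, :description, :color, :size_code, :sku)
all_stuff = [
  Stuff.new("shirts", "tee", "red", "M", "111"),
  Stuff.new("shirts", "tee", "blue", "L", "112"),
  Stuff.new("hats", "cap", "red", "M", "113"),
]

# group_by returns a Hash mapping each block result to an Array of elements.
by_size = all_stuff.group_by { |s| s.size_code }

by_size.each do |size, items|
  puts "#{size}: #{items.map { |s| s.sku }.join(', ')}"
end
```

The values are Arrays rather than Sets, so duplicates and the original order are kept.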

Gary Wright

On Mar 29, 2007, at 8:58 PM, James Edward Gray II wrote:

> # or group the data by size
> by_size = all_stuff.inject(Hash.new { |size, a| size[a] = [] }) do |grouped, s|
>   grouped[s.size_code] << s
>   grouped
> end
> pp by_size
