[QUIZ][SOLUTION] Hash to OpenStruct (#81)

This "solution" feels rather like cheating, but I did discover the
lovely Facets library in the process, which I'm sure I'll be using
again.

===
require 'yaml'
require 'ostruct'
require 'rubygems'
require 'facet/hash/to_ostruct_recurse'  # provides Hash#to_ostruct_recurse

# load_file opens, parses and closes the file in one step
ostruct = YAML.load_file("example.yaml").to_ostruct_recurse

--Alison

I have two solutions. The first is a golfish one that doesn't handle any
special cases:

----
require 'yaml'
require 'ostruct'
class Hash
  def to_os
    o=OpenStruct.new
    each{|k,v|o.send(k.to_s+'=',v.respond_to?(:to_os) ? v.to_os : v)}
    o
  end
end

if __FILE__ == $0
  p data=YAML::load(ARGF.read).to_os
end

And my second, which attempts to prevent recursion loops and adds a '_' in
front of names that are invalid or that clash with existing methods:

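class Hash
  def to_os
    os = OpenStruct.new
    each do |key, val|
      name = key.to_s
      # Prefix '_' to names that can't start a method name or that clash
      # with an existing OpenStruct method
      name = '_' + name if name !~ /\A[A-Za-z_]/ || os.respond_to?(name)
      name = name.gsub(/[!?]/, '_')
      # Skip a value that is this very hash (only direct self-reference is caught)
      next if val.equal?(self)
      os.send(name + '=', val.respond_to?(:to_os) ? val.to_os : val)
    end
    os
  end
end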
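The self-reference check in the second version only catches a hash that contains itself directly; an indirect loop (a contains b, b contains a) would still recurse forever. A minimal sketch of one way around that, assuming it is acceptable to reuse the OpenStruct already built for a hash that shows up again: remember the hashes being converted in an identity-compared Hash. `to_os_safe` is a hypothetical name, and key sanitising is left out for brevity.

```ruby
require 'ostruct'

class Hash
  # Hypothetical cycle-safe variant: +seen+ maps each hash already being
  # converted (compared by identity, not by contents) to its OpenStruct.
  def to_os_safe(seen = {}.compare_by_identity)
    return seen[self] if seen.key?(self)
    os = OpenStruct.new
    seen[self] = os                 # register before recursing so cycles terminate
    each do |key, val|
      os[key] = val.is_a?(Hash) ? val.to_os_safe(seen) : val
    end
    os
  end
end

a = { 'name' => 'a' }
b = { 'name' => 'b', 'parent' => a }
a['child'] = b                      # a -> b -> a is an indirect cycle
os = a.to_os_safe
os.child.parent.equal?(os)          # => true: the loop is preserved, not followed forever
```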

-Adam

Hi all,

So what I'm really looking for is something that is like the (reasonably
cool) bit-struct library, but with some improvements. I don't mind hacking a
bunch of stuff onto a more generic format / library if I have to.

What I need is a way to rapidly build classes that represent structured,
binary data. Bit-struct only lets me define fields up to 32 bits long, which
is bad, and there isn't (AFAICT) a way to have variable length fields in
mid-structure. As an example I might want to represent a protocol header
which has a Length field followed by some data of the appropriate length,
and then followed by some more fixed structure elements.
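For a header like that, the awkward part is that the offset of every field after the data depends on the Length value. A sketch with plain String#unpack (not bit-struct), assuming a hypothetical layout of a 16-bit big-endian length, that many bytes of data, then a 32-bit big-endian field called `seq` here purely for illustration. Truncated input is not handled.

```ruby
# Hypothetical layout: [len : uint16 BE][data : len bytes][seq : uint32 BE]
def parse_header(raw)
  len  = raw.unpack1('n')                        # length field at offset 0
  data = raw.byteslice(2, len)                   # variable-length data
  seq  = raw.byteslice(2 + len, 4).unpack1('N')  # fixed field whose offset depends on len
  { len: len, data: data, seq: seq }
end

raw = [5].pack('n') + 'hello'.b + [42].pack('N')
h = parse_header(raw)
h[:data]  # => "hello"
h[:seq]   # => 42
```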

I should mention that for my _particular_ needs I need a way to easily set
invalid lengths, too. It could be as simple as obj.variable_value= automatically
updating the length field whereas obj.len_field= sets length manually. An
easy way to marshal / unmarshal (pack unpack, whatever) data according to
field-type would be a bonus. In an ideal world the interface would be very
similar to bit-struct, where you write class definitions that get interpreted
by meta-magic to create a real class with all the accessors ready made, but
with the added ability to define :len_of_field8 as :value_of_field8.length,
and unlimited fixed bit-length fields.
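One way to get the obj.variable_value= / obj.len_field= split is a class macro that defines both writers. A minimal sketch; `Record`, `var_field`, `Packet`, `payload` and `payload_len` are all hypothetical names, not part of bit-struct.

```ruby
# Hypothetical mini-DSL: var_field defines a writer for a variable-length
# value that also updates its companion length field.
class Record
  def self.var_field(name, length_field:)
    attr_reader name
    attr_accessor length_field      # plain writer: accepts any (even wrong) value
    define_method("#{name}=") do |value|
      instance_variable_set("@#{name}", value)
      public_send("#{length_field}=", value.bytesize)  # keep the length in sync
    end
  end
end

class Packet < Record
  var_field :payload, length_field: :payload_len
end

pkt = Packet.new
pkt.payload = 'abcdef'
pkt.payload_len        # => 6
pkt.payload_len = 99   # deliberately invalid length, left as-is
pkt.payload            # => "abcdef"
```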

Elegant ideas for the representation of the structure definitions and the
general direction to take for the class construction code would be most
helpful. I've only started hacking with ruby recently, but I'm finding it
fun as hell. I could write a solution for myself, but I figure if I ask I
might get a hint towards doing things in a more idiomatic way. :-)

Cheers,

ben

Ben Nagy wrote:

> Hi all,
>
> So what I'm really looking for is something that is like the (reasonably
> cool) bit-struct library, but with some improvements. I don't mind hacking a
> bunch of stuff onto a more generic format / library if I have to.
>
> What I need is a way to rapidly build classes that represent structured,
> binary data. Bit-struct only lets me define fields up to 32 bits long, which
> is bad, and there isn't (AFAICT) a way to have variable length fields in
> mid-structure. As an example I might want to represent a protocol header
> which has a Length field followed by some data of the appropriate length,
> and then followed by some more fixed structure elements.

Hi, Ben.

Unsigned and signed integer fields >32 bits (and 1..16, 24, or 32 bits)
are supported since bit-struct-0.8. And of course the various character
field types can be any _fixed_ length.

However, you're right about variable length fields that occur somewhere
except at the end of the structure: bit-struct doesn't support them.

For the special case of protocols with (for example) IP option fields,
I've been lucky enough to deal with only a small set of possibilities,
and I just define each of them as a different subclass of IP, with fixed
length fields. I define a #parse class method that looks at the header
length or flags to determine what type it should be, and then outputs an
instance of the appropriate subclass.
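The dispatch idea above, sketched in plain Ruby (this is not bit-struct's API, and `IP`, `IPNoOptions` and `IPWithOptions` are hypothetical names): each fixed layout gets its own subclass, and a class-level parse looks at the IPv4 header-length nibble (IHL, counted in 32-bit words; 5 means no options) to decide which subclass to instantiate.

```ruby
class IP
  attr_reader :raw

  def initialize(raw)
    @raw = raw
  end

  # Pick the subclass from the IHL nibble in the first byte.
  def self.parse(raw)
    ihl = raw.getbyte(0) & 0x0f
    klass = ihl == 5 ? IPNoOptions : IPWithOptions
    klass.new(raw)
  end

  def header_length
    (raw.getbyte(0) & 0x0f) * 4     # in bytes
  end
end

class IPNoOptions < IP; end

class IPWithOptions < IP
  # Bytes between the 20-byte fixed header and the end of the header
  def options
    raw.byteslice(20, header_length - 20)
  end
end

plain   = IP.parse("\x45".b + "\x00".b * 19)
options = IP.parse("\x46".b + "\x00".b * 23)
plain.class               # => IPNoOptions
options.class             # => IPWithOptions
options.options.bytesize  # => 4
```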

If you want to handle variable length embedded fields in a single class,
that's going to be tricky. How long are the length fields? Are the
length fields fixed length? Or are they like netstrings[1]? Encoded as
ascii, big-endian unsigned, or ...?
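To make the encoding question concrete, a sketch of two of those options: a fixed-size binary prefix, and a netstring[1] (ASCII decimal length, a colon, the data, then a comma). The method names are hypothetical, and parse_netstring does only basic validation (for example, it does not reject leading zeros).

```ruby
# 1. Fixed-size binary prefix: 16-bit big-endian unsigned ("n" in pack).
def u16_prefixed(data)
  [data.bytesize].pack('n') + data.b
end

# 2. Netstring: ASCII decimal length, ':', the data, then ','.
def netstring(data)
  "#{data.bytesize}:#{data},"
end

def parse_netstring(str)
  len_str, rest = str.split(':', 2)   # only the first ':' ends the length
  len = Integer(len_str, 10)
  raise ArgumentError, 'missing trailing comma' unless rest.byteslice(len) == ','
  rest.byteslice(0, len)
end

u16_prefixed('hello world!').bytes.first(2)  # => [0, 12]
netstring('hello world!')                    # => "12:hello world!,"
parse_netstring('12:hello world!,')          # => "hello world!"
```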

When you access fields at a higher offset than the beginning of the
var-length field, the accessor will have to read the length and adjust
the offset accordingly. If there is another var-length field, that may
require skipping past that field as well...
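A sketch of what such an accessor has to do, assuming a hypothetical layout in which each variable-length field is preceded by a 1-byte length: to find a later field, walk the layout from the start and skip each variable field by reading its length prefix.

```ruby
# Hypothetical layout: fixed fields give a byte size, :var fields are
# a 1-byte length followed by that many bytes.
LAYOUT = [
  [:type,  2],      # fixed: 2 bytes
  [:name,  :var],   # 1-byte length + data
  [:extra, :var],   # another variable-length field
  [:flags, 1]       # fixed field after two variable ones
].freeze

def field_offset(raw, wanted)
  offset = 0
  LAYOUT.each do |name, size|
    return offset if name == wanted
    # Variable fields: skip the prefix byte plus the length it announces
    offset += size == :var ? 1 + raw.getbyte(offset) : size
  end
  raise ArgumentError, "unknown field #{wanted}"
end

raw = "\x00\x01".b + "\x03abc".b + "\x02xy".b + "\x80".b
field_offset(raw, :flags)                # => 9
raw.getbyte(field_offset(raw, :flags))   # => 128
```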

Anyway, I have wanted such a field, too, so maybe I will implement one
of these variations someday. Or if anyone has ideas...

[1] http://cr.yp.to/proto/netstrings.txt

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

From: Joel VanderWerf [mailto:vjoel@path.berkeley.edu]

[...]

> Ben Nagy wrote:
>
> [...]
>
> > What I need is a way to rapidly build classes that represent structured,
> > binary data. Bit-struct only lets me define fields up to 32 bits long, which
> > is bad, and there isn't (AFAICT) a way to have variable length fields in
> > mid-structure. As an example I might want to represent a protocol header
> > which has a Length field followed by some data of the appropriate length,
> > and then followed by some more fixed structure elements.
>
> Hi, Ben.

Hi! Thanks for the response. :-)

> Unsigned and signed integer fields >32 bits (and 1..16, 24, or 32 bits)
> are supported since bit-struct-0.8. And of course the various character
> field types can be any _fixed_ length.

Ahh, the latest code I googled when I started was 0.5, and I just wrote my
classes that needed 64 bit ints with a hi and lo part. Thanks for the tip,
I'll grab the newer code.
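For comparison, a sketch of the hi/lo workaround versus packing a 64-bit value directly. The `Q>` directive (big-endian 64-bit unsigned) relies on the `<`/`>` endianness modifiers for pack, available in Ruby 1.9.3 and later.

```ruby
value = 0x0123_4567_89AB_CDEF

# Workaround: split into two 32-bit halves, each packed big-endian ("N")
hi, lo = value >> 32, value & 0xFFFF_FFFF
via_halves = [hi, lo].pack('NN')

# Direct: one big-endian 64-bit unsigned word
direct = [value].pack('Q>')

via_halves == direct             # => true
direct.unpack1('Q>') == value    # => true
```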

> However, you're right about variable length fields that occur somewhere
> except at the end of the structure: bit-struct doesn't support them.

[...]

> If you want to handle variable length embedded fields in a single class,
> that's going to be tricky. How long are the length fields? Are the
> length fields fixed length? Or are they like netstrings[1]? Encoded as
> ascii, big-endian unsigned, or ...?

In general, the length fields in network protocol headers are going to be
fixed length, from what I've seen. Protocols that need variable length data
all over the place seem to be using ASN.1/PER these days (another library on
my wishlist ;). I can't use ASN.1 for my purposes because I need to break
things before I send them, and the parser classes immediately choke at that
point when trying to marshal the data for sending.

> When you access fields at a higher offset than the beginning of the
> var-length field, the accessor will have to read the length and adjust
> the offset accordingly. If there is another var-length field, that may
> require skipping past that field as well...

Have you thought about not using String as the base class? For instance,
OpenStruct would be almost OK for my purposes, if it sustained ordered
output. If I had to hack things up without guidance I would probably start
with a Hash and have :fieldname -> pos, val, type internally. You wouldn't
be able to treat the whole object like a string directly, but overloading
to_s shouldn't be too ugly syntactically? The type definition would still be
used to meta-create a class 'parse' method that does the parsing, to convert
from a raw string (or I guess you could just use o=Class.new(String)).

The trouble is that I'm still getting to grips with the nontrivial parts of
Ruby metaprogramming, so there are a few fiddly details that I'm mentally
glossing over. I _think_, for example, that it would be cool to be able to
define the class fields as using any calculated value at the time of
instantiation. Take UDP for example, where the checksum is performed over a
pseudoheader + payload. That rapidly starts to twist my brain though, since
the UDP object would need to know if it is the payload of an IP object
before being able to calculate the checksum. Gah. Maybe some Proc that is
called when you call o.field.refresh (which gets called the first time
during instantiation)... but then the checksum depends on other calculated
fields like length so it needs to be done last... ok my brain just exploded.
:-(

> Anyway, I have wanted such a field, too, so maybe I will implement one
> of these variations someday. Or if anyone has ideas...

I can't wait. :-)

Cheers,

ben

Ben Nagy wrote:

> Have you thought about not using String as the base class? For instance,
> OpenStruct would be almost OK for my purposes, if it sustained ordered
> output. If I had to hack things up without guidance I would probably start
> with a Hash and have :fieldname -> pos, val, type internally. You wouldn't
> be able to treat the whole object like a string directly, but overloading
> to_s shouldn't be too ugly syntactically? The type definition would still be
> used to meta-create a class 'parse' method that does the parsing, to convert
> from a raw string (or I guess you could just use o=Class.new(String)).

This is a good point (about String as the base class), and it brings up
the threshold at which bit-struct loses its usefulness. If you're doing
a lot of complex accessor operations (esp. the var-length fields), then
operating on a string just gets hopelessly mucky. It's better to use
some structured data type, and follow the parse->operate->unparse cycle.
BitStruct has been useful in cases where I only need to touch a field or
two and then just pass the string on somewhere else (a socket, a file, a
database, etc.). In these cases, parsing all the fields is a waste of time.

So, what kind of data structure to use...

A hash of fieldname => [pos, val, type] has the disadvantage that each
field must know its position. If you increase the length of one field,
you have to search for all other fields with higher pos, and increase
their pos.

An array of values, with defined accessors plus #parse and #to_s
methods, is probably better. I think Ara Howard's arrayfields lib might
be a place to start, and then you can implement #parse and #to_s using
#unpack and #pack. You don't need to keep track of pos and update it
each time a field changes size, as long as each field knows its
(current) length. Don't worry about actual offsets except in #to_s. Be lazy.
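A rough sketch of this array-of-values layout in plain Ruby. Ara Howard's arrayfields isn't used here, so none of its API is assumed; the `Frame` class, its field names (including `crc`, which is just stored, not computed) and its wire format are hypothetical.

```ruby
# Field values live in an Array; accessors just index into it, and byte
# offsets are only worked out in Frame.parse and #to_s.
class Frame
  FIELDS = %i[version len data crc].freeze   # wire order

  FIELDS.each_with_index do |name, i|
    define_method(name) { @values[i] }
    define_method("#{name}=") { |v| @values[i] = v }
  end

  def initialize(*values)
    @values = values
  end

  # Assumed wire format: uint8 version, uint16 BE len, len bytes of data,
  # uint32 BE crc.
  def self.parse(raw)
    version, len = raw.unpack('Cn')
    data = raw.byteslice(3, len)
    crc  = raw.byteslice(3 + len, 4).unpack1('N')
    new(version, len, data, crc)
  end

  def to_s
    [version, len].pack('Cn') + data.b + [crc].pack('N')
  end
end

f = Frame.new(1, 3, 'abc', 0xDEADBEEF)
f.to_s.bytesize           # => 10
f.data = 'longer'         # resizing a field touches no stored offsets
f.len  = f.data.bytesize  # len is set explicitly, so it could also be set "wrong"
Frame.parse(f.to_s).data  # => "longer"
```

The trade-off matches the description above: accessors are plain array lookups, while all the offset arithmetic is paid for in parse and to_s.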

With this approach, the accessors will be much more efficient than
BitStructs, but parse/to_s will be less efficient.

> The trouble is that I'm still getting to grips with the nontrivial parts of
> Ruby metaprogramming, so there are a few fiddly details that I'm mentally
> glossing over. I _think_, for example, that it would be cool to be able to
> define the class fields as using any calculated value at the time of
> instantiation. Take UDP for example, where the checksum is performed over a
> pseudoheader + payload. That rapidly starts to twist my brain though, since
> the UDP object would need to know if it is the payload of an IP object
> before being able to calculate the checksum. Gah. Maybe some Proc that is
> called when you call o.field.refresh (which gets called the first time
> during instantiation)... but then the checksum depends on other calculated
> fields like length so it needs to be done last... ok my brain just exploded.
> :-(

It's probably better to compute the checksum in terms of the string
representation, rather than try to perform the calculation in terms of
individual field values (which may be in the wrong byte order, may have
too much precision, may need to be shifted into position in a bit field,
...).
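Along those lines, a sketch of the UDP checksum computed over packed bytes rather than field values: the Internet checksum (RFC 1071) over the IPv4 pseudoheader, the UDP header with its checksum field zeroed, and the payload. The method names are hypothetical, and IPv6 (which uses a different pseudoheader) isn't covered.

```ruby
# Internet checksum: one's-complement sum of 16-bit big-endian words.
def internet_checksum(bytes)
  bytes = bytes.b
  bytes += "\x00".b if bytes.bytesize.odd?   # pad to a whole number of words
  sum = bytes.unpack('n*').sum
  sum = (sum & 0xFFFF) + (sum >> 16) while sum > 0xFFFF  # fold carries back in
  ~sum & 0xFFFF
end

# UDP-over-IPv4 checksum: pseudoheader (src, dst, zero, protocol 17,
# UDP length) + UDP header with a zero checksum + payload.
# src_ip/dst_ip are 4-byte binary strings.
def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload)
  udp_len = 8 + payload.bytesize
  pseudo  = src_ip.b + dst_ip.b + [0, 17, udp_len].pack('CCn')
  header  = [src_port, dst_port, udp_len, 0].pack('nnnn')
  sum = internet_checksum(pseudo + header + payload.b)
  sum.zero? ? 0xFFFF : sum   # RFC 768: a computed zero is sent as all ones
end

src = [192, 168, 0, 1].pack('C4')
dst = [192, 168, 0, 2].pack('C4')
c = udp_checksum(src, dst, 1234, 53, 'hi!')
# A receiver summing the same bytes with the checksum filled in gets 0.
check = src + dst + [0, 17, 11].pack('CCn') + [1234, 53, 11, c].pack('nnnn') + 'hi!'
internet_checksum(check)   # => 0
```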

I hope you find it worthwhile to work on a library like this.
