Collections with values of fixed classes/lengths

> > I still think a more general extension would be nice. And
> > Robert, I guess you might say I'm guessing. I'm new to ruby,
> > but from my perl usage, I know of many times I've dealt with
> > large amounts of data and would have wanted more C-like
> > efficiency.
>
> :wink:
>
> As Mark has demonstrated, you can stuff anything into a
> String with pack and unpack.
>
> Btw, Mark, if you factor out conversion to string and from
> string (hint: traits), you have a generic implementation for
> any fixed size type. Maybe this should go somewhere into the
> std lib...
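
For readers unfamiliar with pack and unpack, here is a minimal sketch of the idea. It is not Mark's code, which does not appear in this thread; the "d*" format (native double-precision floats) is an assumption chosen for illustration.

```ruby
# Pack three Floats into one binary String: 8 bytes each, no per-element object.
values = [1.5, 2.25, -3.0]
packed = values.pack("d*")      # "d" = native-endian double-precision float

puts packed.bytesize            # 24 (3 elements * 8 bytes)

# Read everything back, or slice out one element by byte offset.
restored = packed.unpack("d*")
second   = packed[1 * 8, 8].unpack("d")[0]

p restored                      # [1.5, 2.25, -3.0]
p second                        # 2.25
```

Because every element occupies the same number of bytes, element i always starts at byte offset i * 8, which is what makes random access into the string possible.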

That would be great. What do you mean by "fixed size type"?
Does that mean you couldn't use this class to make a
homogeneous array of strings, arrays, or hashes? The way I was
thinking you'd handle the complex object case would be to give
"new" an initial object instead of simply a class. From
this, you could tell how much flattening to do in the array.

Now that I'm looking at Array.new, you could even use the same
interface:

HomogeneousArray.new(size=0,defaultObject=nil)

Now, defaultObject is not only the default but its class is
used to make the array homogeneous (if not nil)

Here are a few examples:

# like Array.new - objects can be anything (treat nil as Object
# instead of NilClass) and initialized to nil (when array
# expands)
HomogeneousArray.new(0,nil)

# array of Floats initialized to 0.0
HomogeneousArray.new(0,0.0)

# array of Fixnums initialized to 0
HomogeneousArray.new(0,0)

# array of Strings initialized to ""
HomogeneousArray.new(0,"")

# array of name/address/zip structs where fields can be
# anything and are initialized to nil
Customer = Struct.new(:name,:address,:zip)
HomogeneousArray.new(0,Customer.new)

# now name/address/zip must be String/String/Fixnum and are
# initialized to empty strings and 0. There is one more level of
# flattening in the array storage compared to above
Customer = Struct.new(:name,:address,:zip)
HomogeneousArray.new(0,Customer.new("","",0))
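
To make the proposed interface concrete, here is a rough pure-Ruby sketch. It is only an illustration of the constructor semantics described above (class taken from the default object, default used as fill value); it does not attempt the storage flattening, and every method beyond "new" is an assumption.

```ruby
# Hypothetical sketch of the proposed interface. It only enforces the element
# class and fills gaps with the default; it does not flatten storage.
class HomogeneousArray
  def initialize(size = 0, default_object = nil)
    @default = default_object
    # nil means "any Object", mirroring Array.new
    @klass = default_object.nil? ? Object : default_object.class
    @items = Array.new(size) { fresh_default }
  end

  def [](idx)
    @items[idx]
  end

  def []=(idx, value)
    unless value.is_a?(@klass)
      raise TypeError, "expected #{@klass}, got #{value.class}"
    end
    # Pad with copies of the default (not nil) when the array expands.
    @items << fresh_default while @items.size < idx
    @items[idx] = value
  end

  def size
    @items.size
  end

  private

  # dup so that mutable defaults such as "" are not shared between slots
  def fresh_default
    @default.nil? ? nil : @default.dup
  end
end

floats = HomogeneousArray.new(0, 0.0)
floats[2] = 1.5
p floats[0]          # 0.0 (filled in on expansion)
p floats[2]          # 1.5

begin
  floats[3] = "oops"
rescue TypeError => e
  puts e.message     # expected Float, got String
end
```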

I think having something similar for a Hash would also be
useful. The "new" method with this homogeneous hash class just
needs one more optional argument:

HomogeneousHash.new(defaultValue=nil, defaultKey=nil)

Here would be a few examples:

HomogeneousHash.new(nil,nil) # same as Hash.new
HomogeneousHash.new(nil,"") # keys must be Strings

# keys are Strings and values are String,Fixnum Structs
CustomerData = Struct.new(:address,:zip)
HomogeneousHash.new(CustomerData.new("",0),"")

# values must be true. No value storage should be necessary
# since only one value is possible. Only key storage should be
# needed. This acts like an unordered set.
HomogeneousHash.new(true,nil)
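
A comparable sketch for the hash case, again only illustrating the type checks implied by the sample objects. It wraps an ordinary Hash, so it does not realize the storage savings (for example, it still stores the "true" values); the method names other than "new" are assumptions.

```ruby
# Hypothetical sketch: a Hash wrapper that checks key and value classes
# against sample objects. nil as a sample means "anything goes".
class HomogeneousHash
  def initialize(default_value = nil, default_key = nil)
    @default_value = default_value
    @value_class = default_value.nil? ? Object : default_value.class
    @key_class   = default_key.nil? ? Object : default_key.class
    @data = {}
  end

  def []=(key, value)
    raise TypeError, "key must be a #{@key_class}" unless key.is_a?(@key_class)
    raise TypeError, "value must be a #{@value_class}" unless value.is_a?(@value_class)
    @data[key] = value
  end

  def [](key)
    @data.fetch(key, @default_value)
  end

  def key?(key)
    @data.key?(key)
  end
end

zips = HomogeneousHash.new(0, "")   # String keys, Integer values
zips["Austin"] = 78701
p zips["Austin"]     # 78701
p zips["Nowhere"]    # 0 (the default value)

seen = HomogeneousHash.new(true, nil)   # set-like: every value is true
seen[:a] = true
p seen.key?(:a)      # true
```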

In addition to all this, some way to designate fixed-length
Arrays, Strings, and Bignums (down to 1-bit) in the collection
values would allow one more level of flattening in the storage.
You could have special classes or just designate a non-zero
length to mean the objects should have a fixed length. For
Bignums/Fixnums, you'd just strip off the most significant 1 to
allow easy specifying of bits and allow any default. Here
would be some more examples using this method for designating
fixed-length Arrays, Strings, and Bignums:

# array of 64-bit integers with initial value of 0
HomogeneousArray.new(0,2**64)

# array of 16-bit integers with initial value of 1
HomogeneousArray.new(0,2**16+1)

# array of 1-bit integers with initial value of 0
HomogeneousArray.new(0,2**1)

# array of array of 4 8-bit integers initialized to 0
HomogeneousArray.new(0,[2**8]*4)

# array of 2 character string initialized to \0\0
HomogeneousArray.new(0,"\0\0")

# array of array of 4 Objects initialized to nil
HomogeneousArray.new(0,[nil]*4)
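
The "strip off the most significant 1" rule for integers can be worked through in a few lines. This is a sketch of the arithmetic only; decode_width is a hypothetical helper, and treating samples below 2 as carrying no width is an assumption.

```ruby
# Decode the proposed "leading 1 marks the width" convention for integers.
# Returns [bit_width, default_value]. decode_width is a hypothetical helper.
def decode_width(sample)
  raise ArgumentError, "sample must be >= 2" if sample < 2
  width   = sample.bit_length - 1      # position of the most significant 1
  default = sample - (1 << width)      # remaining low bits are the default
  [width, default]
end

p decode_width(2**64)       # [64, 0]
p decode_width(2**16 + 1)   # [16, 1]
p decode_width(2**1)        # [1, 0]
```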

Well, I think this summarizes my proposal. Maybe an RCR is in
order.


__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around

> That would be great. What do you mean by "fixed size type"?
> Does that mean you couldn't use this class to make a
> homogeneous array of strings, arrays, or hashes? The way I was
> thinking you'd handle the complex object case would be to give
> "new" an initial object instead of simply a class. From
> this, you could tell how much flattening to do in the array.

I can see some benefit in an array (or even hash) of fixed-size data
items; sometimes memory usage considerations are more important than
flexibility. The fact that the homogeneity is enforced I saw as a
somewhat unfortunate side-effect, though maybe I'm just too used to
Ruby's very dynamic nature. Why would you want to artificially force
the contents of an array to be all the same type? Why preclude any
possibility of duck typing? I just have the feeling that anyone using
this would have it bite them later on, when they want to use the full
flexibility of Ruby, and can't.


On 5/2/05, Eric Mahurin <eric_mahurin@yahoo.com> wrote:


"Eric Mahurin" <eric_mahurin@yahoo.com> wrote in message
news:20050502150819.69075.qmail@web41102.mail.yahoo.com...

> That would be great. What do you mean by "fixed size type"?
> Does that mean you couldn't use this class to make a
> homogeneous array of strings, arrays, or hashes?

Exactly. Because for that to work you need info about each instance
stored there. And then you have Array.

> The way I was
> thinking you'd handle the complex object case would be to give
> "new" an initial object instead of simply a class. From
> this, you could tell how much flattening to do in the array.
>
> Now that I'm looking at Array.new, you could even use the same
> interface:
>
> HomogeneousArray.new(size=0,defaultObject=nil)
>
> Now, defaultObject is not only the default but its class is
> used to make the array homogeneous (if not nil)
>
> Here are a few examples:
>
> # like Array.new - objects can be anything (treat nil as Object
> # instead of NilClass) and initialized to nil (when array
> # expands)
> HomogeneousArray.new(0,nil)

This doesn't make sense as that would a) not work in the general case and
b) if it would, it would create unnecessary overhead.


Personally I don't like the sample instance stuff. This puts too much
knowledge into HomogeneousArray while restricting extensibility at the same
time. I prefer the traits approach:

# disclaimer: this is just a quick demo
class HomogeneousArray
  # A traits object knows the byte size of one element and how to
  # convert between Ruby values and their packed representation.
  class FloatTraits
    def size() 4 end   # "f" is a native single-precision float
    def to_native(str) str.unpack("f*") end
    def to_string(*values) values.pack("f*") end
  end

  FLOAT = FloatTraits.new

  class IntTraits
    def size() 4 end   # "i" is a native int, assumed to be 4 bytes here
    def to_native(str) str.unpack("i*") end
    def to_string(*values) values.pack("i*") end
  end

  INT = IntTraits.new

  include Enumerable

  def initialize(traits)
    @traits = traits
    @storage = String.new   # binary string holding the packed elements
  end

  def <<(o) @storage << @traits.to_string(o); self end
  def size() @storage.length / @traits.size end
  def empty?() @storage.empty? end

  def each
    size.times {|i| yield self[i]}
    self
  end

  def [](idx, len=nil)
    if len
      @traits.to_native( @storage[idx * @traits.size, len * @traits.size] )
    elsif Range === idx
      # idx.size counts the elements of both 0..2 and 0...2 correctly
      @traits.to_native( @storage[idx.first * @traits.size, idx.size * @traits.size] )
    else
      @traits.to_native( @storage[idx * @traits.size, @traits.size] )[0]
    end
  end

  def to_s() @traits.to_native( @storage ).join end
  def inspect() @traits.to_native( @storage ).inspect end
end

hf = HomogeneousArray.new HomogeneousArray::FLOAT
hf << 1.2 << 3.4 << 1.2
hi = HomogeneousArray.new HomogeneousArray::INT
hi << 1 << 2 << 3

Kind regards

    robert

> I can see some benefit in an array (or even hash) of fixed-size data
> items; sometimes memory usage considerations are more important than
> flexibility. The fact that the homogeneity is enforced I saw as a
> somewhat unfortunate side-effect, though maybe I'm just too used to
> Ruby's very dynamic nature. Why would you want to artificially force
> the contents of an array to be all the same type? Why preclude any
> possibility of duck typing? I just have the feeling that anyone using
> this would have it bite them later on, when they want to use the full
> flexibility of Ruby, and can't.

I think you answered your own question - "sometimes memory
usage considerations are more important than flexibility". To
get this memory efficiency you would only be able to store
objects of one class because you would be storing only the data
per element - no class information. You wouldn't even be able
to put nil objects in for a homogeneous array - you may have to
pick some value to mean nil if you need that.

Where you can take advantage of this, I was hoping you could
just replace "Array" with "HomogeneousArray". You could always
back out to Array if this becomes too inflexible.
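
As an illustration of "pick some value to mean nil", here is a small sketch for packed Floats that reserves NaN as the stand-in. The class name and the choice of NaN are assumptions; the cost is that a genuine NaN can no longer be stored.

```ruby
# Hypothetical sketch: packed Float storage where NaN stands in for nil.
class NilableFloats
  NIL_MARK = Float::NAN

  def initialize
    @storage = String.new          # binary string, 8 bytes per element
  end

  def <<(value)
    @storage << [value.nil? ? NIL_MARK : value].pack("d")
    self
  end

  def [](idx)
    value = @storage[idx * 8, 8].unpack("d")[0]
    value.nan? ? nil : value       # translate the sentinel back to nil
  end

  def size
    @storage.bytesize / 8
  end
end

a = NilableFloats.new
a << 1.5 << nil << 2.0
p a[0]     # 1.5
p a[1]     # nil
p a.size   # 3
```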

> > That would be great. What do you mean by "fixed size type"?
> > Does that mean you couldn't use this class to make a
> > homogeneous array of strings, arrays, or hashes?
>
> Exactly. Because for that to work you need info about each instance
> stored there. And then you have Array.

But, you still wouldn't have to store the class info for each
element. I'm not sure of the underlying Ruby object data
structure, but at a minimum, it should get rid of a pointer (to
the class) per element. I think you'd have to implement this
in C to do it right.
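
One rough way to get a feel for the per-element overhead from Ruby itself is to compare an Array of Floats with the same values packed into a String. This is only a sketch: ObjectSpace.memsize_of reports the container's own footprint, and whether each Float is an immediate value or a separate heap object depends on the Ruby build, so the printed figures are indicative at best.

```ruby
require "objspace"

n = 10_000
floats = Array.new(n) { |i| i + 0.5 }
packed = floats.pack("d*")

# The packed form is exactly 8 bytes per element.
puts packed.bytesize                       # 80000

# Array slots are one machine word each; what each slot refers to (an
# immediate value or a separate heap object) depends on the Ruby build,
# so these figures are only indicative.
puts ObjectSpace.memsize_of(floats)
puts ObjectSpace.memsize_of(packed)
```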

Thanks for the implementation. Another option for implementing
this in Ruby may be to use Marshal dump/load and remove/prepend
the class information. You would at least be able to remove
the class of the top-level object.
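
A simplified variation on the Marshal idea, as a sketch: rather than editing the Marshal byte stream, it dumps only the field values of a Struct and keeps the class once per collection. The class name MarshalledStructArray is hypothetical, and the per-element strings are not fixed-length.

```ruby
# Hypothetical sketch: keep the class once, outside the per-element data.
Customer = Struct.new(:name, :address, :zip)

class MarshalledStructArray
  def initialize(klass)
    @klass = klass        # stored once for the whole collection
    @blobs = []           # one Marshal string per element, class name omitted
  end

  def <<(obj)
    raise TypeError, "expected #{@klass}" unless obj.instance_of?(@klass)
    @blobs << Marshal.dump(obj.to_a)   # dump only the field values
    self
  end

  def [](idx)
    @klass.new(*Marshal.load(@blobs[idx]))   # re-attach the class on the way out
  end
end

customers = MarshalledStructArray.new(Customer)
customers << Customer.new("Ann", "1 Main St", 12345)
p customers[0].name    # "Ann"
p customers[0].zip     # 12345
```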

I still like having to specify a default object (instead of
just a class) for specifying what types of objects are in the
collection because:

* when the collection expands with empty elements you need to
put something there. nil won't work because it is a different
class (NilClass). You need a default object to put in
otherwise what comes out is undefined (may give garbage or
exceptions).

* you can handle more general classes because you have access
to the "instance_variables" (or "members" for Struct), etc.
You can also contemplate using Marshal - for completely
fixed-length objects.

* you can handle deep objects by using the default object as a
template for how deep the homogeneity runs.

* this default/template object can also provide a mechanism for
specifying fixed-length strings/arrays/integers.

"Eric Mahurin" <eric_mahurin@yahoo.com> wrote in message news:20050502203538.19806.qmail@web41106.mail.yahoo.com...

> > > That would be great. What do you mean by "fixed size type"?
> > > Does that mean you couldn't use this class to make a
> > > homogeneous array of strings, arrays, or hashes?
> >
> > Exactly. Because for that to work you need info about each instance
> > stored there. And then you have Array.
>
> But, you still wouldn't have to store the class info for each
> element. I'm not sure of the underlying Ruby object data
> structure, but at a minimum, it should get rid of a pointer (to
> the class) per element. I think you'd have to implement this
> in C to do it right.

I don't think it's worth the effort. That must be a very special case where you need huge collections of inhomogeneous data where you want to squeeze out every byte. I can't think of an application of this. Plus, it'll be error-prone and very inefficient.

<snip/>

> Thanks for the implementation. Another option for implementing
> this in Ruby may be to use Marshal dump/load and remove/prepend
> the class information. You would at least be able to remove
> the class of the top-level object.
>
> I still like having to specify a default object (instead of
> just a class) for specifying what types of objects are in the
> collection because:
>
> * when the collection expands with empty elements you need to
> put something there. nil won't work because it is a different
> class (NilClass). You need a default object to put in
> otherwise what comes out is undefined (may give garbage or
> exceptions).

Just add a method to the traits that returns this object (or can create a new object).
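
A sketch of what that could look like, assuming a "default" method on the traits object and a hypothetical PaddedArray class that uses it to fill gaps on expansion (neither is from the demo earlier in the thread):

```ruby
# Hypothetical traits object that also supplies the fill value.
class FloatTraits
  def size() 8 end
  def default() 0.0 end
  def to_native(str) str.unpack("d*") end
  def to_string(*values) values.pack("d*") end
end

class PaddedArray
  def initialize(traits)
    @traits  = traits
    @storage = String.new            # binary string of packed elements
  end

  def size() @storage.bytesize / @traits.size end

  def [](idx)
    @traits.to_native(@storage[idx * @traits.size, @traits.size])[0]
  end

  def []=(idx, value)
    # Expanding past the end pads with the traits' default, never nil.
    @storage << @traits.to_string(@traits.default) while size <= idx
    @storage[idx * @traits.size, @traits.size] = @traits.to_string(value)
  end
end

a = PaddedArray.new(FloatTraits.new)
a[3] = 2.5
p a.size   # 4
p a[0]     # 0.0
p a[3]     # 2.5
```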

> * you can handle more general classes because you have access
> to the "instance_variables" (or "members" for Struct), etc.
> You can also contemplate using Marshal - for completely
> fixed-length objects.

What would you gain by this? Every instance of a single class can have a completely different set of instance variables.
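
A short illustration of that point; the Point class is made up for the example:

```ruby
class Point
  def initialize(x)
    @x = x
  end

  def label!(text)
    @label = text    # only instances that call this get @label
  end
end

a = Point.new(1)
b = Point.new(2)
b.label!("origin")

p a.instance_variables   # [:@x]
p b.instance_variables   # [:@x, :@label]
```

So a template instance only shows the instance variables that particular object happens to have, not a layout guaranteed for the whole class.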

> * you can handle deep objects by using the default object as a
> template for how deep the homogeneity runs.

This sounds too theoretical for me.

> * this default/template object can also provide a mechanism for
> specifying fixed-length strings/array/integers.

I'm sorry, I don't get you here. What do you mean by "specifying"?

Regards

    robert

On Tue, 3 May 2005, Eric Mahurin wrote:

> I think you answered your own question - "sometimes memory
> usage considerations are more important than flexibility". To
> get this memory efficiency you would only be able to store
> objects of one class because you would be storing only the data
> per element - no class information. You wouldn't even be able
> to put nil objects in for a homogeneous array - you may have to
> pick some value to mean nil if you need that.

It's not just the memory usage. When data is more tightly packed in RAM, one benefits from cache locality (that is, your data is more often in high-speed RAM than in low-speed RAM), and furthermore, if the data is in a CPU-native format, then you can do some pretty fast processing on it, especially if using SIMD extensions such as MMX/SSE or Altivec.

This is important for me, as I *am* doing live realtime video processing.

(For that, I combine PureData, Ruby, C++, and then some asm).

  ,-o---------o---------o---------o-. ,---. irc.freenode.net #dataflow |
  > The Diagram is the Program (TM) | | ,-o-------------o--------------o-.
  `-o--------------o--------------o-' | | Mathieu Bouchard (Montréal QC) |
    > téléphone: +1.514.383.3801 `---' `-o-- http://artengine.ca/matju -'