Collections of structured-data objects: what approach?

Folks:

I'm new to Ruby, though not to programming in general. So I'm looking around
for some of the mechanisms I'm used to finding, and one of them is an
apparatus for Collections (as for example used in C++, OP, VB etc).

I'm hoping that someone can point me in the right direction as to where this
functionality can be found, or failing that suggest the most advantageous
starting point to create it based on more primitive classes.

Main features:

1. The contained objects are user-defined types having multiple fields (data
members). This in itself seems no problem for Ruby.

2. Order/Retrieve by index. The collection should stay ordered by insertion
order and support retrieval by integer index. (Sorted collection is a
separate issue, of course.)

3. Insert/Append/Remove/Delete. (List behavior) Allow appending (at end) or
inserting at arbitrary location, and removal or deletion of arbitrary
members.

4. Find by key: (Dictionary behavior etc) Ability to retrieve member objects
by supplying a value to match against one of the object's fields. Often this
is simple a Name field on each object, which functions as a key, but at
other times one might want lookup based on some other field or fields.

So far I've read plenty that seems related: arrays, hashes, Enumerable, and
several related chapters in PickAxe and Ruby for Rails. Although
"collections" are mentioned in many of these sources, I've not yet hit on a
treatment of the ruby way for the Collection basics described above (eg: R
on R's Collection chapter's description of "insert" is simply replacing the
Nth element of an array).

Needless to say, I'd much appreciate comments illuminating this area!

Thanks

Graham

···

---------------------------------------------------
Graham Wideman
Microsoft Visio MVP
---------------------------------------------------
Book/Tools:
Visio 2003 Developer's Survival Pack
Resources for programmable diagramming at:
http://www.diagramantics.com

Graham Wideman wrote:

Folks:

I'm new to Ruby, though not to programming in general. So I'm looking around for some of the mechanisms I'm used to finding, and one of them is an apparatus for Collections (as for example used in C++, OP, VB etc).

I'm hoping that someone can point me in the right direction as to where this functionality can be found, or failing that suggest the most advantageous starting point to create it based on more primitive classes.

You can get away with just using an Array for almost all of this.

Main features:

1. The contained objects are user-defined types having multiple fields (data members). This in itself seems no problem for Ruby.
2. Order/Retrieve by index. The collection should stay ordered by insertion order and support retrieval by integer index. (Sorted collection is a separate issue, of course.)
3. Insert/Append/Remove/Delete. (List behavior) Allow appending (at end) or inserting at arbitrary location, and removal or deletion of arbitrary members.
4. Find by key: (Dictionary behavior etc) Ability to retrieve member objects by supplying a value to match against one of the object's fields. Often this is simple a Name field on each object, which functions as a key, but at other times one might want lookup based on some other field or fields.

None of these are a problem for Array:

a = [{:a => 1, :b => 2}, {:c => 4, :d => 5}]
a[1] # => {:c=>4, :d=>5}
a << {:e => 6, :f => 7}
    # => [{:b=>2, :a=>1}, {:c=>4, :d=>5}, {:e=>6, :f=>7}]
a.select {|x| x[:c] == 4} # => [{:c=>4, :d=>5}]

b = [1,2,3,4]
b.insert(2,5) # => [1, 2, 5, 3, 4]

So far I've read plenty that seems related: arrays, hashes, Enumerable, and several related chapters in PickAxe and Ruby for Rails. Although "collections" are mentioned in many of these sources, I've not yet hit on a treatment of the ruby way for the Collection basics described above (eg: R on R's Collection chapter's description of "insert" is simply replacing the Nth element of an array).

If so, that's simply wrong. Array#insert(n, element) makes the inserted element the n'th and pushes everything after it along one.

···

--
Alex

For the sake of discussion I present a different approach. As long as you don't need 4 (lookup by key) an Array is perfect. However, with the lookup IMHO a new data structure is preferred because then consistency can be handled internally. I'll attach a half grown quick hack to demonstrate what I mean.

Kind regards

  robert

idx-coll.rb (796 Bytes)

···

On 25.08.2006 07:59, Graham Wideman wrote:

Folks:

I'm new to Ruby, though not to programming in general. So I'm looking around for some of the mechanisms I'm used to finding, and one of them is an apparatus for Collections (as for example used in C++, OP, VB etc).

I'm hoping that someone can point me in the right direction as to where this functionality can be found, or failing that suggest the most advantageous starting point to create it based on more primitive classes.

Main features:

1. The contained objects are user-defined types having multiple fields (data members). This in itself seems no problem for Ruby.

2. Order/Retrieve by index. The collection should stay ordered by insertion order and support retrieval by integer index. (Sorted collection is a separate issue, of course.)

3. Insert/Append/Remove/Delete. (List behavior) Allow appending (at end) or inserting at arbitrary location, and removal or deletion of arbitrary members.

4. Find by key: (Dictionary behavior etc) Ability to retrieve member objects by supplying a value to match against one of the object's fields. Often this is simple a Name field on each object, which functions as a key, but at other times one might want lookup based on some other field or fields.

So far I've read plenty that seems related: arrays, hashes, Enumerable, and several related chapters in PickAxe and Ruby for Rails. Although "collections" are mentioned in many of these sources, I've not yet hit on a treatment of the ruby way for the Collection basics described above (eg: R on R's Collection chapter's description of "insert" is simply replacing the Nth element of an array).

Needless to say, I'd much appreciate comments illuminating this area!

Alex:

Ah-ha! Thanks for getting me on the right track.

Couple of points (for others following along...)

You can get away with just using an Array for almost all of this.

a = [{:a => 1, :b => 2}, {:c => 4, :d => 5}]
a[1] # => {:c=>4, :d=>5}
a << {:e => 6, :f => 7}
# => [{:b=>2, :a=>1}, {:c=>4, :d=>5}, {:e=>6, :f=>7}]
a.select {|x| x[:c] == 4} # => [{:c=>4, :d=>5}]

b = [1,2,3,4]
b.insert(2,5) # => [1, 2, 5, 3, 4]

Now THAT's the set of examples I needed, and that should be in the major
Ruby docs.

And I'd add Array#delete

(... though I'm not yet clear on the sematics. Does this simply remove the
object from the array, or also delete it's data? Ie: if there's another
reference to this object, is the data still there? To be tested...)

[...Ruby for Rail's Collection chapter's description of "insert"...]

If so, that's simply wrong.

Yep, so it seems!

PickAxe's "Containers" etc chapter manages to get us all the way to lambda
functions and closures, but its "SongList" example is too simple to be as
useful as your examples above.

Meanwhile, Ruby for Rails' "Collections, containers" etc chapter says the
following (emphasis mine):

···

-----------------------------
11.2.2 Inserting, retrieving and removing array elements
Because an array is an ordered collection, any object you add to the array
goes either at the beginning, at the end, or somewhere in the middle. The
most general technique for INSERTING one or more items into an array is the
setter method = (square brackets...
-----------------------------

... and goes on to give "insert" examples which do not insert, they simply
assign to an array slot. (And there's no mention of the insert method). So
I had written off array as being the complete answer, since its supposed
insert syntax "obviously" didn't really know how to insert.

Now that you've pointed out that there's an insert method called, ahem,
"insert", I see that it actually IS the array class that handles Collection
duty, and I can focus there for some more serious RTFM-ing.

Thanks again for the quick clarification!

Graham

--
---------------------------------------------------
Graham Wideman
Microsoft Visio MVP
---------------------------------------------------
Book/Tools:
Visio 2003 Developer's Survival Pack
Resources for programmable diagramming at:
http://www.diagramantics.com

Robert:

Question and then a comment:

Question: What does the keyword "self" do when sitting on a line by itself,
as in several of your methods?

Comment:

For the sake of discussion I present a different approach. As long as
you don't need 4 (lookup by key) an Array is perfect. However, with the
lookup IMHO a new data structure is preferred because then consistency
can be handled internally. I'll attach a half grown quick hack to
demonstrate what I mean.

If I'm reading you correctly, you are saying a bit more than this:

1. For lookup (or for that matter if one want to implement a
SortedCollection), then it's desirable to have an index for the array on the
key field(s), which can be a ruby hash. (Ie: one *could* use Enumerable#find
or find_all, but on larger collections that's slow?)

2. But if you are going to implement an index using a helper hash, then you
want a way to keep the hash up-to-date as items are inserted or deleted from
the array.

3. So you advocate a class to wrap these together so that "consistency can
be handled internally".

Thanks for your reply,

Graham

···

--
---------------------------------------------------
Graham Wideman
Microsoft Visio MVP
---------------------------------------------------
Book/Tools:
Visio 2003 Developer's Survival Pack
Resources for programmable diagramming at:
http://www.diagramantics.com

Graham Wideman wrote:
<snip>

And I'd add Array#delete

(... though I'm not yet clear on the sematics. Does this simply remove the object from the array, or also delete it's data? Ie: if there's another reference to this object, is the data still there? To be tested...)

a = "foo"
b = [a, "bar", "qux", a]
b.delete(a)
b # => ["bar", "qux"]
a # => "foo"

So no, Array#delete doesn't delete the data itself, only the reference to it. You may also want:

b.delete_at(0)
b # => ["qux"]

<snip>

Now that you've pointed out that there's an insert method called, ahem, "insert", I see that it actually IS the array class that handles Collection duty, and I can focus there for some more serious RTFM-ing.

Array is surprisingly powerful, especially when you take the inject method into account. It's a real workhorse.

Thanks again for the quick clarification!

No worries :slight_smile:

···

--
Alex

Question: What does the keyword "self" do when sitting on a line by itself,
as in several of your methods?

Ruby returns the last value of a method. self on a line by itself returns self.

Graham Wideman wrote:

You can get away with just using an Array for almost all of this.

a = [{:a => 1, :b => 2}, {:c => 4, :d => 5}]
a[1] # => {:c=>4, :d=>5}
a << {:e => 6, :f => 7}
# => [{:b=>2, :a=>1}, {:c=>4, :d=>5}, {:e=>6, :f=>7}]
a.select {|x| x[:c] == 4} # => [{:c=>4, :d=>5}]

b = [1,2,3,4]
b.insert(2,5) # => [1, 2, 5, 3, 4]

Now THAT's the set of examples I needed, and that should be in the major
Ruby docs.

It is in the main Ruby docs. It's spread between the methods in use,
because we couldn't anticipate every single combination of methods you
might need to use and document them all in one place.

···

On 8/25/06, Graham Wideman <checkforrealaddress@diagramantics.com> wrote:

--
- Simen

Graham Wideman wrote:

Robert:

Question and then a comment:

Question: What does the keyword "self" do when sitting on a line by itself, as in several of your methods?

Simen answered that one. As an additional note, traditionally #each returns self. Also, by doing this you can chain methods.

Comment:

For the sake of discussion I present a different approach. As long as
you don't need 4 (lookup by key) an Array is perfect. However, with the
lookup IMHO a new data structure is preferred because then consistency
can be handled internally. I'll attach a half grown quick hack to
demonstrate what I mean.

If I'm reading you correctly, you are saying a bit more than this:

1. For lookup (or for that matter if one want to implement a SortedCollection), then it's desirable to have an index for the array on the key field(s), which can be a ruby hash. (Ie: one *could* use Enumerable#find or find_all, but on larger collections that's slow?)

find is O(n) which can seriously hurt your applications performance if the Array grows beyond a certain limit. Hash lookup on the other hand is O(1).

2. But if you are going to implement an index using a helper hash, then you want a way to keep the hash up-to-date as items are inserted or deleted from the array.

3. So you advocate a class to wrap these together so that "consistency can be handled internally".

Right. You got it. That's what OO is about.

Kind regards

  robert

Hi --

[...Ruby for Rail's Collection chapter's description of "insert"...]

If so, that's simply wrong.

Yep, so it seems!

Actually I don't talk about Array#insert, so my description of it
can't be wrong :slight_smile:

PickAxe's "Containers" etc chapter manages to get us all the way to lambda
functions and closures, but its "SongList" example is too simple to be as
useful as your examples above.

Meanwhile, Ruby for Rails' "Collections, containers" etc chapter says the
following (emphasis mine):

-----------------------------
11.2.2 Inserting, retrieving and removing array elements
Because an array is an ordered collection, any object you add to the array
goes either at the beginning, at the end, or somewhere in the middle. The
most general technique for INSERTING one or more items into an array is the
setter method = (square brackets...
-----------------------------

... and goes on to give "insert" examples which do not insert, they simply
assign to an array slot. (And there's no mention of the insert method). So
I had written off array as being the complete answer, since its supposed
insert syntax "obviously" didn't really know how to insert.

Yeah, I tend to use the term "insert" sort of generically; and the
insert method is among the many methods that didn't make the "for
Rails" cut. (Keep in mind that R4R isn't a complete Ruby core method
reference.)

But I can see that using a method name as an informal way to describe
what another method does might be awkward.

David

···

On Fri, 25 Aug 2006, Graham Wideman wrote:

--
                   David A. Black | dblack@wobblini.net
Author of "Ruby for Rails" [1] | Ruby/Rails training & consultancy [3]
DABlog (DAB's Weblog) [2] | Co-director, Ruby Central, Inc. [4]
[1] Ruby for Rails | [3] http://www.rubypowerandlight.com
[2] http://dablog.rubypal.com | [4] http://www.rubycentral.org

Alex:

So no, Array#delete doesn't delete the data itself, only the reference to
it. You may also want:

b.delete_at(0)
b # => ["qux"]

[...]

Array is surprisingly powerful, especially when you take the inject method
into account. It's a real workhorse.

Thanks,

Graham

Simen:

Thanks for your reply....

Now THAT's the set of examples I needed, and that should be in the major
Ruby docs.

It is in the main Ruby docs. It's spread between the methods in use,
because we couldn't anticipate every single combination of methods you
might need to use and document them all in one place.

I was more commenting on the major ruby docs like pickaxe, Ruby for Rails,
FAQs and so on. While some mention collections, this mostly just introduces
basic features of hashes and arrays. None of the discussion that I've run
across so far actually puts it all together into a comprehensive discussion
of how the collections functionality familiar from other languages
(Java, C#, C++, OP, VB etc) can be acquired from array and hash plus
whatever.

Yes, I agree that all the features and methods are documented in the
language guide (eg: in pickaxe), but one needs to know which set of those
features are to be assembled to create the familiar collection
functionality, in the absence of a big flashing sign on an actual Collection
class. (Especially bearing in mind that other languages/libraries go to some
pains to distinguish the concepts of Array and Collection -- not to say that
any particular other language is the most righteous).

Anyhow, for my part I'm a great deal further along than I was a couple of
days ago, thanks in part to helpful responses here and elsewhere.

Thanks all!

Graham

Robert:

Thanks, and...

Right. You got it. That's what OO is about.

Well, I already had the OO part, just needed the ruby version of it :slight_smile: plus
connecting the dots might help others stumbling along this way, and you
never know it might turn out that your "consistency" comment was referring
to something else I didn't see.

Anyhow, this discussion has been most helpful to alert me to all the
capabilities that Array has, and calibrating my sense that a little bit of
assembly required to implement collections with various features, they're
not just in some library that "of course" everyone knows about and uses :slight_smile:

Graham

David:

Actually I don't talk about Array#insert, so my description of it
can't be wrong :slight_smile:

... and for balance, I should say that I *am* enjoying other parts of the
book :slight_smile:

Graham

Graham Wideman wrote:

Robert:

Thanks, and...

You're welcome!

Right. You got it. That's what OO is about.

Well, I already had the OO part, just needed the ruby version of it :slight_smile: plus connecting the dots might help others stumbling along this way, and you never know it might turn out that your "consistency" comment was referring to something else I didn't see.

I probably should have put "encapsulation" somewhere in that posting. :slight_smile: Anyway, I'm glad I could help sort these things out.

Anyhow, this discussion has been most helpful to alert me to all the capabilities that Array has, and calibrating my sense that a little bit of assembly required to implement collections with various features, they're not just in some library that "of course" everyone knows about and uses :slight_smile:

Yeah, the reason is probably that Array and Hash are so powerful on the one hand and that requirements are so different on the other hand. Java follows a tad different approach by providing collections for usual situations.

Kind regards

  robert