YAML or CSV?

(Guest) #1

Hi,
I have spreadsheet-like data, a table of data really, that I need to be
able to access on a row-by-row, column-by-column, even cell-by-cell
basis. I need to look at incoming filenames in a directory and match
those filenames with the leftmost column of this table. Then, based on
that entry's row, I need other information from cells in that same row,
putting them in variables, etc. In some cases, I'll need to do some
addition with those values.

Anyway, my rather primitive understanding of YAML is that it's a good
textual, database-like thing that can be accessed via Ruby. Obviously, my
"table" has multiple columns and multiple rows. Is YAML OK with multiple
columns, or is it meant for just 2-column things, like a hash?

I've played a bit with CSV and James Gray's FasterCSV. It seems more
like the spreadsheet-like paradigm that I need. I would just like some
suggestions.

Thank you,
Peter

···

--
Posted via http://www.ruby-forum.com/.

(Jan Svitok) #2

Hi,

I have a few thoughts:

- YAML is able to express (almost?) any kind of data structure that
you can have in memory (think of it as a kind of
serialisation/marshalling format)

- YAML is more verbose than CSV if you have multi-dimensional data (in
the case of a 2D table, CSV will have one line per row, while YAML will
have several indented lines per row, one per cell)

- How do you store your data in memory? If you have the structure that
fits your usage best, you can store it directly to a file as YAML.

- If you need to lookup the data by file name (and you don't need to
keep the order) it seems to me natural to store the data in a Hash
keyed by filename, with the rest columns as the key's value.

- That reminds me, YAML (as expected) maintains order of Array items,
and shuffles Hash items.

- If you need to hand-edit the data, I would suggest CSV. YAML is much
better than XML at storing structured data, but CSV is still easier to
edit, especially if there are many rows and many columns.
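
A minimal sketch of the Hash-keyed-by-filename idea from the list above, assuming made-up filenames and cell values (none of them come from the real table). It dumps the structure to YAML text and reads it back. Note that the ordering caveat applies to Ruby 1.8; from Ruby 1.9 on, a Hash keeps its insertion order.

```ruby
require 'yaml'

# Hypothetical table: filename => the remaining cells of that row.
table = {
  'report_a.txt' => [12, 3, 'weekly'],
  'report_b.txt' => [7, 9, 'daily']
}

yaml_text = table.to_yaml          # serialise the in-memory structure
restored  = YAML.load(yaml_text)   # parse it back into a Hash of Arrays

# Look up one row by filename and add two of its cells.
row   = restored['report_b.txt']
total = row[0] + row[1]
```

To keep the data on disk, the same text could be written with File.write and read back with File.read before calling YAML.load.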

J.

···

On 8/23/06, Peter Bailey <pbailey@bna.com> wrote:

(Guest) #3

Thanks, Jan. My data originally came from a mainframe ASCII export, so
it looks like a 2D table, delimited with spacebands. I've tweaked it
now so it's just using tabs, and I've even imported it into a
spreadsheet. I guess it's much more like a CSV format now than a YAML
format.

I have one question about your response. You say that I could store the
data in a hash keyed by a filename, which I have done in the past, but
then you say that the rest of the columns, of which there are many,
could be the key's value. How can you have multiple entries for a key's
value? A hash is only a "2-column" entity, isn't it: one key, one value?
Do I just make all the cells in a row one value, with a comma or
something between them as a way to distinguish each cell? In other
words, use the hash to match the incoming filename, then make the row
that it's in an array and take it from there? (I'm thinking out loud
here.)

Thanks again,
Peter

···

On 8/23/06, Peter Bailey <pbailey@bna.com> wrote:


(CRIBBSJ) #4

If you don't mind installing an additional library, take a look at KirbyBase (http://netpromi.com/kirbybase_ruby.html). If you have imported your file into a spreadsheet, then you can do a csv export from the spreadsheet to get a csv file. KirbyBase will allow you to import this csv file and, bingo, you now have a KirbyBase table. You can easily do searches against the table to find records with filenames matching your incoming file. Additionally, the record is returned to you as a Struct object, so you can easily access the other fields.
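
For readers unfamiliar with Struct, here is a small sketch in plain Ruby. It is not KirbyBase's API, and the field names (filename, pages, copies) are hypothetical; it only illustrates what accessing a record's fields by name looks like.

```ruby
# Hypothetical column names; plain Ruby Struct, not KirbyBase itself.
Record = Struct.new(:filename, :pages, :copies)

records = [
  Record.new('report_a.txt', 12, 3),
  Record.new('report_b.txt', 7, 9)
]

# Find the record whose filename matches the incoming file,
# then read the other fields by name.
match = records.find { |r| r.filename == 'report_b.txt' }
total = match.pages + match.copies
```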

HTH,

Jamey Cribbs

(Jan Svitok) #5

On 8/24/06, Peter Bailey <pbailey@bna.com> wrote:

> I have one question about your response. You say that I could store the
> data in a hash keyed by a filename, which I have done in the past, but
> then you say that the rest of the columns, of which there are many,
> could be the key's value. How can you have multiple entries for a key's
> value? A hash is only a "2-column" entity, isn't it: one key, one value?

Right.

> Do I just make all the cells in a row one value, with a comma or
> something between them as a way to distinguish each cell? In other
> words, use the hash to match the incoming filename, then make the row
> that it's in an array and take it from there? (I'm thinking out loud
> here.)

Exactly. I was thinking of transforming

filename1, data11, data12, data13
filename2, data21, data22, data23
filename3, data31, data32, data33

into

{
      filename1 => [ data11, data12, data13 ],
      filename2 => [ data21, data22, data23 ],
      filename3 => [ data31, data32, data33 ]
}
(quotes omitted)

which in YAML looks like:

filename1:
  - data11
  - data12
  - data13
filename2:
  - data21

Otherwise your YAML would look like this (an array of arrays):

-
  - filename1
  - data11
  - data12
  - data13
-
  - filename2
  - data21

etc...

The problem here is when you need to look up a row by some dataXY value: that's slower, because every row has to be scanned.
NB: The filenames obviously have to be unique.
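
The transformation above can be sketched with Ruby's standard csv library (FasterCSV was adopted as the standard CSV library in Ruby 1.9). The sample rows and numeric cells are made up for illustration, and the sketch assumes every non-key cell is an integer.

```ruby
require 'csv'

# Made-up sample standing in for the exported table.
csv_text = <<~CSV
  filename1,10,20,30
  filename2,1,2,3
CSV

# Build a Hash keyed by the leftmost column; the rest of the row is the value.
table = {}
CSV.parse(csv_text) do |row|
  key, *cells = row
  table[key] = cells.map { |c| Integer(c) }
end

# Fast path: look up by filename, then add up the cells of that row.
incoming = 'filename2'
row = table[incoming]
sum = row ? row.sum : nil

# Slow path: finding a row by one of its cell values means scanning every row.
hit = table.find { |_name, cells| cells.include?(20) }
```

For tab-delimited input, CSV.parse accepts a col_sep: "\t" option.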

(Guest) #6

Thanks, Jamey. Yes, I've seen the KirbyBase web site before, just in
perusing all things Ruby. I remember his dog there on the home page.
That was months ago though. But, yes, now I can probably use something
just like this. I'm going to have to read up on "Struct" objects, but, I
assume it just means that it's in a useable form for my needs. Thanks
again.


(Guest) #7

Cool. That's perfect. Thanks !


(Nicolas Desprès) #8

[...]

filename1, data11, data12, data13
filename2, data21, data22, data23
filename3, data31, data32, data33

into
{
      filename1 => [ data11, data12, data13 ],
      filename2 => [ data21, data22, data23 ],
      filename3 => [ data31, data32, data33 ]
}
(quotes omitted)

which in YAML looks like:

filename1:
  - data11
  - data12
  - data13
filename2:
  - data21

You can also use the inline array, which is equivalent but looks more like CSV:

filename1: [ data11, data12, data13 ]
filename2: [ data21 ]

otherwise your yaml would look like (array of arrays:)
-
  - filename1
  - data11
  - data12
  - data13
-
  - filename2
  - data21

or this way:

- [ filename1, data11, data12, data13 ]
- [ filename2, data21 ]
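
A quick way to check that the block and inline spellings are equivalent is to load both and compare the results. This is only a sketch using the placeholder names from the examples above:

```ruby
require 'yaml'

block_style = <<~YAML
  filename1:
    - data11
    - data12
    - data13
  filename2:
    - data21
YAML

inline_style = <<~YAML
  filename1: [ data11, data12, data13 ]
  filename2: [ data21 ]
YAML

# Both documents parse to the same Hash of Arrays of Strings.
block_hash  = YAML.load(block_style)
inline_hash = YAML.load(inline_style)
same = (block_hash == inline_hash)
```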

Cheers,

···

On 8/24/06, Jan Svitok <jan.svitok@gmail.com> wrote:

--
Nicolas Desprès

(Guest) #9

So, all of the "data" entries in your "filename" arrays are simply
values? So, each "filename" row is an array in itself? Is that what you
mean by "inline array"?


(Nicolas Desprès) #10

No. I used this phrase because it is the one used in the Yaml cookbook:

http://yaml4r.sourceforge.net/cookbook/

I just said that there are two ways to write an array in Yaml:

- foo
- bar

or

[ foo, bar ]

Cheers,

--
Nicolas Desprès

(why the lucky stiff) #11

Hi,

On Fri, Aug 25, 2006 at 04:17:51AM +0900, Nicolas Desprès wrote:
> On 8/24/06, Peter Bailey <pbailey@bna.com> wrote:
> > So, all of the "data" entries in your "filename" arrays are simply
> > values? So, each "filename" row is an array in itself? Is that what you
> > mean by "inline array?"
>
> No. I used this phrase because it is the one used in the Yaml cookbook:
>
> http://yaml4r.sourceforge.net/cookbook/
>
> I just said that there are two ways to write an array in Yaml:
>
> - foo
> - bar
>
> or
>
> [ foo, bar ]

Right, the 'inline' is a formatting style.

  >> require 'yaml'
  => true

  >> ary = [1,2,3]
  => [1, 2, 3]

  >> y ary
  ---
  - 1
  - 2
  - 3
  => nil

  >> def ary.to_yaml_style; :inline; end
  => nil

  >> y ary
  --- [1, 2, 3]
  => nil
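
The to_yaml_style hook shown above belongs to the YAML library of that era. On current Ruby versions, where the yaml library is backed by Psych, that hook is (as far as I know) not consulted. A rough sketch of one way to get the same inline output there is to switch the sequence nodes of the parsed tree to flow style. This is an assumption about how one might do it, not something from the session above:

```ruby
require 'yaml'

ary = [1, 2, 3]

# Parse the default (block-style) output into Psych's node tree.
tree = Psych.parse_stream(ary.to_yaml)

# Mark every sequence node as flow style, i.e. the inline [a, b, c] form.
tree.grep(Psych::Nodes::Sequence).each do |seq|
  seq.style = Psych::Nodes::Sequence::FLOW
end

# Emit the modified tree back to YAML text.
inline_yaml = tree.yaml
```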

_why

(Guest) #12

OK. Sorry. I didn't realize you were discussing YAML there. Thanks!


(Ara.T.Howard) #13

hi _why-

i really rely on this these days - please don't change! :wink:

-a

···

On Fri, 25 Aug 2006, why the lucky stiff wrote:

Hi,

On Fri, Aug 25, 2006 at 04:17:51AM +0900, Nicolas Desprès wrote:
> On 8/24/06, Peter Bailey <pbailey@bna.com> wrote:
> > So, all of the "data" entries in your "filename" arrays are simply
> > values? So, each "filename" row is an array in itself? Is that what you
> > mean by "inline array?"
>
> No. I used this phrase because it is the one used in the Yaml cookbook:
>
> http://yaml4r.sourceforge.net/cookbook/
>
> I just said that there are two ways to write an array in Yaml:
>
> - foo
> - bar
>
> or
>
> [ foo, bar ]

Right, the 'inline' is a formatting style.

  >> require 'yaml'
  => true

  >> ary = [1,2,3]
  => [1, 2, 3]

  >> y ary
  ---
  - 1
  - 2
  - 3
  => nil

  >> def ary.to_yaml_style; :inline; end
  => nil

  >> y ary
  --- [1, 2, 3]
  => nil

_why

--
to foster inner awareness, introspection, and reasoning is more efficient than
meditation and prayer.
- h.h. the 14th dalai lama

(Nicolas Desprès) #14

I hope this helps you.

--
Nicolas Desprès

(why the lucky stiff) #15

Me too! I rely on it, too!

Sitting here, hoping I don't change it,

_why

···

On Fri, Aug 25, 2006 at 04:39:44AM +0900, ara.t.howard@noaa.gov wrote:

On Fri, 25 Aug 2006, why the lucky stiff wrote:
> >> def ary.to_yaml_style; :inline; end
> => nil
>
> >> y ary
> --- [1, 2, 3]
> => nil

hi _why-

i really rely on this these days - please don't change! :wink: