[YAML] reading stream of documents

(Joel VanderWerf) #1

I've got a file which is a '---' separated list of YAML docs. I can read
them all with

  YAML.load_documents(file) {|doc| ...}

but what I'd really like to do is read one from the stream and come back
to the stream later and read the next one. I tried doing that with #load, but I
only get one of the documents and then the file pos goes to the end of
the file and the second document cannot be read:

$ cat >docs.yaml
---
a: 1
b: 2
---
a: 3
b: 4
$ irb -r yaml
irb(main):001:0> f = File.open('docs.yaml')
=> #<File:docs.yaml>
irb(main):002:0> YAML.load(f)
=> {"a"=>1, "b"=>2}
irb(main):003:0> YAML.load(f)
=> false
irb(main):004:0> f.rewind
=> 0
irb(main):008:0> YAML.load(f)
=> {"a"=>1, "b"=>2}
irb(main):009:0> f.pos
=> 28
irb(main):010:0> f.rewind
=> 0
irb(main):011:0> YAML.load_documents(f) {|d| p d}
{"a"=>1, "b"=>2}
{"a"=>3, "b"=>4}

I guess I could fire up a generator to externalize the iterator, but why
can't YAML#load just read one doc at a time and not advance the file
pointer farther than the end of the doc?
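The generator idea can be sketched in current Ruby, where YAML is backed by Psych and the block form of `YAML.load_stream` yields each document as soon as the parser finishes it (standing in for `load_documents` here). Wrapping that call in an `Enumerator` and pulling with `#next` pauses the parser between documents. This is a minimal sketch against modern Psych, not the 2005 Syck library; `document_reader` is a hypothetical helper name.

```ruby
require 'yaml'
require 'stringio'

# Hypothetical helper: turns the internal iterator of YAML.load_stream
# (block form) into an external one. Enumerator#next runs the block in a
# Fiber, so parsing pauses after each document until #next is called again.
def document_reader(io)
  Enumerator.new do |yielder|
    YAML.load_stream(io) { |doc| yielder << doc }
  end
end

docs = document_reader(StringIO.new("---\na: 1\nb: 2\n---\na: 3\nb: 4\n"))
first  = docs.next   # {"a"=>1, "b"=>2}
# ... do other work, then come back for the next document ...
second = docs.next   # {"a"=>3, "b"=>4}
```

Note that the suspended Enumerator, not the file offset, remembers where you are: the parser reads its input in buffered chunks, so the underlying IO position may still run past the end of the current document.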

--
      vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

(Dale Martenson) #2

---- Original message from Joel VanderWerf on 8/22/2005 12:35 PM:

...

If it is practical to load all the data at one time, you could use YAML::load_stream. This would give you an array of documents to access.

irb(main):031:0> d = YAML::load_stream( File.open( 'docs.yaml' ) )
=> #<YAML::Stream:0x2abcab0 @documents=[{"a"=>1, "b"=>2}, {"a"=>3, "b"=>4}], @options={}>
irb(main):032:0> d.documents[0]
=> {"a"=>1, "b"=>2}
irb(main):033:0> d.documents[1]
=> {"a"=>3, "b"=>4}

Then you could access each document as needed. But if you wanted to keep them on disk, you might have to derive your own method.
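For comparison, in current Ruby (where `YAML` is Psych) the block-less `YAML.load_stream` returns a plain Array of documents rather than the `YAML::Stream` object shown in the transcript above. A minimal sketch; the inline string is an assumption standing in for `File.read('docs.yaml')`:

```ruby
require 'yaml'

# Stand-in for File.read('docs.yaml'); the whole input is parsed at once.
yaml_text = "---\na: 1\nb: 2\n---\na: 3\nb: 4\n"

docs = YAML.load_stream(yaml_text)  # Array of all documents
docs[0]  # => {"a"=>1, "b"=>2}
docs[1]  # => {"a"=>3, "b"=>4}
```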

(Joel VanderWerf) #3

Dale Martenson wrote:

If it is practical to load all the data at one time, you could use
YAML::load_stream. This would give you an array of documents to access.

Then you could access each document as needed. But if you wanted to keep
them on disk, you might have to derive your own method.

Yeah, I saw that, but I'd rather not inhale the whole file at once. The
parsing time and memory usage would not scale well...
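One way to get the behavior asked for in the first message (the file pointer left at the end of the document just read) is to do the splitting yourself: read lines until the next `---` separator, seek back to the start of that line, and hand the collected text to `YAML.load`. This is a sketch under loud assumptions: every document begins with a line that is exactly `---`, no such line appears inside a block scalar, and `--- value` headers, `...` end markers and leading blank lines are not handled. `read_next_doc` is a hypothetical name.

```ruby
require 'yaml'
require 'stringio'

# Hypothetical helper: parses the next '---'-separated document from io and
# leaves io positioned at the start of the following separator line, so the
# caller can stop here and resume later. Returns nil at end of input.
# ASSUMPTION: separators are lines that are exactly "---"; "--- value"
# headers, "..." end markers and "---" inside block scalars are not handled.
def read_next_doc(io)
  first = io.gets
  return nil if first.nil?          # nothing left to read
  chunk = first.dup                 # usually the "---" that opens this doc
  loop do
    start = io.pos                  # offset of the line about to be read
    line = io.gets
    break if line.nil?              # last document runs to end of input
    if line.chomp == "---"
      io.seek(start)                # un-read the next document's separator
      break
    end
    chunk << line
  end
  YAML.load(chunk)
end

io = StringIO.new("---\na: 1\nb: 2\n---\na: 3\nb: 4\n")
first = read_next_doc(io)   # {"a"=>1, "b"=>2}; io is left at offset 14
saved = io.pos
# ... the IO can be left alone here and resumed later ...
io.seek(saved)
second = read_next_doc(io)  # {"a"=>3, "b"=>4}
```

Because no parser state survives between calls, the saved offset is all you need to resume, and only one document's text is held in memory at a time.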
