YAML::dump slow

This is interesting and counterintuitive ...

I am creating a YAML file with about 200,000 entries. The YAML file "reflects" the data structure of a directory tree of a disk. All I have is directory names and subdirectories and filenames in the (sub) directories. I have 100,000 simulated directories and 100,000 simulated files.

It takes about
   10 seconds to create this data structure in memory
   10 _minutes_ to create the YAML string (i.e. YAML::dump)
    1 second to write out the YAML string (25 megabytes)
    4 seconds to read the YAML string (why so long?)
    8 seconds to do the YAML::load

Anyone have any ideas how to speed up the creation of the YAML file or why there is such an asymmetric amount of time for YAML::dump and YAML::load?

Anyone have any ideas how to speed up the creation of the YAML file or
why there is such an asymmetric amount of time for YAML::dump and
YAML::load?

A couple of ideas: write a custom YAML dump by hand, or use Marshal.
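
Both ideas can be sketched roughly as follows. This is only an illustration: the tree layout (a directory is a Hash of name => child, a file is nil) and the helper name emit_tree are assumptions, not the original poster's actual structure. Names are quoted with String#to_json, relying on the fact that a JSON string is also a valid double-quoted YAML scalar.

```ruby
require "json"
require "yaml"

# Hypothetical layout: a directory is a Hash of name => child, a file is nil.
tree = {
  "root" => {
    "docs"  => { "a.txt" => nil, "b.txt" => nil },
    "empty" => {},
    "c.txt" => nil
  }
}

# Idea 1: Marshal. Binary, Ruby-specific output; only load data you trust.
blob = Marshal.dump(tree)
from_marshal = Marshal.load(blob)

# Idea 2: a hand-written emitter that only handles this one layout.
# It appends to a single mutable string instead of building a node tree first.
def emit_tree(node, out = "".dup, depth = 0)
  indent = "  " * depth
  node.each do |name, child|
    key = name.to_json # JSON string, also a valid double-quoted YAML scalar
    if child.is_a?(Hash) && !child.empty?
      out << indent << key << ":\n"
      emit_tree(child, out, depth + 1)
    elsif child.is_a?(Hash)
      out << indent << key << ": {}\n"   # empty directory
    else
      out << indent << key << ": null\n" # file
    end
  end
  out
end

yaml_text = emit_tree(tree)
from_yaml = YAML.load(yaml_text)
puts yaml_text
```

Marshal output is not human-readable and is tied to the Ruby Marshal format version, so it only fits if nothing else needs to read the file. The hand-written emitter keeps the file as YAML but has to be adapted to whatever the real data structure looks like.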

--
Posted via http://www.ruby-forum.com/.

If you're on 1.9.2, you could try the new YAML library psych:
iota ~ % time ruby -rpsych -e 'Psych.load(Psych.dump(Dir["/sys/**/*"]))'
1.20s user 0.07s system 99% cpu 1.283 total

Of course, that's just a simple Array, but it would be interesting to see
how your data behaves.
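
To see how nested data behaves, the same measurement can be wrapped in a small script. This is a minimal sketch, assuming a made-up listing (directory name => array of file names) built by a hypothetical build_listing helper; it is much smaller than the 200,000-entry tree from the original post, and the timings will depend on the Ruby version and on which YAML engine is loaded.

```ruby
require "yaml"
require "benchmark"

# Build a small simulated listing: directory name => array of file names.
# Every string is freshly interpolated, so no object is shared between entries.
def build_listing(dir_count, files_per_dir)
  (0...dir_count).each_with_object({}) do |d, listing|
    listing["dir_#{d}"] = (0...files_per_dir).map { |f| "file_#{d}_#{f}.txt" }
  end
end

listing = build_listing(200, 5)

yaml = nil
dump_time = Benchmark.realtime { yaml = YAML.dump(listing) }

restored = nil
load_time = Benchmark.realtime { restored = YAML.load(yaml) }

puts format("dump: %.3fs  load: %.3fs  size: %d bytes",
            dump_time, load_time, yaml.bytesize)
```

Raising dir_count and files_per_dir step by step shows whether dump time grows roughly linearly with the number of entries or much faster than that.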

On Sun, Aug 22, 2010 at 1:45 AM, Ralph Shnelvar <ralphs@dos32.com> wrote:

Anyone have any ideas how to speed up the creation of the YAML file or why there is such an asymmetric amount of time for YAML::dump and YAML::load?

--
Michael Fellinger
CTO, The Rubyists, LLC

Michael Fellinger:

If you're on 1.9.2, you could try the new YAML library psych:
iota ~ % time ruby -rpsych -e 'Psych.load(Psych.dump(Dir["/sys/**/*"]))'
1.20s user 0.07s system 99% cpu 1.283 total

What exactly do I need to have Psych on an Ubuntu/rvm install?

chastell@devielle:~$ ruby -vrpsych -e ''
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
<internal:lib/rubygems/custom_require>:29:in `require': no such file to load -- psych (LoadError)
  from <internal:lib/rubygems/custom_require>:29:in `require'
chastell@devielle:~$

(Note: I have libyaml-dev installed.)

— Piotr Szotkowski

Saturday, August 21, 2010, 4:56:43 PM, you wrote:

If you're on 1.9.2, you could try the new YAML library psych:
iota ~ % time ruby -rpsych -e 'Psych.load(Psych.dump(Dir["/sys/**/*"]))'
1.20s user 0.07s system 99% cpu 1.283 total

Of course, that's just a simple Array, but would be interesting to see
how your data behaves.

Michael,

I am running
  ruby 1.8.7 (2010-01-10 patchlevel 249)

"YAML library psych" ??? What does that mean?

I also had trouble with rvm on OS X: my Ruby did not find psych.so.
I just had to reinstall Ruby after installing libyaml in a standard path (/usr/local).

I previously had libyaml installed via MacPorts, but in that case you have to pass
configure options like:
rvm install ruby-1.9.2-head -C --with-libyaml-dir=/opt/local

On Ubuntu, I believe libyaml installs in a standard place, so you should just
need to reinstall Ruby.

(The psych extension is simply not compiled if configure does not find libyaml's yaml.h.)

B.D.
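
A quick way to check whether a given Ruby build ended up with the psych extension is sketched below. The helper name psych_status is made up for this example; Psych::LIBYAML_VERSION and RbConfig::CONFIG are part of Ruby itself.

```ruby
require "rbconfig"

# Returns a short status string describing whether psych can be loaded.
def psych_status
  require "psych"
  "psych available (libyaml #{Psych::LIBYAML_VERSION})"
rescue LoadError => e
  # This is the situation described above: the extension was never compiled.
  "psych missing: #{e.message}"
end

puts psych_status
# Shows the options this Ruby was configured with, e.g. any --with-* flags.
puts "configure args: #{RbConfig::CONFIG['configure_args']}"
```

If the first line reports psych as missing, reinstalling Ruby after making yaml.h visible to configure is the fix suggested in this thread.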

On 22 August 2010 16:54, Piotr Szotkowski <chastell@chastell.net> wrote:

What exactly do I need to have Psych on an Ubuntu/rvm install?

Benoit Daloze:

rvm install ruby-1.9.2-head -C --with-libyaml-dir=/opt/local

Unfortunately, both rvm-wrapped and vanilla Ruby 1.9.2-p0’s configure
report ‘configure: WARNING: unrecognized options: --with-libyaml-dir’.

In Ubuntu yaml.h lives in /usr/include; I ended up symlinking
it from /usr/local/include and I can now require 'psych'.

— Piotr Szotkowski

That is a serious problem; you should probably file a bug report about it.

Ruby should obviously look in /usr for libraries and headers.

On 23 August 2010 20:45, Piotr Szotkowski <chastell@chastell.net> wrote:

In Ubuntu yaml.h lives in /usr/include; I ended up symlinking
it from /usr/local/include and I can now require 'psych'.

Benoit Daloze:

That is a serious problem; you should probably file a bug report about it.

Ruby should obviously look in /usr for libraries and headers.

I’m sorry, I spoke too soon; it turns out rvm was messing with
my experiments. Ruby 1.9.2-p0 does find /usr/include/yaml.h – but
I still wonder: should --with-libyaml-dir work or is it deprecated?

— Piotr Szotkowski