I have a (really) big data structure, which looks like:
Features = [
["feature", ["parent1", "parent2"], {:C1 => [0.0, (7 more)], :C2 => …}],
…a LOT more (like 2500) lines…
]
I was initially using (shudder) XML for storage, but as the structure
developed, we literally had to choose between dumping XML or dumping
Ruby. I thought about other generic methods but ended up settling on
writing out the Ruby representation of the structure and loading it
with require. What I’m currently using is exactly like the above but
with the constant wrapped in a module.
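A rough sketch of that setup (the Network module name and file names here are only illustrative, not the actual code):

# build network.rb once, from whatever constructs the structure
features = [
  ["feature", ["parent1", "parent2"], {:C1 => [0.0], :C2 => [0.0]}]
  # ...roughly 2500 more rows...
]

File.open('network.rb', 'w') do |f|
  f.puts "module Network"
  f.puts "  Features = #{features.inspect}"
  f.puts "end"
end

# loading it later is then just:
require 'network'
p Network::Features.size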
My question is: Is there an even faster way to load a big structure
than this?
My first thought was to use Marshal, but I was surprised to find that
Marshal.load takes about twice as long as require:
$ ruby -v
ruby 1.8.0 (2003-08-04) [sparc-solaris2.8]
$ time ruby -e "require 'network'"
real 0m3.431s
user 0m3.080s
sys 0m0.200s
$ time ruby -e "File.open('network.dump') {|f| Marshal.load(f)}"
real 0m7.321s
user 0m6.880s
sys 0m0.170s
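(The dump file timed above would presumably have been written beforehand with something like the following, with Network::Features again being the assumed module-wrapped constant; that step isn't shown in the post:)

require 'network'
File.open('network.dump', 'wb') { |f| Marshal.dump(Network::Features, f) }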
Tell me how it works out. I would be interested to know.
Cheers,
Daniel.
···
On Thu, Dec 11, 2003 at 10:27:02AM +0900, Steven Lumos wrote:
I have a (really) big data structure, which looks like:
Features = [
["feature", ["parent1", "parent2"], {:C1 => [0.0, (7 more)], :C2 => …}],
…a LOT more (like 2500) lines…
]
I was initially using (shudder) XML for storage, but as the structure
developed, we literally had to choose between dumping XML or dumping
Ruby. I thought about other generic methods but ended up settling on
writing out the Ruby representation of the structure and loading it
with require. What I’m currently using is exactly like the above but
with the constant wrapped in a module.
My question is: Is there an even faster way to load a big structure
than this?
My first thought was to use Marshal, but I was surprised to find that
Marshal.load takes about twice as long as require:
$ ruby -v
ruby 1.8.0 (2003-08-04) [sparc-solaris2.8]
$ time ruby -e "require 'network'"
real 0m3.431s
user 0m3.080s
sys 0m0.200s
$ time ruby -e "File.open('network.dump') {|f| Marshal.load(f)}"
real 0m7.321s
user 0m6.880s
sys 0m0.170s
Steve
Curious. What kind of machine are you running this on? These times seem a bit
slow in general.
I'm also wondering how YAML would compare. Are you running Ruby 1.8+? If so,
try using to_yaml to dump the structure to a file, and then reload it. A simple
example to help if you're not familiar with YAML:

# save in YAML format
require 'yaml'
require 'network'
File.open('network.yaml', 'w') { |f| f << Features.to_yaml }

You might want to look at the YAML file at this point; it is rather readable.
Then try:

# load the YAML file
require 'yaml'
Features = YAML.load(File.open('network.yaml'))
And see what kind of times you get.
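To put numbers on all three variants in one go, the stdlib Benchmark module could be used; a sketch, reusing the file names from this thread:

require 'benchmark'
require 'yaml'

Benchmark.bm(10) do |x|
  x.report('require:') { require 'network' }
  x.report('marshal:') { File.open('network.dump') { |f| Marshal.load(f) } }
  x.report('yaml:')    { YAML.load(File.read('network.yaml')) }
end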
T.
···
On Thursday 11 December 2003 02:27 am, Steven Lumos wrote:
My question is: Is there an even faster way to load a big structure
than this?
My first thought was to use Marshal, but I was surprised to find that
Marshal.load takes about twice as long as require:
$ ruby -v
ruby 1.8.0 (2003-08-04) [sparc-solaris2.8]
$ time ruby -e "require 'network'"
real 0m3.431s
user 0m3.080s
sys 0m0.200s
$ time ruby -e "File.open('network.dump') {|f| Marshal.load(f)}"
real 0m7.321s
user 0m6.880s
sys 0m0.170s
I suppose load should be a little faster than require.
BTW, you may want to take a look at [ruby-talk: 83802];
_why did some experiments on loading stuff faster using YAML, over XML,
with some tricks.
Also: are you working with REXML? Have you tried the libxml
bindings?
What about the Ruby version/platform? 1.6 on Windows was really slow;
1.8 is much faster.
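(For what it's worth, the load variant is just the following; unlike require it takes the explicit file name and skips the "already loaded?" bookkeeping:)

load 'network.rb'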
···
On Thu, 11 Dec 2003 01:23:56 GMT, Steven Lumos slumos@yahoo.com wrote:
I have a (really) big data structure, which looks like:
Features = [
["feature", ["parent1", "parent2"], {:C1 => [0.0, (7 more)], :C2 => …}],
…a LOT more (like 2500) lines…
]
I was initially using (shudder) XML for storage, but as the structure
developed, we literally had to choose between dumping XML or dumping
Ruby. I thought about other generic methods but ended up settling on
writing out the Ruby representation of the structure and loading it
with require. What I’m currently using is exactly like the above but
with the constant wrapped in a module.
My question is: Is there an even faster way to load a big structure
than this?
I have a (really) big data structure, which looks like:
Features = [
["feature", ["parent1", "parent2"], {:C1 => [0.0, (7 more)], :C2 => …}],
…a LOT more (like 2500) lines…
]
I was initially using (shudder) XML for storage, but as the structure
developed, we literally had to choose between dumping XML or dumping
Ruby. I thought about other generic methods but ended up settling on
writing out the Ruby representation of the structure and loading it
with require. What I’m currently using is exactly like the above but
with the constant wrapped in a module.
My question is: Is there an even faster way to load a big structure
than this?
My first thought was to use Marshal, but I was surprised to find that
Marshal.load takes about twice as long as require:
$ ruby -v
ruby 1.8.0 (2003-08-04) [sparc-solaris2.8]
$ time ruby -e "require 'network'"
real 0m3.431s
user 0m3.080s
sys 0m0.200s
$ time ruby -e "File.open('network.dump') {|f| Marshal.load(f)}"
real 0m7.321s
user 0m6.880s
sys 0m0.170s
Strange. Normally I would have suggested a combination of Marshal and
load: the dump is used if it is newer than the Ruby file; otherwise the
Ruby file is loaded and dumped. This should yield quite fast loading
speed while maintaining simple editability. (Is that an English word?
:-))
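A sketch of that caching scheme (file names and the Network module are assumed to match the rest of the thread):

rb_file   = 'network.rb'
dump_file = 'network.dump'

if File.exist?(dump_file) && File.mtime(dump_file) > File.mtime(rb_file)
  # the dump is current: deserialize it
  features = File.open(dump_file, 'rb') { |f| Marshal.load(f) }
else
  # the dump is missing or stale: load the Ruby source and refresh the cache
  load rb_file
  features = Network::Features
  File.open(dump_file, 'wb') { |f| Marshal.dump(features, f) }
end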
However, you might want to reconsider your data structure. Maybe there is
a more efficient way of handling this. You could use path names as
feature keys into a single Hash for example:
Features = {
"feature.parent1" => true,
"feature.parent2" => true,
…
}
Of course this is just a guess since I don’t know the data at hand.
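A lookup would then be a single hash access, e.g. Features["feature.parent1"] returns true, and Features.include?("no.such.feature") returns false.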
Date: Thu, 11 Dec 2003 01:23:56 GMT
From: Steven Lumos slumos@yahoo.com
Newsgroups: comp.lang.ruby
Subject: [Q] Fast loading of BIG data structures
I have a (really) big data structure, which looks like:
Features = [
["feature", ["parent1", "parent2"], {:C1 => [0.0, (7 more)], :C2 => …}],
…a LOT more (like 2500) lines…
]
I was initially using (shudder) XML for storage, but as the structure
developed, we literally had to choose between dumping XML or dumping
Ruby. I thought about other generic methods but ended up settling on
writing out the Ruby representation of the structure and loading it
with require. What I’m currently using is exactly like the above but
with the constant wrapped in a module.
That's a cool idea: a code-generation database.
My question is: Is there an even faster way to load a big structure than
this?
If you could simplify your structure a little, it might be good to put it into a
bdb (Berkeley DB). You may not see a huge performance gain for one process, but bdb
uses memory pools, so you should see a big gain if more than one process is accessing
the data in a read-only way.
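bdb itself isn't sketched here, but the shape of the idea can be shown with the stdlib DBM as a stand-in (DBM won't give bdb's shared memory pool; it just illustrates the keyed-lookup layout). One record per feature, values Marshal-encoded, and every reader process opens the same database file; all names below are made up for the example:

require 'dbm'

# one-time conversion from the in-memory structure
DBM.open('features_db') do |db|
  Network::Features.each do |name, parents, classes|
    db[name] = Marshal.dump([parents, classes])
  end
end

# any number of reader processes can then look up rows without loading everything:
DBM.open('features_db') do |db|
  parents, classes = Marshal.load(db['feature'])
end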
-a
···
On Thu, 11 Dec 2003, Steven Lumos wrote:
I have a (really) big data structure, which looks like:
Features = [
["feature", ["parent1", "parent2"], {:C1 => [0.0, (7 more)], :C2 => …}],
…a LOT more (like 2500) lines…
]
My question is: Is there an even faster way to load a big structure
than this?
A special-purpose C extension. Not that I recommend it in general
(let alone like it), but sometimes hand-optimized C code is the
best solution at hand.
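The skeleton of such an extension is small; a rough C sketch, where the Network module and the extension name network_ext are assumed, and the single hard-coded row stands in for code that would be generated:

#include "ruby.h"

/* build one row: ["feature", ["parent1", "parent2"], {:C1 => [0.0]}] */
static VALUE build_row(void)
{
    VALUE parents = rb_ary_new();
    VALUE scores  = rb_ary_new();
    VALUE classes = rb_hash_new();
    VALUE row     = rb_ary_new();

    rb_ary_push(parents, rb_str_new2("parent1"));
    rb_ary_push(parents, rb_str_new2("parent2"));
    rb_ary_push(scores, rb_float_new(0.0));
    rb_hash_aset(classes, ID2SYM(rb_intern("C1")), scores);

    rb_ary_push(row, rb_str_new2("feature"));
    rb_ary_push(row, parents);
    rb_ary_push(row, classes);
    return row;
}

void Init_network_ext(void)
{
    VALUE mNetwork = rb_define_module("Network");
    VALUE features = rb_ary_new();

    /* in the real thing, one generated call per feature row */
    rb_ary_push(features, build_row());

    rb_define_const(mNetwork, "Features", features);
}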
My question is: Is there an even faster way to load a big structure
than this?
My first thought was to use Marshal, but I was surprised to find that
Marshal.load takes about twice as long as require:
$ ruby -v
ruby 1.8.0 (2003-08-04) [sparc-solaris2.8]
$ time ruby -e "require 'network'"
real 0m3.431s
user 0m3.080s
sys 0m0.200s
$ time ruby -e "File.open('network.dump') {|f| Marshal.load(f)}"
real 0m7.321s
user 0m6.880s
sys 0m0.170s
Curious. What kind of machine are you running this on? These times seem a bit
slow in general.
That was on a Blade 2000, but the timing for the require case is
basically the same on an Athlon 1600 running Windows 2000.
I’m also wondering how Yaml would compare. Are you running ruby 1.8+? If so
try using to_yaml and dumping the result to a file, and then reload. Simple
example to help if you’re not familiar with Yaml:
I love YAML, but I didn't try it because it's already documented as
being slower than Marshal.
Steve
···
On Thursday 11 December 2003 02:27 am, Steven Lumos wrote: