Brian,
Philip Rhoades wrote in post #988264:
Your sample code looks like it's handling numeric-style data (although I
realise this is just a test case for the problems you're having).
Integers in the range -2^30..2^30-1 (or a much larger range on a 64-bit machine)
have their values encoded within the reference, so no memory allocation
is done.
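In MRI you can see this with `equal?`, which tests object identity: two separate computations that yield the same small integer give back the very same object, whereas two strings with identical contents are distinct allocations. A minimal sketch; the values are arbitrary.

```ruby
# Small integers are immediate values: no object is allocated for them,
# so equal results are literally the same object.
a = 123_456
b = 123_455 + 1
puts a.equal?(b)   # prints true

# Each String.new allocates a separate heap object.
s = String.new("123456")
t = String.new("123456")
puts s.equal?(t)   # prints false: two objects...
puts s == t        # prints true: ...with equal contents
```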
Are you talking about the hash keys or the hash values?
Either.
Right - for the following in my test script:
h1[ "#{a}.#{b}.#{c}.#{d}" ] = Array.new(2){ Array.new(1){ Array.new( 20, rand(100) ) } }
h1[ "#{a}.#{b}.#{c}.#{d}".freeze ] = Array.new(2){ Array.new(1){ Array.new( 20, rand(100) ) } }
h1[ "#{a}.#{b}.#{c}.#{d}".to_i ] = Array.new(2){ Array.new(1){ Array.new( 20, rand(100) ) } }
h1[ "#{a}.#{b}.#{c}.#{d}".to_i.freeze ] = Array.new(2){ Array.new(1){ Array.new( 20, rand(100) ) } }
I get the following times (in the same order as the four lines above):
18.350s
18.113s
4.724s
4.896s
So I guess I should accept the slight loss of readability when searching for particular results in the JSON output file, and use ints instead of strings for the hash keys.
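One thing worth checking in the `.to_i` variants: `String#to_i` stops at the first character that is not part of an integer, so `"#{a}.#{b}.#{c}.#{d}".to_i` keeps only the value of `a`. Keys that share the same `a` then overwrite each other, the hash ends up with far fewer entries, and that alone could account for part of the speed-up. Below is a sketch of a collision-free integer key. `packed_key` and `unpack_key` are hypothetical helpers, and the sketch assumes each of a, b, c and d is a non-negative integer below 1000; packed values then stay below 10^12, which is an immediate integer on a 64-bit machine but not on a 32-bit one.

```ruby
# String#to_i parses only the leading integer and ignores the rest,
# so a dotted "a.b.c.d" string collapses to its first part.
puts "12.34.56.78".to_i   # prints 12

# Hypothetical helper: pack four parts into one Integer key.
# ASSUMPTION: each part is a non-negative integer below 1000.
def packed_key(a, b, c, d)
  ((a * 1000 + b) * 1000 + c) * 1000 + d
end

# Hypothetical helper: recover the four parts from a packed key.
def unpack_key(key)
  parts = []
  4.times do
    key, part = key.divmod(1000)
    parts.unshift(part)
  end
  parts
end

k = packed_key(12, 34, 56, 78)
puts k               # prints 12034056078
p unpack_key(k)      # prints [12, 34, 56, 78]
```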
- the values in
the real script will all be floats . .
Then they will be allocated on the heap, just like strings. I presume
you're aware of the inherent inaccuracy of floats (in any language), and
are OK with this.
1.0/2.0 == 1.0 + 1.0/2.0 - 1.0
=> true
1.0/10.0 == 1.0 + 1.0/10.0 - 1.0
=> false
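When floats have to be compared, the usual workaround is to test whether they lie within a small tolerance of each other rather than using `==`. A minimal sketch; `approx_equal?` and the 1e-9 tolerance are illustrative assumptions, and a suitable tolerance depends on the magnitude of the data.

```ruby
# The second comparison above, step by step.
x = 1.0 + 1.0 / 10.0 - 1.0
puts x               # not exactly 0.1: a tiny representation error remains
puts x == 0.1        # prints false

# Hypothetical helper: treat two floats as equal if they differ by less
# than a tolerance. ASSUMPTION: 1e-9 is acceptable for this data.
def approx_equal?(a, b, tol = 1e-9)
  (a - b).abs < tol
end

puts approx_equal?(x, 0.1)   # prints true
```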
I suppose I could convert them all to six- or eight-digit ints . . they are measures of biological diversity and changing them backwards and forwards is a bit of a hassle but maybe it is worth doing for the speed advantage? - I will try my test script with floats and see what happens.
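If the conversion route were taken, it could be confined to two small helpers, applied once when the values come in and once when the output is written. A sketch, assuming six decimal places are enough precision; `SCALE`, `to_fixed` and `from_fixed` are hypothetical names. Scaled integers of this size fit comfortably in the immediate-integer range on a 64-bit machine.

```ruby
# ASSUMPTION: six decimal places are enough for the diversity measures.
SCALE = 1_000_000

# Float -> fixed-point Integer (rounded to the nearest unit of 1e-6).
def to_fixed(x)
  (x * SCALE).round
end

# Fixed-point Integer -> Float.
def from_fixed(i)
  i.fdiv(SCALE)
end

i = to_fixed(0.123456)
puts i                # prints 123456
puts from_fixed(i)    # prints 0.123456
```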
Or, if you're handling a relatively small set of unique values, you
could use symbols instead of strings. Each symbol reference again
doesn't allocate any memory; it just points to the entry in the symbol
table.
Not sure what you mean - example?
a = []
a[0] = :foo
a[1] = :foo
a[2] = :foo
puts a[0].object_id
puts a[1].object_id
puts a[2].object_id
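For contrast, the same experiment with strings built at run time gives three different objects. A minimal sketch; one caveat worth knowing is that in Ruby versions before 2.2 symbols are never garbage-collected, so they suit a small, fixed set of values rather than a large set of unique ones.

```ruby
# Three references to the symbol :foo are one and the same object...
syms = [:foo, :foo, :foo]
puts syms.map(&:object_id).uniq.size   # prints 1

# ...whereas three strings built at run time are three separate objects.
strs = Array.new(3) { String.new("foo") }
puts strs.map(&:object_id).uniq.size   # prints 3
```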
The reason the numbers are called "seeds" is that they correspond to the seeds for the random number generator in the C/C++ simulation program - so they are all unique, one for each of the 32,000 simulations.
Or you could use frozen strings and share the references.
LABEL1 = "00".freeze
LABEL2 = "01".freeze
MAP = {LABEL1 => LABEL1, LABEL2=>LABEL2}
a = MAP["00"]
puts a.object_id
puts LABEL1.object_id
I ran that code but I don't understand how it helps . .
It uses less memory if you have (say) millions of identical strings.
Not the case for the keys and unlikely for the values.
It may help garbage collection performance, but not much else.
Although that's more work than symbols, it might be useful depending on
your use case. For example, you could replace a subset of the values you
see with these frozen strings (which covers the majority of the data),
whilst still allowing arbitrary other strings.
Still not clear - examples?
Suppose the strings "foo" and "bar" comprise 80% of your hash keys or
values. Then mapping them to the same frozen string means that you only
have one instance of string "foo" and one instance of string "bar" in
the system, instead of (say) millions of distinct strings. You can still
use individual strings for the other 20%.
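A sketch of that idea, generalising the MAP example above: `intern` looks each incoming string up in a table of shared frozen instances and falls back to the string itself. `COMMON` and `intern` are hypothetical names, and "foo"/"bar" stand in for whatever the dominant values really are.

```ruby
# Hypothetical table of shared frozen instances for the common strings.
COMMON = { "foo" => "foo".freeze, "bar" => "bar".freeze }.freeze

# Return the shared instance for a common string, else the string itself.
def intern(s)
  COMMON.fetch(s, s)
end

values = ["foo", "bar", "foo", "baz"].map { |s| intern(String.new(s)) }
puts values[0].equal?(values[2])   # prints true: one shared "foo"
puts values[3].frozen?             # prints false: "baz" passed through
```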
Unfortunately this doesn't correspond to my case . .
This is really an edge optimisation, though. You shouldn't need to be
worrying about these things; if they are significant, then perhaps
ruby is the wrong language for the problem in hand.
The other thing that occurred to me was that on my 64-bit machine maybe
I could run 2-3 threads for inserting into the hash table?
Noooo..... even in ruby 1.9, there is a global interpreter lock.
Multiple threads gain you nothing really, except for threads which are
blocked on I/O.
Right.
Even if there were not, having multiple threads contending on the same
hash (and controlling access via, say, a mutex) would be pretty much
guaranteed to make performance worse not better.
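For illustration only, this is roughly what the multi-threaded version would look like: three threads share one Hash and guard it with a Mutex. Because every insert must acquire the same lock (and, in MRI, the global interpreter lock as well), the inserts are serialised, so this should be expected to run no faster than a single loop, with the locking as extra overhead. The thread count and key scheme are arbitrary choices.

```ruby
h = {}
lock = Mutex.new

threads = 3.times.map do |t|
  Thread.new do
    1_000.times do |i|
      key = t * 1_000 + i                  # disjoint key range per thread
      # Only one thread at a time may modify the shared hash.
      lock.synchronize { h[key] = [rand(100)] * 20 }
    end
  end
end
threads.each(&:join)                       # wait for all inserts to finish

puts h.size   # prints 3000
```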
OK - oh well it was worth a thought!
Many thanks!
Regards,
Phil.
On 2011-03-20 08:31, Brian Candler wrote:
--
Philip Rhoades
GPO Box 3411
Sydney NSW 2001
Australia
E-mail: phil@pricom.com.au