Thanks. Very interesting.
Turns out I was wrong for ruby 1.9.3 (but right for some other rubies).
···
On Tue, Jan 17, 2012 at 2:44 AM, Abinoam Jr. <abinoam@gmail.com> wrote:
On Mon, Jan 16, 2012 at 10:05 PM, Abinoam Jr. <abinoam@gmail.com> wrote:
==========
For ruby 1.9.3-p0 all solutions perform approx. the same (at least same
order)
and seem to stay quasi linear.
SIZE
1000000 #[1_000_000 that is ; n = 1]
user system total real
Ralph Shneiver: 0.180000 0.000000 0.180000 ( 0.174283)
Sigurd: 0.200000 0.000000 0.200000 ( 0.203652)
Keinich #1 0.140000 0.000000 0.140000 ( 0.142833)
Keinich #2 0.180000 0.000000 0.180000 ( 0.177456)
Magnus Holm: 0.200000 0.000000 0.200000 ( 0.205895)
Abinoam #1: 0.260000 0.000000 0.260000 ( 0.254554)
SIZE
2000000 #[2_000_000 that is ; n = 1]
user system total real
Ralph Shneiver: 0.340000 0.010000 0.350000 ( 0.350032)
Sigurd: 0.410000 0.000000 0.410000 ( 0.406483)
Keinich #1 0.280000 0.000000 0.280000 ( 0.285213)
Keinich #2 0.350000 0.010000 0.360000 ( 0.354640)
Magnus Holm: 0.410000 0.000000 0.410000 ( 0.411010)
Abinoam #1: 0.470000 0.030000 0.500000 ( 0.498782)
SIZE
10000000 #[10_000_000 that is; n = 1 here]
user system total real
Ralph Shneiver: 1.710000 0.040000 1.750000 ( 1.748137)
Sigurd: 2.000000 0.010000 2.010000 ( 2.012496)
Keinich #1 1.380000 0.030000 1.410000 ( 1.409462)
Keinich #2 1.750000 0.010000 1.760000 ( 1.760997)
Magnus Holm: 1.990000 0.020000 2.010000 ( 2.014282)
Abinoam #1: 2.500000 0.060000 2.560000 ( 2.562646)
All solutions in Ruby 1.9.3 are relatively "linear".
Keinich #1 is the fastest.
I believe it is because I limited the dataset to 100 different values, the
number of
bins is fixed and order O(n) can be achieved (not the O(n.log(n)) that I had
expected, incorrectly).
RUBY 1.8.7:
For ruby 1.8.7-p I could not test the last 2 solutions (methods not
implemented).
For the first 3, _significant_ differences arise.
SIZE
1000000 #[1_000_000 that is ; n = 1]
user system total real
Ralph Shneiver: 0.370000 0.010000 0.380000 ( 0.369785)
Sigurd: 2.320000 0.030000 2.350000 ( 2.360403)
Keinich #1 1.520000 0.040000 1.560000 ( 1.562627)
Keinich #2 16.520000 0.100000 16.620000 ( 16.623032)
SIZE
2000000 #[2_000_000 that is ; n = 1]
user system total real
Ralph Shneiver: 0.720000 0.010000 0.730000 ( 0.737673)
Sigurd: 8.040000 0.110000 8.150000 ( 8.142827)
Keinich #1 5.670000 0.040000 5.710000 ( 5.716364)
Keinich #2 52.600000 0.480000 53.080000 ( 53.100823)
SIZE
10000000 #[10_000_000 that is ; n = 1]
user system total real
Ralph Shneiver: 3.680000 0.000000 3.680000 ( 3.680230)
Striking: Only the solution of Ralph Shneiver remains linear in MRI 1.8.7
JRUBY (1.6.5.1)
$ ruby -v
jruby 1.6.5.1 (ruby-1.8.7-p330) (2011-12-27 1bf37c2) (OpenJDK Server VM
1.6.0_23) [linux-i386-java]
I also tested JRuby out of curiosity.
SIZE
1000000 #[1_000_000 that is ; n = 1]
user system total real
Ralph Shneiver: 0.244000 0.000000 0.244000 ( 0.220000)
Sigurd: 0.264000 0.000000 0.264000 ( 0.264000)
Keinich #1 0.112000 0.000000 0.112000 ( 0.112000)
Keinich #2 11.256000 0.000000 11.256000 ( 11.256000)
SIZE
2000000 #[2_000_000 that is ; n = 1]
user system total real
Ralph Shneiver: 0.392000 0.000000 0.392000 ( 0.368000)
Sigurd: 0.439000 0.000000 0.439000 ( 0.438000)
Keinich #1 0.196000 0.000000 0.196000 ( 0.196000)
Keinich #2 14.462000 0.000000 14.462000 ( 14.462000)
SIZE
10000000 #[10_000_000 that is ; n = 1]]
user system total real
Ralph Shneiver: 1.604000 0.000000 1.604000 ( 1.581000)
Sigurd: 1.769000 0.000000 1.769000 ( 1.769000)
Keinich #1 0.967000 0.000000 0.967000 ( 0.967000)
Keinich #2 8.978000 0.000000 8.978000 ( 8.978000)
Interesting again.
Those 2 solutions where not yet available on JRuby 1.6.5.1:
Magnus Holm: NoMethodError: undefined method `each_with_object' for
#<Array:0x18eb7b8>
Abinoam #1: NoMethodError: undefined method `chunk' for
#<Array:0x1719f30>
Ralph Shneiver remains linear again.
But then again ... Keinich #1 is significantly faster on 10_000_000 than
the others.
Keinich #2 seems to be not very predictable?
JRUBY HEAD (in rvm):
$ ruby -v
jruby 1.7.0.dev (ruby-1.8.7-p357) (2012-01-17 7de254f) (Java HotSpot(TM)
Server VM 1.6.0_26) [linux-i386-java]
1000000 #[1_000_000 that is ; n = 1]
user system total real
Ralph Shneiver: 0.238000 0.000000 0.238000 ( 0.223000)
Sigurd: 0.286000 0.000000 0.286000 ( 0.286000)
Keinich #1 0.120000 0.000000 0.120000 ( 0.120000)
Keinich #2 7.841000 0.000000 7.841000 ( 7.841000)
SIZE
2000000 #[2_000_000 that is ; n = 1]
user system total real
Ralph Shneiver: 0.426000 0.000000 0.426000 ( 0.410000)
Sigurd: 0.478000 0.000000 0.478000 ( 0.478000)
Keinich #1 0.213000 0.000000 0.213000 ( 0.213000)
Keinich #2 21.928000 0.000000 21.928000 ( 21.928000)
SIZE
10000000 #[10_000_000 that is ; n = 1]
user system total real
Ralph Shneiver: 1.550000 0.000000 1.550000 ( 1.535000)
Sigurd: 1.795000 0.000000 1.795000 ( 1.795000)
Keinich #1 1.060000 0.000000 1.060000 ( 1.060000)
Keinich #2 116.826000 0.000000 116.826000 (116.826000)
Similar results to JRuby 1.6.5.1
Caveat:
I did not check the correctness of the result, only the timing.
Questions:
Why would ruby 1.9.3. be so much better at this than ruby 1.8.7 ?
Could it be because the Hash is now "ordered" so it can do an efficient
algorithm when adding an entry to a bin? Or is it object creation?
Why are certain methods leading to non-linear behavior?
My conclusions (?):
* be careful about performance with large data sets
* Fastest overall: Keinich #1 on JRuby 1.6.5.1
* CRuby 1.9.3 seems to be the only on the remains linear for all
solutions on large data sets.
* "Ralph Shneiver" is the only solution that remains linear for
all tested rubies (MRI 1.9.3, MRI 1.8.7, JRuby 1.6.5.1 JRuby head)
HTH,
Peter