Ok, I think I've beat sort_by{ rand } on large Arrays , by about 30%. I ended up with this:
h = Hash.new
self.each { |v| h[rand(1000000000)] = v }
h.keys.sort.collect { |k| h[k] }
Explanation..
* Hash works better than an Array of Arrays [[rand,x[0]],[rand,x[1]],...] because Array#<=> seems to be slow.
* Got some speedup by avoiding Float#<=>, hence the Integer version of rand.
* On rand(1000000000); chose 1billion rather arbitrarily to keep things out of the range of BigNum. Oddly, using 2^31-1 caused BigNums to show up. Ideally we'd use the largest FixNum, but I don't know how to retrieve that.
* Bias should be minimal for most applications, even at rand(1Billion).
* Can it be improved further???
Find my benchmark results, and script below.
--Steve
user system total real
shuffle (10) 0.016000 0.000000 0.016000 ( 0.016000)
shuffle2 (10) 0.000000 0.000000 0.000000 ( 0.000000)
shuffle3 (10) 0.000000 0.000000 0.000000 ( 0.000000)
sort (10) 0.000000 0.000000 0.000000 ( 0.000000)
sort_by (10) 0.000000 0.000000 0.000000 ( 0.000000)
user system total real
shuffle (100) 0.156000 0.000000 0.156000 ( 0.156000)
shuffle2 (100) 0.047000 0.000000 0.047000 ( 0.046000)
shuffle3 (100) 0.031000 0.000000 0.031000 ( 0.032000)
sort (100) 0.125000 0.000000 0.125000 ( 0.125000)
sort_by (100) 0.031000 0.000000 0.031000 ( 0.031000)
user system total real
shuffle (1000) 1.625000 0.000000 1.625000 ( 1.625000)
shuffle2 (1000) 0.656000 0.000000 0.656000 ( 0.672000)
shuffle3 (1000) 0.266000 0.000000 0.266000 ( 0.281000)
sort (1000) 2.094000 0.000000 2.094000 ( 2.093000)
sort_by (1000) 0.453000 0.000000 0.453000 ( 0.453000)
user system total real
shuffle (10000) 16.469000 0.032000 16.501000 ( 16.578000)
shuffle2 (10000) 8.156000 0.000000 8.156000 ( 8.203000)
shuffle3 (10000) 3.344000 0.047000 3.391000 ( 3.391000)
sort (10000) 25.453000 0.000000 25.453000 ( 25.593000)
sort_by (10000) 4.765000 0.000000 4.765000 ( 4.796000)
#!/usr/bin/env ruby
require 'benchmark'
class Array
def shuffle
newarr = self.dup
2.times do
self.length.times do |x|
y = rand(self.length)
newarr[x], newarr[y] = newarr[y], newarr[x]
end
end
newarr
end
def shuffle2
tmparr = self.collect { |x| [rand(1000000000),x] }
tmparr.sort.collect { |x| x[1] }
end
def shuffle3
h = Hash.new
self.each { |v| h[rand(1000000000)] = v }
h.keys.sort.collect { |k| h[k] }
end
end
[10, 100, 1000, 10000].each do |n|
# populate an Array size n
a = Array.new(n) { |i| i }
# run our benchmarks
Benchmark.bm(16) do |bb|
# my method
bb.report("shuffle ("+n.to_s+")") do
100.times do
a.shuffle
end
end
# decorate, sort, undecorate
bb.report("shuffle2 ("+n.to_s+")") do
100.times do
a.shuffle2
end
end
# decorate, sort, undecorate with a hash
bb.report("shuffle3 ("+n.to_s+")") do
100.times do
a.shuffle3
end
end
# Array's sort
bb.report("sort ("+n.to_s+")") do
100.times do
a.sort { rand(10000) <=> 5000 }
end
end
# Enumerable's sort_by
bb.report("sort_by ("+n.to_s+")") do
100.times do
a.sort_by { rand }
end
end
printf("\n")
end
end