Need some Ruby magic

I'd like to sort collections randomly. This is what I tried first:

my_array.sort { |a,b| rand(2) }

but the results weren't very random. I then tried the following:

class Array

  def random_copy
    b = Array.new
    while b.length < self.length
      random_value = self[rand self.length]
      b.push random_value unless b.include? random_value
    end
    b
  end
end

This works but I'm sure there's some Ruby magic for doing this better.

thanks

Hi --

···

On Fri, 2 Dec 2005, Hammed Malik wrote:

I'd like to sort collections randomly. This is what I tried first:

my_array.sort { |a,b| rand(2) }

but the results weren't very random. I then tried the following:

class Array

  def random_copy
    b = Array.new
    while b.length < self.length
      random_value = self[rand self.length]
      b.push random_value unless b.include? random_value
    end
    b
  end
end

This works but I'm sure there's some Ruby magic for doing this better.

I think the most common idiom is:

   array.sort_by { rand }

David

--
David A. Black
dblack@wobblini.net

"Ruby for Rails", forthcoming from Manning Publications, April 2006!

hammed wrote:

I'd like to sort collections randomly. This is what I tried first:

my_array.sort { |a,b| rand(2) }

my_array.sort_by { rand }

···

--
Posted via http://www.ruby-forum.com/\.

excellent! thanks.

···

On 01/12/05, Jim Weirich <jim-keyword-rforum.c88827@weirichhouse.org> wrote:

hammed wrote:
> I'd like to sort collections randomly. This is what I tried first:
>
> my_array.sort { |a,b| rand(2) }

my_array.sort_by { rand }

--
Posted via http://www.ruby-forum.com/\.

Maybe there should be Array#randomize...

In article <dd3f270e4d20842e121bb970bc9a8386@ruby-forum.com>,

hammed wrote:
> I'd like to sort collections randomly. This is what I tried first:
>
> my_array.sort { |a,b| rand(2) }

my_array.sort_by { rand }

A quick test seems to indicate that this works with the ruby 1.8.2
implementation on my Mac. However, there is no guarantee that this call
will select each of the n! permutations with equal probability.

Whether it does will depend on the exact algorithm used by 'sort'. For
instance, if sort uses the (inefficient for large arrays) bubble sort,
the last key will end up staying there in half the cases.

Reinder

···

Jim Weirich <jim-keyword-rforum.c88827@weirichhouse.org> wrote:

Take a look @ the Facets project. There are extension methods for array to
have a number of randomized behaviors.

j.

···

On 12/1/05, Vincent Foley <vfoley@gmail.com> wrote:

Maybe there should be Array#randomize...

--
"Remember. Understand. Believe. Yield! -> http://ruby-lang.org"

Jeff Wood

How about just doing this?

class Array
   def shuffle
     newarr = self
     self.length.times do |x|
       y = rand(self.length)
       newarr, newarr[y] = newarr[y], newarr
     end
     newarr
   end
end

shuffled_array = my_array.shuffle

--Steve

···

On Dec 3, 2005, at 9:32 AM, Reinder Verlinde wrote:

However, there is no guarantee that this call
will select each of the n! permutations with equal probability.

Actually, sort_by caches every value passed and uses those for
comparison (aka a Schwartzian Transform), so it would essentially
return the same results as sorting a list of random numbers,
regardless of algorithm.

Sam

···

On 12/4/05, Reinder Verlinde <reinder@verlinde.invalid> wrote:

In article <dd3f270e4d20842e121bb970bc9a8386@ruby-forum.com>,
Jim Weirich <jim-keyword-rforum.c88827@weirichhouse.org> wrote:

> hammed wrote:
> > I'd like to sort collections randomly. This is what I tried first:
> >
> > my_array.sort { |a,b| rand(2) }
>
> my_array.sort_by { rand }

A quick test seems to indicate that this works with the ruby 1.8.2
implementation on my Mac. However, there is no guarantee that this call
will select each of the n! permutations with equal probability.

Whether it does will depend on the exact algorithm used by 'sort'. For
instance, if sort uses the (inefficient for large arrays) bubble sort,
the last key will end up staying there in half the cases.

reinder wrote:

my_array.sort_by { rand }

A quick test seems to indicate that this works with the ruby 1.8.2
implementation on my Mac. However, there is no guarantee that this call
will select each of the n! permutations with equal probability.

Actually, it will. sort_by uses a Schwartzian transform to do the
sorting. That means that rand is only called once for each element of
the array. As long as rand gives a decent distribution of random
numbers, the permutation will be random too.

Here's a little experiment:

···

-------------------------------------------------
N = (ARGV.shift || 1000).to_i
A = %w(a b c)
Perms = %w(abc acb bac bca cab cba)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do
  sorted = A.dup.sort_by { rand }
  Score[sorted.join("")] += 1
end

Score.keys.sort.each do |key|
  puts "#{key}: #{Score[key]}"
end
-------------------------------------------------

Gives:

  $ ruby sortdist.rb 100000
  abc: 16693
  acb: 16688
  bac: 16752
  bca: 16590
  cab: 16475
  cba: 16802

Thats a pretty good distribution for the number of iterations.

-- Jim Weirich

--
Posted via http://www.ruby-forum.com/\.

sort_by{ rand } is actually biased, since sort_by will preserve the
relative order of the elements for which rand() returned the same value.

%w[a b c d].sort_by{ 0 } # => ["a", "b", "c", "d"]
i = 0
%w[a b c d].sort_by{ 10 - (i += 1) / 2 } # => ["d", "b", "c", "a"]

This means that permutations preserving the relative order of one (or
more) pair of elements of the original array are a bit more probable.

In praxis, the bias this introduces is often so small that the mere
succinctness of sort_by{ rand } more than makes up for it.

···

On Sun, Dec 04, 2005 at 08:48:11AM +0900, Jim Weirich wrote:

reinder wrote:
>> my_array.sort_by { rand }
>
> A quick test seems to indicate that this works with the ruby 1.8.2
> implementation on my Mac. However, there is no guarantee that this call
> will select each of the n! permutations with equal probability.

Actually, it will. sort_by uses a Schwartzian transform to do the
sorting. That means that rand is only called once for each element of
the array. As long as rand gives a decent distribution of random
numbers, the permutation will be random too.

--
Mauricio Fernandez

How about just doing this?

class Array
   def shuffle
     newarr = self
     self.length.times do |x|
       y = rand(self.length)
       newarr, newarr[y] = newarr[y], newarr
     end
     newarr
   end
end

shuffled_array = my_array.shuffle

"Just" doing that
1) changes the receiver (use new_arr = self.dup instead)
2) definitely has a bias. Look e.g. at an array of length three. You will
   first swap the first element with 3 possibilities, then the second and so
   on; you're going through 3**3 possibilities. There are 3! possible
   permutations, but alas, 3! == 6 is not a perfect divisor of 3**3 == 27,
   thus you have a bias.

does

   %w( a b c d ).sort{ rand <=> rand }

help?

-a

···

On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fernández wrote:

On Sun, Dec 04, 2005 at 08:48:11AM +0900, Jim Weirich wrote:

reinder wrote:

my_array.sort_by { rand }

A quick test seems to indicate that this works with the ruby 1.8.2
implementation on my Mac. However, there is no guarantee that this call
will select each of the n! permutations with equal probability.

Actually, it will. sort_by uses a Schwartzian transform to do the
sorting. That means that rand is only called once for each element of
the array. As long as rand gives a decent distribution of random
numbers, the permutation will be random too.

sort_by{ rand } is actually biased, since sort_by will preserve the
relative order of the elements for which rand() returned the same value.

%w[a b c d].sort_by{ 0 } # => ["a", "b", "c", "d"]
i = 0
%w[a b c d].sort_by{ 10 - (i += 1) / 2 } # => ["d", "b", "c", "a"]

This means that permutations preserving the relative order of one (or
more) pair of elements of the original array are a bit more probable.

In praxis, the bias this introduces is often so small that the mere
succinctness of sort_by{ rand } more than makes up for it.

--

ara [dot] t [dot] howard [at] noaa [dot] gov
all happiness comes from the desire for others to be happy. all misery
comes from the desire for oneself to be happy.
-- bodhicaryavatara

===============================================================================

sort_by{ rand } is actually biased, since sort_by will preserve the
relative order of the elements for which rand() returned the same value.

How frequently would rand return a same value? (In theory it would be
never but does anyone know the reality?)

Good points, thanks. How about this?

class Array
   def shuffle
     newarr = self.dup
     2.times do
       self.length.times do |x|
         y = rand(self.length)
         newarr, newarr[y] = newarr[y], newarr
       end
     end
     newarr
   end
end

# test code lifted from Jim Weirich's earlier post
N = (ARGV.shift || 1000).to_i
A = %w(a b c)
Perms = %w(abc acb bac bca cab cba)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do
   sorted = A.shuffle
   Score[sorted.join("")] += 1
end

Score.keys.sort.each do |key|
   puts "#{key}: #{Score[key]}"
end

[~/Code/private/code/snippets] 382% ./shuffle.rb 100000
abc: 16596
acb: 16394
bac: 16700
bca: 16684
cab: 16841
cba: 16785

0:07.87 seconds, 96.4% CPU

This is really just meant to be a simple hack. I think a better algorithm could assign a random index to each element, then sort on those indices.

--Steve

···

On Dec 4, 2005, at 12:32 PM, Kero wrote:

1) changes the receiver (use new_arr = self.dup instead)
2) definitely has a bias. Look e.g. at an array of length three. You will
   first swap the first element with 3 possibilities, then the second and so
   on; you're going through 3**3 possibilities. There are 3! possible
   permutations, but alas, 3! == 6 is not a perfect divisor of 3**3 == 27,
   thus you have a bias.

According to the Pickaxe, Kernel.rand is an implementation of the [Mersenne Twister][1] algorithm. If that is still the case, then in reality it will begin repeating exactly the same numbers after 2^19937-1 (approx. 4e6001) pseudo-random numbers, as that's the period of the MT. The period is the point at which the exact same series of PRNs will begin again. For example, if it started out as 1, 2, 3, ..., then after 2^19937-1, it'd start again at 1, 2, 3.

But, that's not what you really care about.

Numbers will certainly repeat within the period, since the period is so much larger than our integer representation. For example, a 32 bit integer can only represent about 4e9. So, the likelihood of getting any specific integer, in the 32 bit case, is 1/(2^32). This is an extreme example:

5.times { p rand(2) }
5.times { p rand(1000) }

The bottom line is that the PRNs should be approximately uniformly distributed. Apply that to your specific range, and you have your answer.

--Steve

[1]: Mersenne Twister - Wikipedia

···

On Dec 4, 2005, at 7:59 AM, Brian Buckley wrote:

How frequently would rand return a same value? (In theory it would be
never but does anyone know the reality?)

>sort_by{ rand } is actually biased, since sort_by will preserve the
>relative order of the elements for which rand() returned the same value.

I did some numbers and found the following probability estimates:

array size P(#rand() collision)
1000 5.54558e-11
10000 5.55056e-09
1000000 5.55096e-05
10000000 5.53574e-03

A collision implies that the associated pair of elements has got the same
ordering in the output array, instead of 50-50 chances of it being reversed.

More info, including the Ruby code I used, available at
http://eigenclass.org/hiki.rb?sort_by+rand+is+biased

does

  %w( a b c d ).sort{ rand <=> rand }

help?

I'm not totally sure it's really unbiased, but at first sight it looks good.

When called with no arguments, rand will use

static double
genrand_real(void)
{
    unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6;
    return(a*67108864.0+b)*(1.0/9007199254740992.0);
}

so it would seem that two consecutive calls to rand() cannot return the
same value, but I still have to think about the effect of qsort.

It is however somewhat slower than sort_by{ rand }.

[100, 1000, 10000, 100000].each do |n|
    GC.start
    t0 = Time.new
    (1..n).sort_by{rand}
    a = (t1 = Time.new) - t0 # => 0.000425, 0.003002, 0.054392, 0.939692
    (1..n).sort{rand <=> rand}
    b = Time.new - t1 # => 0.001161, 0.026275, 0.289807, 3.791833
    "%4.2f" % [b / a] # => "2.73", "8.75", "5.33", "4.04"
end
RUBY_VERSION # => "1.8.3"
RUBY_RELEASE_DATE # => "2005-09-24"

···

On Sun, Dec 04, 2005 at 10:21:02AM +0900, ara.t.howard@noaa.gov wrote:

On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fernández wrote:

--
Mauricio Fernandez

http://www.rubygarden.org/ruby?ArrayShuffle is a variant of your code,
with a nice proof that it's unbiased.

http://c2.com/cgi/wiki?LinearShuffle has a lot of discussion on
shuffling algorithms.

martin

···

Stephen Waits <steve@waits.net> wrote:

This is really just meant to be a simple hack. I think a better
algorithm could assign a random index to each element, then sort on
those indices.

I should add to this that you can approximate when you'll get repeats, and it's likely to be much sooner than you'd guess. For more information, see the [Birthday Problem][1].

--Steve

[1]: Birthday Problem -- from Wolfram MathWorld

···

On Dec 4, 2005, at 8:54 AM, Stephen Waits wrote:

Numbers will certainly repeat within the period, since the period is so much larger than our integer representation. For example, a 32 bit integer can only represent about 4e9. So, the likelihood of getting any specific integer, in the 32 bit case, is 1/(2^32).

Mauricio Fernández wrote:

does

  %w( a b c d ).sort{ rand <=> rand }

help?

It is however somewhat slower than sort_by{ rand }.

It's considerably slower. Changing it to sort { rand(10000) <=> 5000 } shows a 30% speedup on 10k element Arrays on my system. Don't know why I chose those numbers, I suppose bigger numbers reduce the "0" comparison, considering { rand(3) <=> 1 }.

Anyway, I put together a small benchmark to compare my (revised) shuffle, to sort, and sort_by:

#!/usr/bin/env ruby

class Array
   def shuffle
     newarr = self.dup
     2.times do
       self.length.times do |x|
         y = rand(self.length)
         newarr, newarr[y] = newarr[y], newarr
       end
     end
     newarr
   end
end

printf(" size shuffle sort sort_by\n")

[10, 100, 1000, 10000].each do |n|

   # populate an Array size n
   a = Array.new(n) { |i| i }

   # my method
   start = Time.new
   100.times do
     b = a.shuffle
   end
   a_time = Time.new - start

   # Array's sort
   start = Time.new
   100.times do
     b = a.sort { rand(10000) <=> 5000 }
   end
   b_time = Time.new - start

   # Enumerable's sort_by
   start = Time.new
   100.times do
     b = a.sort_by { rand }
   end
   c_time = Time.new - start

   # results
   printf("%6d %5.2f %5.2f %5.2f\n", n, a_time, b_time, c_time)
  
end

Which, on my P4/3.0GHz (Win32) emits this:

  size shuffle sort sort_by
     10 0.00 0.02 0.00
    100 0.16 0.13 0.03
   1000 1.61 2.03 0.44
  10000 16.31 28.14 4.78

--Steve