Need some Ruby magic

the big difference is that a value will have rand called multiple times for it
- in otherwords an element compareds differently every time. this is bound to
help shuffle more uniformly it seems:

   harp:~ > cat a.rb
   %w( a b c d ).sort{|a,b| ra = rand; rb = rand; p a => ra; p b => rb; ra <=> rb }

   harp:~ > ruby a.rb
   {"a"=>0.181439380218727}
   {"c"=>0.579403886629126}
   {"c"=>0.713513989939958}
   {"d"=>0.573687508540169}
   {"a"=>0.851534740906118}
   {"d"=>0.00156074151427565}
   {"b"=>0.803591007691647}
   {"a"=>0.494066110872563}
   {"b"=>0.834676789626288}
   {"c"=>0.792106134460754}

so, note that "a" was sorted using a different value each time.

can anyone who remembers stats and knows the sort algorithm weigh in?

-a

···

On Tue, 6 Dec 2005, Mauricio [iso-8859-1] Fernández wrote:

On Sun, Dec 04, 2005 at 10:21:02AM +0900, ara.t.howard@noaa.gov wrote:

On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fernández wrote:

sort_by{ rand } is actually biased, since sort_by will preserve the
relative order of the elements for which rand() returned the same value.

I did some numbers and found the following probability estimates:

array size P(#rand() collision)
1000 5.54558e-11
10000 5.55056e-09
1000000 5.55096e-05
10000000 5.53574e-03

A collision implies that the associated pair of elements has got the same
ordering in the output array, instead of 50-50 chances of it being reversed.

More info, including the Ruby code I used, available at
http://eigenclass.org/hiki.rb?sort_by+rand+is+biased

does

  %w( a b c d ).sort{ rand <=> rand }

help?

I'm not totally sure it's really unbiased, but at first sight it looks good.

--

ara [dot] t [dot] howard [at] noaa [dot] gov
all happiness comes from the desire for others to be happy. all misery
comes from the desire for oneself to be happy.
-- bodhicaryavatara

===============================================================================

Mauricio Fernández <mfp@acm.org> writes:

When called with no arguments, rand will use

static double
genrand_real(void)
{
    unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6;
    return(a*67108864.0+b)*(1.0/9007199254740992.0);
}

so it would seem that two consecutive calls to rand() cannot return the
same value, but I still have to think about the effect of qsort.

And that is a good thing? How can the sequence be "random" if it
never contains the same two numbers in row? (Remember Enigma...)

···

Mauricio Fernandez

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Martin DeMello wrote:

http://www.rubygarden.org/ruby?ArrayShuffle is a variant of your code,
with a nice proof that it's unbiased.

http://c2.com/cgi/wiki?LinearShuffle has a lot of discussion on
shuffling algorithms.

Thanks Martin!

--Steve

repeats, and it's likely to be much sooner than you'd guess. For
more information, see the [Birthday Problem][1].

--Steve

[1]: Birthday Problem -- from Wolfram MathWorld

I liked the Birthday Problem. Thanks.

the big difference is that a value will have rand called multiple times for

That might be the very reason why it is ultimately much more biased than
sort_by{ rand }... I'm keeping the latter.
If the limited precision of rand() became a problem, I could always use
arr.sort_by{ [rand, rand] } where collisions happen with a probability
around 6.16298e-19 for a 10_000_000 element array, or more generally
Array.new(N){rand} (but then we'd have to look into MT for another
possible source of residual bias).

it - in otherwords an element compareds differently every time. this is
bound to help shuffle more uniformly it seems:

We cannot trust our intuition in these cases; in the following example, the
tenth element is more likely to stay where it was (look at pos[9]):

a = (1..100); pos = Array.new(100, 0)

t = Time.new; 100000.times{ pos[a.sort{rand <=> rand}.index(10)] += 1 }
Time.new - t
# => 223.891608

pos
# => [1176, 1208, 1083, 1153, 1191, 1130, 1098, 1133, 1189, 1438, 1150, 1043,

···

On Tue, Dec 06, 2005 at 08:23:52AM +0900, ara.t.howard@noaa.gov wrote:
                                                          ========
# 1121, 1067, 1076, 1069, 1078, 1113, 1129, 1143,1115, 1074, 1081, 1069, 1057,
# [...]

t = Time.new; 10000000.times{ pos[a.sort{rand <=> rand}.index(10)] += 1 }
Time.new - t
# => 22781.664168

pos
# => [105944, 105861, 105546, 105681, 104059, 101196, 99080, 100675, 110851,
# 132223, 101670, 96018, 96147, 96688, 97709, 99051, 97823, 97345, 98799,

# [...]

The bias can be observed easily with smaller arrays:

a = (1..20); pos = Array.new(20, 0)
t = Time.new
100000.times{ pos[a.sort{rand <=> rand}.index(3)] += 1 }
Time.new - t # => 20.848851
pos
# => [5084, 5694, 6561, 4326, 4205, 4792, 4913, 4827, 4791, 4571, 4560, 4707,
# 4863, 4934, 5112, 5011, 5071, 5326, 5482, 5170]

a = (1..10); pos = Array.new(10, 0)
t = Time.new
1_000_000.times{ pos[a.sort{rand <=> rand}.index(5)] += 1 }
Time.new - t # => 77.438912
pos
# => [113836, 103238, 99442, 88682, 101306, 83294, 93296, 100830, 100744, 115332]

Compare the latter to

a = (1..10); pos = Array.new(10, 0)
t = Time.new
1_000_000.times{ pos[a.sort_by{rand}.index(5)] += 1 }
Time.new - t # => 32.163499
pos
# => [99822, 100020, 100328, 100040, 99603, 100098, 100195, 100023, 99875, 99996]

--
Mauricio Fernandez

> More info, including the Ruby code I used, available at
> http://eigenclass.org/hiki.rb?sort_by+rand+is+biased
>
>> does
>>
>> %w( a b c d ).sort{ rand <=> rand }
>>
>> help?

Hi, I'm new to ruby, but I use Python for some years now.
So I suggest the following 'pythonic' solution :wink:

  def shuffle(a)
      
      b=
      # decorate
      a.each { |x| b << [rand,x]; }

      # sort
      b.sort! { |x,y| x[0] <=> y[0] }

      # undecorate
      b.collect { |x| x[1] }
  end

  a=%w{a b c d e f}
  puts shuffle a

Is there a shorter or more elegant way to do this
'decorate-sort-undecorate' in ruby ?
In Python it is quite easy by using list comprehensions...

Greetings, Uwe

Nah, I misread genrand_real; the above shows it is possible.

···

On Wed, Dec 07, 2005 at 12:41:39AM +0900, Christian Neukirchen wrote:

Mauricio Fernández <mfp@acm.org> writes:

> When called with no arguments, rand will use
>
> static double
> genrand_real(void)
> {
> unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6;
> return(a*67108864.0+b)*(1.0/9007199254740992.0);
> }
>
> so it would seem that two consecutive calls to rand() cannot return the
> same value, but I still have to think about the effect of qsort.

And that is a good thing? How can the sequence be "random" if it
never contains the same two numbers in row? (Remember Enigma...)

--
Mauricio Fernandez

Why not
a.dup.sort { 2*rand(12345) <=> 12345 }

? that way, we'd have NO colisions ever.

Mauricio Fernández wrote:

the big difference is that a value will have rand called multiple times for

That might be the very reason why it is ultimately much more biased than
sort_by{ rand }... I'm keeping the latter.

Indeed,

ar = [0, 1, 2]; result = Hash.new(0)
100000.times {result[ar.sort {rand <=> rand}] += 1}
p result
# => {[2, 1, 0]=>25235,
       [2, 0, 1]=>12285,
       [1, 0, 2]=>12426,
       [0, 1, 2]=>25111,
       [1, 2, 0]=>12468,
       [0, 2, 1]=>12475}

Michael

···

On Tue, Dec 06, 2005 at 08:23:52AM +0900, ara.t.howard@noaa.gov wrote:

--
Michael Ulm
R&D Team
ISIS Information Systems Austria
tel: +43 2236 27551-219, fax: +43 2236 21081
e-mail: michael.ulm@isis-papyrus.com
Visit our Website: www.isis-papyrus.com

Hi, I'm new to ruby, but I use Python for some years now.
So I suggest the following 'pythonic' solution :wink:

[...]

Is there a shorter or more elegant way to do this
'decorate-sort-undecorate' in ruby ?

arr.sort_by{rand}

In Python it is quite easy by using list comprehensions...

What does that look like in Python?
Is it shorter than the above one? }:slight_smile:

···

On Tue, Dec 06, 2005 at 07:32:34PM +0900, Uwe Schmitt wrote:

--
Mauricio Fernandez

What are you all trying to achieve by variations of
   rand <=> rand,

why not simply use
  rand(3) - 1
?

which is not in danger of decreasing randomness by combining random numbers.

cheers,

Brian

···

On 07/12/05, hmassa@gmail.com <hmassa@gmail.com> wrote:

Why not
a.dup.sort { 2*rand(12345) <=> 12345 }

? that way, we'd have NO colisions ever.

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/

hmassa@gmail.com wrote:

Why not
a.dup.sort { 2*rand(12345) <=> 12345 }

? that way, we'd have NO colisions ever.

What you have posted does not avoid collisions.
You probably mean something like

a.sort {2*rand(12345) <=> 2 * rand(12345) + 1}

However, the problem of the sort {rand<=>rand}
methods are not collisions, but the skewed results
from the algorithm that assumes a consistent
comparison operator. Quoting from an earlier
post of mine:

ar = [0, 1, 2]; result = Hash.new(0)
100000.times {result[ar.sort {rand <=> rand}] += 1}
p result
# => {[2, 1, 0]=>25235,
       [2, 0, 1]=>12285,
       [1, 0, 2]=>12426,
       [0, 1, 2]=>25111,
       [1, 2, 0]=>12468,
       [0, 2, 1]=>12475}

You get basically the same result with your method

HTH,

Michael

···

--
Michael Ulm
R&D Team
ISIS Information Systems Austria
tel: +43 2236 27551-219, fax: +43 2236 21081
e-mail: michael.ulm@isis-papyrus.com
Visit our Website: www.isis-papyrus.com

Because it's the slowest version I've tested. See the rest of the thread for my benchmarks. In order of speed, we have 'shuffle3', 'sort_by {rand}', and 'shuffle4'.

--Steve

···

On Dec 7, 2005, at 6:57 AM, hmassa@gmail.com wrote:

Why not
a.dup.sort { 2*rand(12345) <=> 12345 }

> Hi, I'm new to ruby, but I use Python for some years now.
> So I suggest the following 'pythonic' solution :wink:
[...]
> Is there a shorter or more elegant way to do this
> 'decorate-sort-undecorate' in ruby ?

arr.sort_by{rand}

that is not equivalent to my solution.
the python equivalent to yours is

  arr.sort(key = lambda x: random())

> In Python it is quite easy by using list comprehensions...
What does that look like in Python?

from random import random

def shuffle(a):
    b = [ (random(), i) for i in a]
    b.sort()
    return [ x[1] for x in b ]

print shuffle(range(10))

Is it shorter than the above one? }:slight_smile:

--
Mauricio Fernandez

greetings, Uwe

···

On Tue, Dec 06, 2005 at 07:32:34PM +0900, Uwe Schmitt wrote:

> Why not
> a.dup.sort { 2*rand(12345) <=> 12345 }
>
> ? that way, we'd have NO colisions ever.
>

What you have posted does not avoid collisions.
You probably mean something like

a.sort {2*rand(12345) <=> 2 * rand(12345) + 1}

[snip]

and then he should do us a favour and write

a.sort { [-1, 1][rand(2)] }

instead, which gives the same skewed results but at least does not do
it as complicated.

Brian

···

On 07/12/05, Michael Ulm <michael.ulm@isis-papyrus.com> wrote:

hmassa@gmail.com wrote:

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/

Uwe Schmitt wrote:

>> arr.sort_by{rand}

that is not equivalent to my solution.
the python equivalent to yours is

  arr.sort(key = lambda x: random())

>> >> > In Python it is quite easy by using list comprehensions...
>> What does that look like in Python?

from random import random

def shuffle(a):
    b = [ (random(), i) for i in a]
    b.sort()
    return [ x[1] for x in b ]

print shuffle(range(10))

I would have thought those two functions do the same thing
with the exception that shuffle does not sort in place.
I'm probably missing something obvious. Could you explain
the difference between the two versions?

Michael

···

--
Michael Ulm
R&D Team
ISIS Information Systems Austria
tel: +43 2236 27551-219, fax: +43 2236 21081
e-mail: michael.ulm@isis-papyrus.com
Visit our Website: www.isis-papyrus.com

Hi,

>> >> > Is there a shorter or more elegant way to do this >> > 'decorate-sort-undecorate' in ruby ?
>> >> arr.sort_by{rand}

that is not equivalent to my solution.
the python equivalent to yours is

  arr.sort(key = lambda x: random())

No, Array#sort_by does the 'decorate-sort-undecorate', aka.
Schwartzian Transform.

I don't read Python very well, but I think the ruby equivalent
to 'arr.sort(key = lambda x: random())' might be:

  arr.sort { rand <=> rand }

But that is different from arr.sort_by {rand}

Regards,

Bill

···

From: "Uwe Schmitt" <schmitt@num.uni-sb.de>

>> On Tue, Dec 06, 2005 at 07:32:34PM +0900, Uwe Schmitt wrote:

> >> arr.sort_by{rand}
> that is not equivalent to my solution.
> the python equivalent to yours is
> arr.sort(key = lambda x: random())

> >> > In Python it is quite easy by using list comprehensions...
> >> What does that look like in Python?
>
> from random import random
>
> def shuffle(a):
> b = [ (random(), i) for i in a]
> b.sort()
> return [ x[1] for x in b ]
>
> print shuffle(range(10))

I would have thought those two functions do the same thing
with the exception that shuffle does not sort in place.
I'm probably missing something obvious. Could you explain
the difference between the two versions?

If you use random as sorting key, the result of a<=>b
will differ each time you evaluate a<=>b.
In my case I attach a random number to each item,
then I sort my list according to this number and remove
this 'decoration' afterwards.

Pythonprogrammers call this pattern "decorate-sort-undecorate".

As I do not now anything about the ruby interanls of sort,
I prefer my version.

Greetings, Uwe.

If you use random as sorting key, the result of a<=>b
will differ each time you evaluate a<=>b.
In my case I attach a random number to each item,
then I sort my list according to this number and remove
this 'decoration' afterwards.

run

  ri Enum#sort_by

and see what it do.

Guy Decoux

Uwe Schmitt ha scritto:

>> > >> arr.sort_by{rand}
>> > that is not equivalent to my solution.
>> > the python equivalent to yours is
>> > arr.sort(key = lambda x: random())

>> > >> > In Python it is quite easy by using list comprehensions...
>> > >> What does that look like in Python? >> > >> > from random import random
>> > >> > def shuffle(a):
>> > b = [ (random(), i) for i in a]
>> > b.sort()
>> > return [ x[1] for x in b ]
>> > >> > print shuffle(range(10))

>> I would have thought those two functions do the same thing
>> with the exception that shuffle does not sort in place.
>> I'm probably missing something obvious. Could you explain
>> the difference between the two versions?

If you use random as sorting key, the result of a<=>b
will differ each time you evaluate a<=>b. In my case I attach a random number to each item,
then I sort my list according to this number and remove
this 'decoration' afterwards.

Pythonprogrammers call this pattern "decorate-sort-undecorate".

Are you sure of what you say? AFAIK
sort(key=lambda x: somefunc..))

will do decorate/sort/undecorate under the hood, behaving like ruby's #sort_by and like your version.
OTOH
sort(cmp=lambda x,y: something.. ) will behave like ruby's #sort.