[ANN] rand.rb 0.9: Random access methods for Enumerables

Hello all, here's a little convenience library we whipped up a couple weeks back with Christian Neukirchen. It's very simple, but we hope someone might find it useful :slight_smile:

···

---

rand.rb is a library for picking random elements and shuffling.
This work is licensed under the same terms as Ruby itself.

Overview

rand.rb adds new methods to Enumerable, Array, and Hash to:

     * return a random element (pick, pick_index, pick_key, pick_value and their destructive versions suffixed with !).
     * arrange elements in new, random order (shuffle, shuffle_hash_pairs, shuffle_hash).
     * use above methods in convenient ways (each_random, map_random).

It also provides these new facilities to String:

     * shuffle_chars, to arrange the characters of the string in new order.
     * pick_byte, pick_char and pick_index, to return random bytes, characters or elements.

Examples:

arr = (1..10).to_a
arr.pick # => 3
arr.pick # => 7
arr.shuffle # => [3,5,2,10,8,7,6,1,9,4]

str = "string strung strong rang"
str.pick_char # => 'n'
str.pick_byte # => 97
str.shuffle_chars # => "srsntnignrgrgttsuan r og"

hash = {'a'=>2, 'b'=>3, 'c'=>4}
hash.pick! #=> ['c', 4]
hash #=> {'a'=>2, 'b'=>3}
hash.pick_key #=> 'b'
hash.pick_value #=> 2

File.open("foo"){|f|
   f.each_random{|line| puts line } # about equal to readlines.shuffle.each{...}
}

History

     * November 26, 2004: Initial version done as IRC collaboration project between kig and chris2.
     * November 29, 2004: First public release 0.9.

Getting it

rand.rb is packaged in RPA.
rpa install rand

You can download rand.rb at:
http://kronavita.de/chris/data/rand-0.9.0.tar.gz

Upstream source:
You can get latest development releases using darcs by executing:

darcs get http://kronavita.de/chris/data/rand

darcs send is the preferred way to send patches, but mailing diffs is fine too.

Authors

     * Ilmari Heikkinen: Code and unit tests
     * Christian Neukirchen: Documentation and housekeeping.

Please mail bugs, feature requests or patches to the mail addresses above or use IRC to contact the developers.

Copyright

Copyright 2004 Ilmari Heikkinen < kig misfiring net >
Parts Copyright 2004 Christian Neukirchen < chneukirchen gmail com >

P.S. I guess it's bad mojo to start versions from 0.9, but it _is_ very simple

Hi Ilmari,
its a bit OT, but I'd like to know, how you test the methods above.
How could you know, there is as much randomness as you want?

regards
ralf

···

Am Mittwoch, 15. Dezember 2004 01:24 schrieb Ilmari Heikkinen:

Hello all, here's a little convenience library we whipped up a couple
weeks back with Christian Neukirchen. It's very simple, but we hope
someone might find it useful :slight_smile:
---

rand.rb is a library for picking random elements and shuffling.
This work is licensed under the same terms as Ruby itself.

Overview

rand.rb adds new methods to Enumerable, Array, and Hash to:

     * return a random element (pick, pick_index, pick_key, pick_value
and their destructive versions suffixed with !).
     * arrange elements in new, random order (shuffle,
shuffle_hash_pairs, shuffle_hash).
     * use above methods in convenient ways (each_random, map_random).

It also provides these new facilities to String:

     * shuffle_chars, to arrange the characters of the string in new
order.
     * pick_byte, pick_char and pick_index, to return random bytes,
characters or elements.

Examples:

arr = (1..10).to_a
arr.pick # => 3
arr.pick # => 7
arr.shuffle # => [3,5,2,10,8,7,6,1,9,4]

str = "string strung strong rang"
str.pick_char # => 'n'
str.pick_byte # => 97
str.shuffle_chars # => "srsntnignrgrgttsuan r og"

hash = {'a'=>2, 'b'=>3, 'c'=>4}
hash.pick! #=> ['c', 4]
hash #=> {'a'=>2, 'b'=>3}
hash.pick_key #=> 'b'
hash.pick_value #=> 2

File.open("foo"){|f|
   f.each_random{|line| puts line } # about equal to
readlines.shuffle.each{...}
}

History

     * November 26, 2004: Initial version done as IRC collaboration
project between kig and chris2.
     * November 29, 2004: First public release 0.9.

Getting it

rand.rb is packaged in RPA.
rpa install rand

You can download rand.rb at:
http://kronavita.de/chris/data/rand-0.9.0.tar.gz

Upstream source:
You can get latest development releases using darcs by executing:

darcs get http://kronavita.de/chris/data/rand

darcs send is the preferred way to send patches, but mailing diffs is
fine too.

Authors

     * Ilmari Heikkinen: Code and unit tests
     * Christian Neukirchen: Documentation and housekeeping.

Please mail bugs, feature requests or patches to the mail addresses
above or use IRC to contact the developers.

Copyright

Copyright 2004 Ilmari Heikkinen < kig misfiring net >
Parts Copyright 2004 Christian Neukirchen < chneukirchen gmail com >

P.S. I guess it's bad mojo to start versions from 0.9, but it _is_ very
simple

umm, could you maybe replace:

   def shuffle!
     sort!{rand <=> 0.5}
   end

with:

   def shuffle!
     each_index {|j| i = rand(size-j); self[j], self[j+i] = self[j+i], self[j]}
     self
   end

I would not rely on sort to do the right thing, and it is less efficient than necessary. I use the second shuffle code extensively in my code which deals with shuffling lots and lots of elements. Presumably the second is O(n) where the sort based scheme is at best O(n log n). A little googling suggests this is the "Fisher-Yates shuffle."

-neil

···

On Dec 14, 2004, at 7:24 PM, Ilmari Heikkinen wrote:

rand.rb is a library for picking random elements and shuffling.
This work is licensed under the same terms as Ruby itself.

I wonder. Do others see #pick in relation to #pop and #push as I do? Might
#pick be destructive instead of #pick! and using something like Galvin's
#rand for non-destructive?

Thoughts?
T.

···

On Tuesday 14 December 2004 07:24 pm, Ilmari Heikkinen wrote:

Hello all, here's a little convenience library we whipped up a couple
weeks back with Christian Neukirchen. It's very simple, but we hope
someone might find it useful :slight_smile:
---

rand.rb is a library for picking random elements and shuffling.
This work is licensed under the same terms as Ruby itself.

Overview

rand.rb adds new methods to Enumerable, Array, and Hash to:

     * return a random element (pick, pick_index, pick_key, pick_value
and their destructive versions suffixed with !).
     * arrange elements in new, random order (shuffle,
shuffle_hash_pairs, shuffle_hash).
     * use above methods in convenient ways (each_random, map_random).

It also provides these new facilities to String:

     * shuffle_chars, to arrange the characters of the string in new
order.
     * pick_byte, pick_char and pick_index, to return random bytes,
characters or elements.

Examples:

arr = (1..10).to_a
arr.pick # => 3
arr.pick # => 7
arr.shuffle # => [3,5,2,10,8,7,6,1,9,4]

str = "string strung strong rang"
str.pick_char # => 'n'
str.pick_byte # => 97
str.shuffle_chars # => "srsntnignrgrgttsuan r og"

hash = {'a'=>2, 'b'=>3, 'c'=>4}
hash.pick! #=> ['c', 4]
hash #=> {'a'=>2, 'b'=>3}
hash.pick_key #=> 'b'
hash.pick_value #=> 2

File.open("foo"){|f|
   f.each_random{|line| puts line } # about equal to
readlines.shuffle.each{...}
}

History

     * November 26, 2004: Initial version done as IRC collaboration
project between kig and chris2.
     * November 29, 2004: First public release 0.9.

Getting it

rand.rb is packaged in RPA.
rpa install rand

You can download rand.rb at:
http://kronavita.de/chris/data/rand-0.9.0.tar.gz

Upstream source:
You can get latest development releases using darcs by executing:

darcs get http://kronavita.de/chris/data/rand

darcs send is the preferred way to send patches, but mailing diffs is
fine too.

Authors

     * Ilmari Heikkinen: Code and unit tests
     * Christian Neukirchen: Documentation and housekeeping.

Please mail bugs, feature requests or patches to the mail addresses
above or use IRC to contact the developers.

Copyright

Copyright 2004 Ilmari Heikkinen < kig misfiring net >
Parts Copyright 2004 Christian Neukirchen < chneukirchen gmail com >

P.S. I guess it's bad mojo to start versions from 0.9, but it _is_ very
simple

--
( o _ カラチ
// trans.
/ \ transami@runbox.com

ruby -rdrb -e
'DRb.start_service;duck=DRbObject.new(nil,"druby://whytheluckystiff.net:6503");puts
duck.toms'

I don't give a damn for a man that can only spell a word one way.
-Mark Twain

[8,16,20,29,78,65,2,14,26,12,12,28,71,114,12,13,12,82,72,21,17,4,10,2,95].
each_with_index{|x,i| $><<(x^'Begin landing your troops'[i]).chr}
-Tadayoshi Funaba

When I've needed this, I've found it more useful to define
Array#pick(n=1) to allow for picking a set of several elements (e.g. a
scrabble rack, an hand of cards etc)

martin

···

Ilmari Heikkinen <kig@misfiring.net> wrote:

arr = (1..10).to_a
arr.pick # => 3

Hi,

Hi Ilmari,
its a bit OT, but I'd like to know, how you test the methods above.
How could you know, there is as much randomness as you want?

Short answer: I don't. Bad, yes.

Long answer:
The only probabilistic test is for #shuffle, which tries 10 times to get a shuffled array that != original array. The rest trust that Kernel#rand provides random numbers and compare the picked values to the original array to check that they all are from the original array.

I guess one way to test the randomness would be to collect an array from the picked values that's much larger than the original. Then check the distribution by checking that the entry counts normalized to array length (amt of item / array.length) are close enough to the original array's normalized counts. And check the randomness of the order by gzipping a long randomized string and checking that it compresses badly.

-Ilmari

···

On 15.12.2004, at 09:38, Ralf Müller wrote:

I don't know if the first proposed solution is valid, because it depends on how
sort is implemented and if the sort algorithm makes each comparision only once.
Otherwise I imagine weird things happening, if elements change their order
while sorting.

So the correct way to do this is

    sort_by{rand}

Fisher Yates is a nice, unbiased O(n) shuffler. I personally more often use the
sort shuffler, because it is so amazingly wordy and I can remember it.

Regards,

Brian

···

On Thu, 16 Dec 2004 02:05:42 +0900 Neil Spring <nspring@cs.washington.edu> wrote:

On Dec 14, 2004, at 7:24 PM, Ilmari Heikkinen wrote:
> rand.rb is a library for picking random elements and shuffling.
> This work is licensed under the same terms as Ruby itself.

umm, could you maybe replace:

   def shuffle!
     sort!{rand <=> 0.5}
   end

with:

   def shuffle!
     each_index {|j| i = rand(size-j); self[j], self[j+i] = self[j+i],
self[j]}
     self
   end

I would not rely on sort to do the right thing, and it is less
efficient than necessary. I use the second shuffle code extensively in
my code which deals with shuffling lots and lots of elements.
Presumably the second is O(n) where the sort based scheme is at best
O(n log n). A little googling suggests this is the "Fisher-Yates
shuffle."

--
Brian Schröder
http://www.brian-schroeder.de/

Thank you, changing #shuffle! to that and #shuffle to dup.shuffle!

-Ilmari

···

On 15.12.2004, at 19:05, Neil Spring wrote:

  def shuffle!
    each_index {|j| i = rand(size-j); self[j], self[j+i] = self[j+i], self[j]}
    self
  end

I would not rely on sort to do the right thing, and it is less efficient than necessary. I use the second shuffle code extensively in my code which deals with shuffling lots and lots of elements. Presumably the second is O(n) where the sort based scheme is at best O(n log n). A little googling suggests this is the "Fisher-Yates shuffle."

Neil Spring schreef:

rand.rb is a library for picking random elements and shuffling.
This work is licensed under the same terms as Ruby itself.

umm, could you maybe replace:

  def shuffle!
    sort!{rand <=> 0.5}
  end

with:

  def shuffle!
    each_index {|j| i = rand(size-j); self[j], self[j+i] = self[j+i], self[j]}
    self
  end

I made this a function a few days ago and it seems to be a lot faster. Not sure if it's completely correct though.

def shuffle!
  a = 0
  while(a<length)
    b = (length*Kernel.rand).to_i
    tmp = self[a]
    self[a] = self[b]
    self[b] = tmp
    a+=1
  end
  self
end

···

On Dec 14, 2004, at 7:24 PM, Ilmari Heikkinen wrote:

How could you know, there is as much randomness as you want?

Here's a method for testing bias in array shuffle. I'd be interested in hearing about other approaches to probabilistic testing.. anyone?

module RandTests
   # RandTests.check_bias{|arr| arr.shuffle!}
   # => [true, {[2, 3, 1]=>8407, [2, 1, 3]=>8323, [3, 2, 1]=>8382,
   # [1, 2, 3]=>8442, [3, 1, 2]=>8255, [1, 3, 2]=>8191}]

···

#
   # Caveat!
   # Only works for (very) small arrays.
   # E.g. array of size 10 has 10! permutations = 3628800,
   # which means that you'd have to do around 10^9 to 10^10 tests
   # for the bias check to work properly and even then the permissible bias
   # would have to be large.
   #
   def self.check_bias(
     array_size=3,
     test_count=50000,
     permissible_bias=0.02,
     &block
   )
     total_permutation_count = (1..array_size).inject{|s, i| s*i} # factorial
     array = (1..array_size).to_a
     avg_permutation_count = test_count.to_f / total_permutation_count

     permutation_counts = {}
     (1..test_count).each{
       perm = block.call(array.dup)
       permutation_counts[perm] ||= 0
       permutation_counts[perm] += 1
     }
     test_result = permutation_counts.all?{|perm, count|
       count.between?(
         avg_permutation_count - permissible_bias*avg_permutation_count,
         avg_permutation_count + permissible_bias*avg_permutation_count)
     }
     [test_result, permutation_counts]
   end
end

-Ilmari

Well, lets see... a couple of equivalent transformations and:

  def shuffle!
    ln = length
    ln.times do |a|
      b = Kernel.rand(ln)
      self[a], self[b] = self[b], self[a]
    end
    self
  end

If you look closely, it is almost the very same thing. The real difference
lies in the the subtraction on the index being fed into #rand. I think this
is to improve the randomness a bit.

How much faster does it run?

T.

···

On Wednesday 15 December 2004 05:12 pm, Maarten Boonen wrote:

Neil Spring schreef:
> On Dec 14, 2004, at 7:24 PM, Ilmari Heikkinen wrote:
>> rand.rb is a library for picking random elements and shuffling.
>> This work is licensed under the same terms as Ruby itself.
>
> umm, could you maybe replace:
>
> def shuffle!
> sort!{rand <=> 0.5}
> end
>
> with:
>
> def shuffle!
> each_index {|j| i = rand(size-j); self[j], self[j+i] = self[j+i],
> self[j]}
> self
> end

I made this a function a few days ago and it seems to be a lot faster.
Not sure if it's completely correct though.

def shuffle!
a = 0
while(a<length)
  b = (length*Kernel.rand).to_i
  tmp = self[a]
  self[a] = self[b]
  self[b] = tmp
  a+=1
end
self
end

Hi,

At Thu, 16 Dec 2004 07:12:15 +0900,
Maarten Boonen wrote in [ruby-talk:123730]:

I made this a function a few days ago and it seems to be a lot faster.
Not sure if it's completely correct though.

def shuffle!
  a = 0
  while(a<length)
    b = (length*Kernel.rand).to_i
    tmp = self[a]
    self[a] = self[b]
    self[b] = tmp
    a+=1
  end
  self
end

It wouldn't be uniform. An element which was close to the head
will tend to be placed close to the tail.

···

--
Nobu Nakada

Ilmari Heikkinen wrote:

How could you know, there is as much randomness as you want?

Here's a method for testing bias in array shuffle. I'd be interested in hearing about other approaches to probabilistic testing.. anyone?

How about examining the distributions of the displacement of each item from its original location for a large number of shuffles? The distribution should be relatively flat across all possible displacements.

···

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/&gt;

Ilmari Heikkinen wrote:

How could you know, there is as much randomness as you want?

Here's a method for testing bias in array shuffle. I'd be interested in hearing about other approaches to probabilistic testing.. anyone?

Here's an alternative that measures shuffle offsets and visibly shows the difference between two shuffle methods.

shuffle_test.rb (1.41 KB)

···

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/&gt;

trans. (T. Onoma) schreef:

> Neil Spring schreef:
> >> rand.rb is a library for picking random elements and shuffling.
> >> This work is licensed under the same terms as Ruby itself.
> >
> > umm, could you maybe replace:
> >
> > def shuffle!
> > sort!{rand <=> 0.5}
> > end
> >
> > with:
> >
> > def shuffle!
> > each_index {|j| i = rand(size-j); self[j], self[j+i] = self[j+i],
> > self[j]}
> > self
> > end
>
> I made this a function a few days ago and it seems to be a lot faster.
> Not sure if it's completely correct though.
>
> def shuffle!
> a = 0
> while(a<length)
> b = (length*Kernel.rand).to_i
> tmp = self[a]
> self[a] = self[b]
> self[b] = tmp
> a+=1
> end
> self
> end

Well, lets see... a couple of equivalent transformations and:

  def shuffle!
    ln = length
    ln.times do |a|
      b = Kernel.rand(ln)
      self[a], self[b] = self[b], self[a]
    end
    self
  end

If you look closely, it is almost the very same thing. The real difference lies in the the subtraction on the index being fed into #rand. I think this is to improve the randomness a bit.

How much faster does it run?

T.

Just a very small test: (code below)

-> third is your equivalent code, self[a], self[b] = self[b], self[a] seems to slow it down quite a lot.

>ruby test3.rb
1.272
0.571
1.042
>Exit code: 0
>ruby test3.rb
1.262
0.601
1.042
>Exit code: 0
>ruby test3.rb
1.302
0.551
1.041
>Exit code: 0

class Array

   def shuffle!
     each_index {|j| i = rand(size-j); self[j], self[j+i] = self[j+i], self[j]}
     self
   end

def shuffle2!
  a = 0
  while(a<length)
    b = (length*Kernel.rand).to_i
    tmp = self[a]
    self[a] = self[b]
    self[b] = tmp
    a+=1
  end
  self
end

   def shuffle3!
     ln = length
     ln.times do |a|
       b = Kernel.rand(ln)
       self[a], self[b] = self[b], self[a]
     end
     self
   end

end

arr = (1..100000).to_a

a=Time.now
arr.shuffle!
puts Time.now-a

a=Time.now
arr.shuffle2!
puts Time.now-a

a=Time.now
arr.shuffle3!
puts Time.now-a

···

On Wednesday 15 December 2004 05:12 pm, Maarten Boonen wrote:
> > On Dec 14, 2004, at 7:24 PM, Ilmari Heikkinen wrote:

nobu.nokada@softhome.net schreef:

Hi,

At Thu, 16 Dec 2004 07:12:15 +0900,
Maarten Boonen wrote in [ruby-talk:123730]:

I made this a function a few days ago and it seems to be a lot faster. Not sure if it's completely correct though.

def shuffle!
a = 0
while(a<length)
  b = (length*Kernel.rand).to_i
  tmp = self[a]
  self[a] = self[b]
  self[b] = tmp
  a+=1
end
self
end

It wouldn't be uniform. An element which was close to the head
will tend to be placed close to the tail.

I tested this, by shuffeling an array [1..10] 100000 times and the elements at the head seem to be smaller (closer to the head again)?

>ruby test2.rb
512468
528318
540408
550876
559504
566140
567602
565901
559436
549347
>Exit code: 0
>ruby test2.rb
512626
527151
539146
551026
560751
565943
565926
566866
560561
550004
>Exit code: 0
>ruby test2.rb
511651
526778
541792
551417
559825
563856
567336
566272
561004
550069
>Exit code: 0

class Array

def shuffle!
  a = 0
  while(a<length)
    b = (length*Kernel.rand).to_i
    tmp = self[a]
    self[a] = self[b]
    self[b] = tmp
    a+=1
  end
  self
end

end

total = Array.new(10,0)
for i in 1..100000
  arr = (1..10).to_a
  arr.shuffle!
  total.each_index do |index| total[index] += arr[index] end
end

puts total

Hmm.. Interesting. Do you mind changing version 3 to use a tmp variable, like
yours, and see how they compare again?

T.

···

On Thursday 16 December 2004 03:27 am, Maarten Boonen wrote:

Just a very small test: (code below)

-> third is your equivalent code, self[a], self[b] = self[b], self[a]
seems to slow it down quite a lot.

>ruby test3.rb

1.272
0.571
1.042

>Exit code: 0
>ruby test3.rb

1.262
0.601
1.042

>Exit code: 0
>ruby test3.rb

1.302
0.551
1.041

>Exit code: 0

trans. (T. Onoma) schreef:

Hmm.. Interesting. Do you mind changing version 3 to use a tmp variable, like yours, and see how they compare again?

That's a lot faster, indeed!

>ruby test3.rb
1.262
0.611
0.33
>Exit code: 0
>ruby test3.rb
1.201
0.551
0.321
>Exit code: 0
>ruby test3.rb
1.202
0.561
0.32
>Exit code: 0

   def shuffle3!
     ln = length
     ln.times do |a|
       b = Kernel.rand(ln)
       tmp = self[a]
       self[a] = self[b]
       self[b] = tmp
       #self[a], self[b] = self[b], self[a]
     end
     self
   end