Weighted random selection -- how would you do this?

Here’s a little question for you.

Given: A set of items, each with
an associated integer representing
the relative likelihood that each
item will be selected.

Write: A method that will select
a random item.

I’ve started this a couple of times,
but have become convinced I’m making
it too hard, and someone can probably
come up with a three-liner.

Any ideas?

Cheers,
Hal

Say that you have a hash ‘events’ whose keys are the events you are
interested in and the values are the corresponding weights:

e.g.
events = {
“event1” => 5,
“event2” => 17,
“event3” => 6,
“event4” => 13
}

Solution:
tot = 0; events.each { |event, weight| tot += weight }
val = rand tot
event = events.each do |event, weight|
val -= weight;
return event if val < 0
end

This should work, I think.

···

On Sat, Mar 29, 2003 at 03:11:58PM +0900, Hal E. Fulton wrote:

Here’s a little question for you.

Given: A set of items, each with
an associated integer representing
the relative likelihood that each
item will be selected.

Write: A method that will select
a random item.

I’ve started this a couple of times,
but have become convinced I’m making
it too hard, and someone can probably
come up with a three-liner.

Any ideas?

Cheers,
Hal


Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137

distrain: distrain (di-STRAYN) verb tr., intr.
To seize the property in order to force payment for damages, debt, etc.

In article 03fc01c2f5ba$a04cb8e0$0300a8c0@austin.rr.com,

Here’s a little question for you.

Given: A set of items, each with
an associated integer representing
the relative likelihood that each
item will be selected.

Write: A method that will select
a random item.

I’ve started this a couple of times,
but have become convinced I’m making
it too hard, and someone can probably
come up with a three-liner.

Any ideas?

You’re talking about some kind of proportional selection scheme (hey, you
wouldn’t happen to be working on Genetic Algorithms, would you?). One way
would be to do what’s called roulette wheel selection. You can visualize
a roulette wheel where each number is sized proportionally to the weight.
The bigger the weight, the more likely the ‘roulette wheel’ is to land on
that item.

It would look something like:

#Warning: untested code!
class RouletteWheel
def initialize(itemList)
@rouletteList =
runningTotal = 0
@items = itemList
calc_sum
@items.each_with_index{ |w,i|
prob = calc_prob(w)
runningTotal += prob
@rouletteList[i] = runningTotal
}
end

def spin
rn = rand
prev = 0.0
@items.each_with_index { |w,i|
@rouletteList.each_with_index { |p,j|
if rn.between?(prev,p)
return @items[j]
end
prev = p
}
}
#otherwise return the last one:
return @items[-1]
end

private
def calc_sum
@items.each { |w|
@sum += w
}
end

def calc_prob(value)
value/@sum.to_f
end

end

#use it:

rw = RouletteWheel([10,20,3,42,121,5])
item = rw.spin #most likely to get 121, least likely to get 3

Or alternatively, you could have spin return the index of the item in the
array.

Now of course you’d probably modify it so that the itemList passed in
isn’t just a list of integers, but objects which contain an integer
weighting…

Phil

···

Hal E. Fulton hal9000@hypermetrics.com wrote:

“Hal E. Fulton” hal9000@hypermetrics.com wrote in message news:03fc01c2f5ba$a04cb8e0$0300a8c0@austin.rr.com

Given: A set of items, each with
an associated integer representing
the relative likelihood that each
item will be selected.

Write: A method that will select
a random item.

It’s not exactly what you want (but there are good reasons for it
being different), but here’s the code I’ve been using for ages:

module WeightedRand

def select
p = rand
each do |prob, value|
return value if p < prob
p -= prob
end
nil
end

end

a = [[0.5, :heads], [0.5, :tails]].extend(WeightedRand)
5.times { puts(a.select) }

The list of items is an Array of [Probability, Value] pairs. This
means that you don’t have to compute the sum (i.e. normalise the
probabilities) each time you select something (good if you need to do
lots of selects).

Here’s some more code:

module WeightedRand

Normalise probabilities

def normalize!
sum = 0.0
each do |prob, value|
sum += prob
end
unless sum.zero?
collect! do |prob, value|
[prob / sum, value]
end
end
end

Equal probabilities

def uniform!
p = 1.0 / size
collect! do |prob, value|
[p, value]
end
end

end

Convert a ‘events’ hash as described by the original poster to an

Array prob/value pairs

def futon_hash_to_wr(hash)
hash.collect do |value, freq|
[freq, value]
end.extend(WeightedRand).normalize!
end

Regards,

Tom

def random_weighted(items)
total = 0
pick = nil
items.each do |item, weight|
pick = item if rand(total += weight) < weight
end
pick
end

···

“Hal E. Fulton” hal9000@hypermetrics.com wrote:

Given: A set of items, each with
an associated integer representing
the relative likelihood that each
item will be selected.

Write: A method that will select
a random item.


Tabby

Sir Hal,

I may be late by the time this gets posted. If so, I apologize…

(BTW: Thanks for The Ruby Way too… a great book).

My thoughts on this were that a frequency integer represents an interval
/ range into an expanded set of the given elements repeated given
element frequency times.

Eg. Given a composed array of values and weights

set = [ [“A”,2], [“B”,1] , [“C”,6] ]

C is 6 times more likely than B

C is 3 times as likely as A

The full population frequency array would be

expanded_set = set.collect{|e| [ e[0] ] * e[1] }.flatten

[ “A”, “A”, “B”, “C”, “C”, “C”, “C”, “C”, “C” ]

weighted_random_element = expanded_set[ rand( expanded_set.length() ) ]

“C”

{I checked this by histogram over 10000 trials and the frequencies were
converging with expected values}

But it might be pretty expensive (in space, at least) for large elements
and weights, especially if the weights can’t be scaled by their greatest
common factor.

Eg. For a random number between 0 and the sum of all weights, count
along the sets until the number minus the sum of all processed element
weights is zero. Return the set element on which this occurs.

But without expansion (at least of the indexes into the original array),
how do you map backwards from the frequency? Any ideas?

Hope this helps,

Lachlan Pitts

···

-----Original Message-----
From: Hal E. Fulton [mailto:hal9000@hypermetrics.com]
Sent: Saturday, 29 March 2003 4:12 PM
To: ruby-talk ML
Subject: Weighted random selection – how would you do this?

Here’s a little question for you.

Given: A set of items, each with
an associated integer representing
the relative likelihood that each
item will be selected.

Write: A method that will select
a random item.

I’ve started this a couple of times,
but have become convinced I’m making
it too hard, and someone can probably
come up with a three-liner.

Any ideas?

Cheers,
Hal

In article 20030329063108.GA2300@math.umd.edu,

···

Daniel Carrera dcarrera@math.umd.edu wrote:

Say that you have a hash ‘events’ whose keys are the events you are
interested in and the values are the corresponding weights:

e.g.
events = {
“event1” => 5,
“event2” => 17,
“event3” => 6,
“event4” => 13
}

Solution:
tot = 0; events.each { |event, weight| tot += weight }
val = rand tot
event = events.each do |event, weight|
val -= weight;
return event if val < 0
end

This should work, I think.

Hmmm… Wouldn’t this be biased by order?

Phil

You’re talking about some kind of proportional selection scheme (hey, you
wouldn’t happen to be working on Genetic Algorithms, would you?). One way
would be to do what’s called roulette wheel selection. You can visualize
a roulette wheel where each number is sized proportionally to the weight.
The bigger the weight, the more likely the ‘roulette wheel’ is to land on
that item.

Wonderful for visualization!

Make spin() use binary search and it will be faster (mostly useful when
spin() is called multiple times on the same list of weights, of course).

···

Kero

class RouletteWheel
def initialize(itemList)
@rouletteList =
runningTotal = 0
@items = itemList
calc_sum
@items.each_with_index{ |w,i|
prob = calc_prob(w)
runningTotal += prob
@rouletteList[i] = runningTotal
}
end

def spin
rn = rand
prev = 0.0
@items.each_with_index { |w,i|
@rouletteList.each_with_index { |p,j|
if rn.between?(prev,p)
return @items[j]
end
prev = p
}
}
#otherwise return the last one:
return @items[-1]
end

private
def calc_sum
@sum = 0
@items.each { |w|
@sum += w
}
end

def calc_prob(value)
value/@sum.to_f
end

end

Does indeed, thank you. The algorithm is
a little non-obvious to me, but it is
simpler than mine.

Hal

···

----- Original Message -----
From: “Daniel Carrera” dcarrera@math.umd.edu
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Saturday, March 29, 2003 12:31 AM
Subject: Re: Weighted random selection – how would you do this?

Say that you have a hash ‘events’ whose keys are the events you are
interested in and the values are the corresponding weights:

e.g.
events = {
“event1” => 5,
“event2” => 17,
“event3” => 6,
“event4” => 13
}

Solution:
tot = 0; events.each { |event, weight| tot += weight }
val = rand tot
event = events.each do |event, weight|
val -= weight;
return event if val < 0
end

This should work, I think.

You’re talking about some kind of proportional selection scheme (hey, you
wouldn’t happen to be working on Genetic Algorithms, would you?).

Not at the moment, but I’ve certainly had that in the
back of my mind for some time. Are you interested in
GAs? I am, but I’m not knowledgeable yet.

One way
would be to do what’s called roulette wheel selection. You can visualize
a roulette wheel where each number is sized proportionally to the weight.
The bigger the weight, the more likely the ‘roulette wheel’ is to land on
that item.

It would look something like:

[snip]

Fascinating and clever. But I’ll probably
go with a lighter-weight solution.

Cheers,
Hal

···

----- Original Message -----
From: “Phil Tomson” ptkwt@shell1.aracnet.com
Newsgroups: comp.lang.ruby
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Saturday, March 29, 2003 1:44 AM
Subject: Re: Weighted random selection – how would you do this?

It’s not exactly what you want (but there are good reasons for it
being different), but here’s the code I’ve been using for ages:

module WeightedRand

def select
p = rand
each do |prob, value|
return value if p < prob
p -= prob
end
nil
end

end

a = [[0.5, :heads], [0.5, :tails]].extend(WeightedRand)
5.times { puts(a.select) }

Again, interesting. But I keep thinking about Phil’s
comment about order.

Is there a consequence to the fact that you are
iterating through a list in order? Might the
results be “clustered” in some way?

I’m not much on probability and stats; and I’ve
been up a long time. :slight_smile:

Hal

···

----- Original Message -----
From: “Tom Payne” google@tompayne.org
Newsgroups: comp.lang.ruby
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Saturday, March 29, 2003 4:05 AM
Subject: Re: Weighted random selection – how would you do this?

Would it?
This is my thinking, perhaps you can help me see if I’m wrong:

Example:
Take two events with weights:
event1 => 3
event2 => 7

Pick a random number from 0 to 3+7=10
There is a 30% chance that the number will be between 0 and 3 and there is
a 70% chance that the number will be between 3 and 10.

By selecting an event based on which range the random number lies in I
should get the biased probability that I desire.

I think I should replace the “val < 0” in my code by “val <= 0”.
Is this the error you were referring to? Or is there a problem with my
reasoning?

···

On Sat, Mar 29, 2003 at 05:04:32PM +0900, Phil Tomson wrote:

event = events.each do |event, weight|
val -= weight;
return event if val < 0
end

Hmmm… Wouldn’t this be biased by order?


Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137

distrain: distrain (di-STRAYN) verb tr., intr.
To seize the property in order to force payment for damages, debt, etc.

This should work, I think.

Hmmm… Wouldn’t this be biased by order?

Elaborate, Phil… do you mean that the actual
distribution would be wrong? Or just that there
would be a pattern in the sequence of choices?

Hal

···

----- Original Message -----
From: “Phil Tomson” ptkwt@shell1.aracnet.com
Newsgroups: comp.lang.ruby
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Saturday, March 29, 2003 2:04 AM
Subject: Re: Weighted random selection – how would you do this?

Again, interesting. But I keep thinking about Phil’s
comment about order.

Is there a consequence to the fact that you are
iterating through a list in order? Might the
results be “clustered” in some way?

···

----- Original Message -----
From: “Hal E. Fulton” hal9000@hypermetrics.com


Ordering makes no difference (except possibly in how the errors will show up
if the boundary cases are wrong, like the <' vs<=’ bug referenced
earlier). Those boundary cases are a pain…

Looking at Daniel’s code again (and I would suggest trusting a
mathematician on problems like this :), but with a simpler set of weights to
help the boundary conditions be more clear (and rewriting things a bit):

def wrand weightArray
total = 0
weightArray.each {|object,weight| total += weight}
val = rand total
weightArray.each do |object,weight|
val -= weight
return object if val < 0
end
end

10.times {puts wrand([[:never,0],[:always,1]])}

#> always
#> always
#> always
#> always
#> always
#> always
#> always
#> always
#> always
#> always

Clearly, <=' is incorrect;total’ would be 1, val' would always be 0, and:never’ would always be selected.

This seems like a pretty straight-forward algorithm to me; why would you
think ordering has anything to do with it? Especially since Daniel’s hash
(like most hashes :slight_smile: is not ordered in any particular way. The `each’es in
his code could be done in different orders… which is in fact a bug! It
wouldn’t happen very often, but it could happen.

The order of the object-weight pairs is irrelevant to the correctness of the
algorithm, but the two passes must use the same order.

Tabby’s algorithm avoids the whole issue by making just one pass. It uses
many rands, but rand is cheap. Very nice! It reminds me of the shuffling
algorithm they can’t seem to get right at
http://c2.com/cgi/wiki?LinearShuffle (well, they did eventually get it
right). Did you come up with that, Tabby, or did you learn it somewhere?
It’s lovely.

Chris

“Hal E. Fulton” hal9000@hypermetrics.com schrieb im Newsbeitrag
news:049601c2f5e5$d01a5de0$0300a8c0@austin.rr.com

Is there a consequence to the fact that you are
iterating through a list in order? Might the
results be “clustered” in some way?

Not if the rand function distributes numbers equally over the interval
(which it should). What matters is the probability (or weight) of a choice.

robert

“Hal E. Fulton” hal9000@hypermetrics.com wrote in message news:049601c2f5e5$d01a5de0$0300a8c0@austin.rr.com

Is there a consequence to the fact that you are
iterating through a list in order? Might the
results be “clustered” in some way?

No, it doesn’t matter.

It would matter if the numbers produced by Kernel.rand were not
uniformly distributed in [0,1), but they are (for all realistic
definitions of ‘are’) so it doesn’t matter.

The only way earlier objects could be favoured is for Kernel.rand to
produce smaller numbers more often than big ones, which it doesn’t do.

Regards,

Tom

In article 048001c2f5e4$929b6000$0300a8c0@austin.rr.com,

From: “Phil Tomson” ptkwt@shell1.aracnet.com
Newsgroups: comp.lang.ruby
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Saturday, March 29, 2003 1:44 AM
Subject: Re: Weighted random selection – how would you do this?

You’re talking about some kind of proportional selection scheme (hey, you
wouldn’t happen to be working on Genetic Algorithms, would you?).

Not at the moment, but I’ve certainly had that in the
back of my mind for some time. Are you interested in
GAs? I am, but I’m not knowledgeable yet.

I just finished up a GA class. It was one of the most interesting classes
I’ve taken.

One way
would be to do what’s called roulette wheel selection. You can visualize
a roulette wheel where each number is sized proportionally to the weight.
The bigger the weight, the more likely the ‘roulette wheel’ is to land on
that item.

It would look something like:

[snip]

Fascinating and clever. But I’ll probably
go with a lighter-weight solution.

Yes, the lighter-weight solution definately is faster, but how it works is
not obvious to me either.

Phil

···

Hal E. Fulton hal9000@hypermetrics.com wrote:

----- Original Message -----

Should be right w/ val <= 0, as the underlying distribution function is
defined as F(x) = p( X <= x ), or maybe not :slight_smile:

Just for fun, the mandatory inefficient one liner:
batsman@tux-chan:/tmp$ expand -t2 y.rb
events = {
“event1” => 5,
“event2” => 17,
“event3” => 6,
“event4” => 13
}

p((a = events.to_a.map{ |e,i| [e] * i }.flatten)[rand(a.size)])
batsman@tux-chan:/tmp$ ruby y.rb
“event2”

···

On Sat, Mar 29, 2003 at 05:14:58PM +0900, Daniel Carrera wrote:

On Sat, Mar 29, 2003 at 05:04:32PM +0900, Phil Tomson wrote:

event = events.each do |event, weight|
val -= weight;
return event if val < 0
end

Hmmm… Wouldn’t this be biased by order?

Would it?
This is my thinking, perhaps you can help me see if I’m wrong:

Example:
Take two events with weights:
event1 => 3
event2 => 7

Pick a random number from 0 to 3+7=10
There is a 30% chance that the number will be between 0 and 3 and there is
a 70% chance that the number will be between 3 and 10.

By selecting an event based on which range the random number lies in I
should get the biased probability that I desire.

I think I should replace the “val < 0” in my code by “val <= 0”.
Is this the error you were referring to? Or is there a problem with my
reasoning?


_ _

__ __ | | ___ _ __ ___ __ _ _ __
'_ \ / | __/ __| '_ _ \ / ` | ’ \
) | (| | |
__ \ | | | | | (| | | | |
.__/ _,
|_|/| || ||_,|| |_|
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Why are there always boycotts? Shouldn’t there be girlcotts too?
– argon on #Linux

I think you were right the first time…
running it 40K times, I find that the
statistics are right for the <0 code,
not <=0. (The latter took hits away from
the last choice and gave them to the
first choice.)

Thanks,
Hal

···

----- Original Message -----
From: “Daniel Carrera” dcarrera@math.umd.edu
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Saturday, March 29, 2003 2:14 AM
Subject: Re: Weighted random selection – how would you do this?

Would it?
This is my thinking, perhaps you can help me see if I’m wrong:

Example:
Take two events with weights:
event1 => 3
event2 => 7

Pick a random number from 0 to 3+7=10
There is a 30% chance that the number will be between 0 and 3 and there is
a 70% chance that the number will be between 3 and 10.

By selecting an event based on which range the random number lies in I
should get the biased probability that I desire.

I think I should replace the “val < 0” in my code by “val <= 0”.
Is this the error you were referring to? Or is there a problem with my
reasoning?

In article 048601c2f5e4$b88d1ec0$0300a8c0@austin.rr.com,

From: “Phil Tomson” ptkwt@shell1.aracnet.com
Newsgroups: comp.lang.ruby
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Saturday, March 29, 2003 2:04 AM
Subject: Re: Weighted random selection – how would you do this?

This should work, I think.

Hmmm… Wouldn’t this be biased by order?

Elaborate, Phil… do you mean that the actual
distribution would be wrong? Or just that there
would be a pattern in the sequence of choices?

I tried running it a few times and well it seems to work.

I was just thinking that the order of the elements in the list could
influence which is picked (for example if elements with high weights
appeared earlier or later in the list), but it doesn’t seem to be the
case. Certainly a lot more compact and faster than my RouletteWheel
example.

Phil

···

Hal E. Fulton hal9000@hypermetrics.com wrote:

----- Original Message -----