Roulette Wheel Selection

by Justin Skycak on

How to sample from a discrete probability distribution.

This post is a chapter in the book Introduction to Algorithms and Machine Learning: from Sorting to Strategic Agents. Suggested citation: Skycak, J. (2021). Roulette Wheel Selection. Introduction to Algorithms and Machine Learning: from Sorting to Strategic Agents. https://justinmath.com/roulette-wheel-selection/


As we saw previously, it’s easy to simulate a random coin flip by generating a random decimal $r$ between $0$ and $1$ and applying the following function:

$\begin{align*} \textrm{outcome}(r) = \begin{cases} \textrm{heads} & \textrm{ if } r < 0.5 \\ \textrm{tails} & \textrm{ otherwise} \end{cases} \end{align*}$


This is a special case of a more general idea: sampling from a discrete probability distribution. Flipping a fair coin is tantamount to sampling from the distribution [0.5, 0.5], i.e. $0.5$ probability heads and $0.5$ probability tails.

More complicated contexts may require sampling from longer distributions that may or may not be uniform. For example, if we wish to simulate the outcome of rolling a die with two red faces, one blue face, one green face, and one yellow face, then we need to sample from the distribution [0.4, 0.2, 0.2, 0.2].

Note that when we sample from the distribution, we need only sample an index from the distribution, and then use the index to look up the desired value. For example, when we sample an index from the distribution [0.4, 0.2, 0.2, 0.2], we have probabilities $0.4,$ $0.2,$ $0.2,$ and $0.2$ of getting indices $0,$ $1,$ $2,$ and $3$ respectively. Then, all we need to do is look up that index in the following array: [red, blue, green, yellow].

Roulette Wheel Selection

Roulette wheel selection is an elegant technique for sampling an index from an arbitrary discrete probability distribution. It works as follows:

  1. turn the distribution into a cumulative distribution,
  2. sample a random number $r$ between $0$ and $1,$ and then
  3. find the index of the first value in the cumulative distribution that is greater than or equal to $r.$

To illustrate, let’s sample an index from the distribution [0.4, 0.2, 0.2, 0.2] that was mentioned above in the context of a die roll.

  1. First, we construct the cumulative distribution: [0.4, 0.4 + 0.2, 0.4 + 0.2 + 0.2, 0.4 + 0.2 + 0.2 + 0.2], or more simply, [0.4, 0.6, 0.8, 1.0].
  2. Then, we sample a random number $r$ between $0$ and $1.$ Suppose we get $r = 0.63.$
  3. Finally, we find the index of the first value in the cumulative distribution that is greater than or equal to $r = 0.63.$ In our case, this is the value $0.8$ at index $2,$ because the next value ($1.0$ at index $3$) is greater than $r = 0.63.$

So, we have sampled the index $2.$

Exercise

Write a function random_draw(distribution) that samples a random number such that distribution[i] is the probability of sampling index i.

To test your function on a particular distribution, sample many indices from the distribution and ensure that the proportion of times each index gets sampled matches the corresponding probability in the distribution. Do this for a handful of different distributions.


This post is a chapter in the book Introduction to Algorithms and Machine Learning: from Sorting to Strategic Agents. Suggested citation: Skycak, J. (2021). Roulette Wheel Selection. Introduction to Algorithms and Machine Learning: from Sorting to Strategic Agents. https://justinmath.com/roulette-wheel-selection/