2011.11.16

Generating “non-biased” random integers

If you know Number Theory or had algorithms classes in college you can probably skip this post… - I studied Graphic Design so I had to learn those things by myself - I will try to explain it using as less math as possible.. I know there is a lot of info about this subject on the internet and algorithms books but today I spent a couple minutes searching for it and most of them are too hard to understand if you are not a mathematician (which BTW I’m not..) so after giving up on the hard reading I decided to come up with my own solution to the problem, it may not be most elegant solution but it is at least a logical solution and results are acceptable.

Doing it wrong

I already had a simple function to generate random floating point numbers inside a range, a simplified version of it looks like:

function rand(min, max) {
    return lerp(Math.random(), min, max);
}

function lerp(ratio, start, end) {
    return start + (end - start) * ratio;
}

The logic is pretty straightforward, Math.random() returns a random number from 0 to 1, the lerp() method calculates the “linear interpolation” between start and end, here is an example:

lerp(0.5, 0, 100);  // 50
lerp(0.2, 0, 10);   // 2
lerp(0.5, -10, 10); // 0

Since “lerp ratio” is also a value inside the range 0...1 it is just a matter of doing the linear interpolation to a random ratio, nothing wrong about this process… One may think that since the random float works properly it would be just fine to use Math.round() to get a random integer:

//biased! don't use this code!
function randInt(min, max) {
    return Math.round( rand(min, max) );
}

Consider this example:

randInt(-1, 1);

What is the probability of getting 0, 1 and -1 respectively? If you think it’s ~33% than you are wrong, 0 has 50% of probability since anything inside the range -0.5...0.5 is going to rounded to 0 - I already knew it before I coded the naive approach but at that time I didn’t needed any sort of precision - It is important to note that the problem only affects the min/max values no matter how large is the range.

Running this code 1000x:

// first run
-1 => 24.7%
 0 => 50.2%
 1 => 25.1%

// second run
-1 => 24.8%
 0 => 51.6%
 1 => 23.5%

Y U NO LIKE -1 and 1?

So how to do it “right”?

First thing I did was to think how I would approach the problem on the simplest way possible that wouldn’t require reading a technical paper about RNG which includes way more info than most people probably need to know…

Another option was reading Python source code (as suggested by Gabriel Laet on twitter) but my gut said the implementation was more complex than I needed so I decided to try to solve the problem myself without any external help (I like puzzles and this one didn’t looked that hard since we already have Math.random() in JavaScript).

For me the most important thing at solving this kind of problems is breaking it into smaller ones. The real issue with the previous approach is that the “range” that “covered” a single result (zero) was 50% of the full range, to solve the problem we need to break the range into 3 equal-sized steps (since the range has 3 integers) and pick a random “step”/”index” instead of simply rounding the linear interpolation.

“A problem well stated is a problem half solved” – Charles F. Kettering

Solution(s)

function randInt(min, max) {
    var stepSize = 1 / (max - min + 1),
        nSteps = Math.floor(Math.random() / stepSize);
    return min + nSteps;
}

The “step size” is calculated by dividing one by the amount of integers inside the range, so if range have 3 items it will be 1/3 === 0.3333333333333333 if range has 4 items it will be 1/4 === 0.25 and so on… (exactly what we need)

By knowing the size of the “step”, and since it is always inside the range 0 <= n <= 1, we can pick a random step by using Math.random().

If Math.random() is inside the range 0...0.3333333333333332 it should return 0 (zero steps), if it’s inside the range 0.3333333333333333...0.6666666666666665 it should return 1 (one step) and from 0.6666666666666666 to 1 it should return 2 (two steps). That is a very common math operation so I abstract it as a separate method to make clearer what the code is doing, I call this method countSteps() since most times I needed this function was to “count the number of full steps” and I haven’t seen it on any other library to follow the same convention (naming things is hard):

function countSteps(val, stepSize) {
    return Math.floor(val / stepSize);
}

So the code could be rewritten for better readability:

function randInt(min, max) {
    var stepSize = 1 / (max - min + 1);
    return min + countSteps(Math.random(), stepSize);
}

Running this code 1000x:

// first run
-1 => 31.1%
 0 => 34.7%
 1 => 34.1%

// second run
-1 => 36.1%
 0 => 33.4%
 1 => 30.4%

Not biased at all!

Solution #2 (added 2011/11/16)

After the post I got a pull request with another implementation that after a few tweaks got into this code:

function randInt(min, max) {
  var diff = max - min,
      rnd = Math.random();
  return rnd !== 1? Math.floor((diff + 1) * rnd) + min : min + diff;
}

The other solution came so fast that I didn’t even explored other options, I already had the method countSteps().. and you know,

“if you only have a hammer you tend to see every problem as a nail” – Abraham Maslow

This method does work properly because (1 -(-1) + 1) === 3 and Math.floor(Math.random() * 3) will return 0, 1 or 2 unless Math.random() === 1 which would return 3 making the method return a value greater than max.

Solution #3 (added 2011/11/16)

While I was updating the post with more info to explain the problem only happens with the max/min values I realized it could be fixed with a simpler solution while still using the regular random method but Fabio Caccamo was quicker to post it :D

function randInt(min, max) {
   // can't be `max + 0.5` otherwise it will round up if `rand`
   // returns `max` causing it to overflow range.
   return Math.round( rand(min - 0.5, max + 0.499999999999) );
}

It does work because the rounding issue only affect the min and max values and we are increasing the range so all the numbers have an equal chance… now I feel dumb for over-engineering a “simple” problem :P

Sum up

I still find the first solution to be more readable, probably because I’m very used with the countSteps method and it uses more vars/methods that ends up describing what the method does… (or maybe just because it is closer to the way my head works) But there are indeed many ways to skin a cat…

I hope it got clearer how to think about this kind of problem and possible solutions (and how to spot the problems as well). My mind is very visual so I usually draw this kind of stuff to help thinking about solutions - it took me way less time to find the solution after I drawn it on paper than I would spend searching the web or trying to decipher somebody else code - try to find a way that works for you, explaining the problem to another person also helps a lot since it makes you think about what exactly you are trying to solve.

It’s way easier to code when you know what is the real problem you are trying to solve.

This problem was simple and I wrote it quickly (solution and post) but I hope you get the idea… Maybe it’s useful on a future project. I’m still trying to find a good name for a “curried” version of randInt(-1, 1) (-1, 0, 1) I already have two methods called randBit() (0 or 1) and randSign() (-1 or 1) since these ranges are so common I think they deserves their own method, the implementation is also simpler so it has better performance – not that it should make a huge impact is most cases, for me it’s more a readability thing and not needing to pass parameters…

For more helper functions check the amd-utils on github, I’ve been improving it based on my needs and adding new stuff that may be useful when I have the opportunity or when I realize some common pattern, extract it and find a name to describe it…

PS: I hate when people use complex terms to describe simple things, that’s why I tried to explain it on the simplest way possible, by using Math terms the Mathematicians hide the knowledge from people of other domains, nowadays I can understand some of the documents but a few years ago it was like trying to read Greek…

That’s it!

Update (2011/11/2011): Added info about rounding issue only affecting min/max and added solutions #2 and #3.


Comments

Nice post! I never thought about this problem before.

There is a simplest solution, the problem described affects only the min and the max values in the range, to solve it we can just 'expand' their range by 0.5 and the Math.round function will round it correctly:

function randInt(min, max) { return Math.round( rand(min - 0.5, max + 0.5) ); }

Your mindset sounds SO familiar (I studied architecture, and I'm also a self thought programmer). In this case, I've gotten very used to using hardcoded stuff like: var singleInt = Math.floor(Math.random() * 3 - 1); //-1, 0, 1

I know randInt(-1, 1) is certainly more readable and obvious, but I've run into so many implementation bugs and quirks in my life (like that round stuff), and I've spend so many hours understanding method after method to debug simple operations, that I prefer to leave them as low level as possible.

I'm absolutely with you about letting the code comment itself, and more and more my daily effort is to name, segment and order operations properly, however in this kind of math operations (specially with random numbers) I rather have the operations right there, so I don't have to follow a chain of methods to figure out how it actually works (I get distracted a lot).

And btw, as far as I know Math.random() will never ever be equal to zero or one :)

Nice post!

Leave a Comment

Please post only comments that will add value to the content of this page. Please read the about page to understand the objective of this blog and the way I think about it. Thanks.

Comments are parsed as Markdown and you can use basic HTML tags (a, blockquote, cite, code, del, em, strong) but some code may be striped if not escaped (specially PHP and HTML tags that aren't on the list). Line and paragraph breaks automatic. Code blocks should be indented with 4 spaces.