17 Aug 2021 - tsp

Last update 17 Aug 2021

5 mins

This blog post is a short summary on how one can use the Box-Muller transform to generate Gaussian distributed random numbers from an equal distribution of random numbers in the range $(0, 1)$. This range is usually available in an easy way on most computers - even hardware generators that exploit Schottky noise or some quantum mechanical effects as well as pseudo random number generators are usually outputting equally distributed numbers. This is usually required for many purposes (including cryptography) but sometimes it’s necessary to have numbers distributed according to different distributions - for example when performing Monte-Carlo simulations. In this case one can simply transform the random numbers to normal distributed values by using the Box-Muller transformation

Basically the transform is pretty simple. In it’s most basic form one has to choose two statistically independent random numbers $a$ and $b$ from the unit interval $(0,1)$. Then one can directly calculate the two independent standard normal distributed random quantities:

[ R_0 = \sqrt{-2 * ln(a)} * \cos(2 \pi b) \\ R_1 = \sqrt{-2 * ln(a)} * \sin(2 \pi b) ]As one can see this arises from the idea that the two random variables are used to define the polar coordinates of the random number ($R^2 = -2 * ln(a)$ and $\theta = 2 \pi b$).

The idea arises from some basic concepts:

- If one assumes that the $X$ and $Y$ coordinate in a Euclidean coordinate system are distributed according to the standard normal distribution then the pair $(X,Y)$ is distributed according to the bivariate normal distribution.

- The square norm of a bivariate normal distribution yields a $\chi^2$ distribution with two degrees of freedom which is - with these two degrees of freedom - the same as the exponential distribution. This yields the required exponential term to produce the normal distribution.

As one can see generating Gaussian distributed random numbers is rather easy (in case one can afford the squareroot, exponential and power operation) - but what if one wants to have a different mean $\mu$ or standard deviation $\sigma$? Well that’s simple too. One can recall the standard score / z score from statistics:

[ z = \frac{x - \mu}{\sigma} ]This is usually used to transform data points from arbitrary Gaussians into a space with $\mu = 0$ and $\sigma = 1$. This can of couse be inverted to transform a standardized $z$ value back into a space with arbitrary $(\mu,\sigma)$ which is sufficient to produce random numbers for any $x \approx \mathfrak{No(\mu,\sigma)}$

[ x = z * \sigma + \mu ]Assume we have to uniformly distributed random numbers $U$ and $V$. The quantities $X$ and $Y$ are defined as above:

[ X = \sqrt{-2 log(U)} \cos(2 \pi V) \\ Y = \sqrt{-2 log(U)} \sin(2 \pi V) ]First let’s look at the term $\sqrt{-2 log(U)}$:

[ R = \sqrt{2 log(U)} \\ P(R \leq r) = P(-2 log(u) \leq r^2) \\ = P(log(U) \geq -\frac{r^2}{2} ) \\ = P(U \geq e^{-\frac{r^2}{2}}) \\ = 1 - P(U < e^{-\frac{r^2}{2}}) ]Using the fact that $U$ is uniformly distributed:

[ P(R \leq r) = 1 - \int_{0}^{e^{-\frac{r^2}{2}}} 1 dt = 1 - e^{-\frac{r^2}{2}} \\ p_{density}(t) = t * e^{-\frac{t^2}{2}} ]Now the second part $\theta = 2 \pi V$ can be looked at the same way:

[ \theta = 2 \pi V \\ P(\theta \leq r) = P(2 \pi v \leq r) \\ = P(v \leq \frac{r}{2 \pi}) \\ \to P(\theta leq r) = \int_0^{\frac{r}{2\pi}} 1 dt = \frac{r}{2\pi} \\ \to p_{density}(t) = \frac{1}{2\pi} ]Assuming that $U$ and $V$ are independent one can also assume both of the density functions of $R$ and $\theta$ are independent and thus their product forms the sum density function:

[ p_{r;\theta}(t_1, t_2) = p_r(t_1) * p_{\theta}(t_2) = \frac{1}{2\pi} * t_1 * e^{-\frac{t_1^2}{2}} ]How can one implement a simple generator that outputs values distributed
as $X \approx \mathfrak{No(0,1)}$ in ANSI C? Simply use the formula above
and the standard C `rand()`

function.

```
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
double randGaussian() {
double a = ((double)(rand()))/((double)RAND_MAX);
double b = ((double)(rand()))/((double)RAND_MAX);
double R0 = sqrt(-2.0 * log(a)) * cos(2 * M_PI * b);
/*
double R1 = sqrt(-2.0 * log(a)) * sin(2 * M_PI * b);
*/
return R0;
}
```

Keep in mind that `rand()`

already is a pseudo random generator so it has to
be correctly seeded using `srand()`

from a real random source or it will
reproduce the same sequence all over again (which might be interesting for debugging
purposes anyways).

To get an feeling how the random numbers are distributed one can generate a huge
number of those and use `gnuplot`

to plot the values using a script like the
following:

```
gnuplot> binwidth=0.05
gnuplot> bin(x,width) = width * floor(x/width)
gnuplot> plot "boxmuller.dat" using(bin($1, binwidth)):(1.0) smooth freq with boxes title "Values"
```

Starting with 100 values it doesn’t look like a typical Gaussian curve:

A typical histogram might look like the following for 10.000 values:

For 1.000.000 values the function even looks more familiar:

This article is tagged: Math, ANSI C, Statistics

Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)

This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/