# Testing the distribution of the digits of pi

Just a fun one for today. No outlandish Bayesian-based philosophy, just straight up classical significance tests. I will even take it relatively gently for those of you who are not “into” statistics, because today’s fun is for everybody!

So, the question of the day is “if I pick a random digit of pi (that I don’t know in advance of course), is it equally probable to be any of 0 to 9?”. Actually one could go a lot deeper into this question than I am going to, and look for correlations and all sorts of things, but we will just look at the statistics of some big samples of the first digits of pi.

Disclaimer: I am not going to calculate these digit frequencies myself, due to “effort minimisation” ;). The following data comes from this website, and I have made no effort to verify its accuracy. So with that in mind, here is a histogram of the frequencies with which the digits 0 to 9 appear in the first three billion digits of pi:

Pretty damn uniform looking right? Lets zoom in a bit:

Note the scale on the y axis. We are looking at fractional variations of around $10^{-4}$, or $(1/1000) \%$ if you prefer. So on the face of it the hypothesis that the digits of pi are evenly distributed over 0 to 9 seems to be extremely good. However, this vague intuition is not good enough! We should do some statistics and see how good the evidence really is!

The simplest statistical test we can do is to calculate the p-value for this data, under the “null hypothesis” that the digits are randomly sampled from the options of 0 to 9, with equal probability for each option. This is the equivalent of having a bag with equal numbers of tiles with the digits 0 to 9 written on them, say 10 of each, and reaching into this bag to pull out the next digit of pi (and then putting the tile back after each draw to avoid messing up the probabilities on the next draw). This process will give you sets of numbers whose properties follow the famous binomial distribution:

$\displaystyle \text{Pr}(K = k) = {n\choose k}p^k(1-p)^{n-k}$

for $k=0,1,2,...,n$, where

$\displaystyle {n\choose k}=\frac{n!}{k!(n-k)!}$

For our case, $K$ is the number of occurrences of digit $i$ we get in our sample (i.e. one of 0-9; we are hypothesising that we get every digit equally, therefore each digit has the same distribution). So $\text{Pr}(K = k)$ is just telling us the probability that we get $k$ occurrences of digit $i$ in our sample, which is of size $n$. Finally, $p$ is the probability of drawing the digit in question from the bag on each individual draw, which since we are claiming is equal for each of the 10 possible digits gives us $p=1/10$

The relevant properties of the binomial distribution that we need to consider are the expected value and the variance; i.e. we want to know how many occurrences of $i$ we expect to see in our sample, and the typical spread around this value a sample will have. These are

$\displaystyle \text{E}(K) = n p \text{,~~~~and~~~~} \text{Var}(K) = n p(1-p).$

Plugging in our numbers (for n = 3 billion) we get E = 0.3 billion (i.e. on average every number should appear one tenth of the time – as one knows intuitively), and Var ~ 0.3 billion also. However the variance is not very intuitive I find, generally the standard deviation is better (the square root of the variance) since we have an intuitive understanding of it thanks to the normal distribution. So for us the standard deviation is $\sigma = \sqrt{\text{Var}} \approx 16000$. So if our pi digits are really following this distribution we should expect the digit counts to fluctuate by about this much around E, which is a fractional variation of $\sigma/E \approx 0.5 \times 10^{-4}$, or about $1/(2000)\%$, which is pretty close to the guess we eyeballed from the histograms above.

There are two more pieces of information I must mention before we can actually do our statistical test. The first I alluded to a moment ago — that is, the binomial distribution is asymptotically normal, which just means that for large sample sizes we can use the normal distribution (with mean and variance given by the formulae above) instead of the binomial distribution, and we will get the same answers. Our sample size of 3 billion is vast so this approximation is essentially exact, and life will be much easier if we work with normal distributions, so we will. The second involves a property of samples from normal distributions. This is that if we take the sum of the squares of normally distributed random variables (shifted by their means and scaled by their variance), then the resulting sum is a random variable that follows the famous chi-squared distribution. In equation form, if we have $N$ normal random variables $X_i$, with mean $\mu_i$ and variance $\sigma_i^2$, then the sum I am talking about is

$\displaystyle Q = \frac{(X_1-\mu_1)^2}{\sigma_1^2} + ... + \frac{(X_N-\mu_N)^2}{\sigma_N^2}$

and where $Q$ can then be considered a sample from the $\chi^2_{k=N}$ distribution, i.e. the chi-squared distribution with N “degrees of freedom”. I will not go into details about degrees of freedom here, for now it will be enough to say that in this case it is just the number of terms in the sum, the number of normal variables we are adding together. We also have an additional constraint which will make us have to reduce this “degrees of freedom” by one, specifically the constraint that the sum of occurrences of each digit has to add up to the sample size.

We are now ready to do our test! So, what is it that we want to know? There are many questions we may ask of our data and each requires a different test. However, the most obvious one to ask is whether the observed digit frequencies sit around the expect values of $1/10$ with the expected spread. The common translation of this question is to ask whether the sum of the (scaled) squares of the frequencies is as small as we expect based on our hypothesis. So, (remembering that $k_i$ is the number of occurrences of each digit $i$ in our data, $\text{E}(K_i)$ is the expected number of occurrences, and $\text{Var}(K_i)$ is the variance) we compute the sum

$\displaystyle Q = \frac{(k_0-\text{E}(K_0))^2}{\text{Var}(K_0)} + ... + \frac{(k_9-\text{E}(K_9))^2}{\text{Var}(K_9)}$

The statistical question now solidifies into asking “assuming the null hypothesis is true, what is the probability of observing data which gives a value of $Q$ larger than the one our actual data gives?”. This quantity is known as the “p-value”. There are many issues associated with p-values, but today I will not talk about them. For now, noting that Q is supposed to be distributed according to the $\chi^2_{k=9}$ distribution if our hypothesis is true (degrees of freedom 10, minus 1 for our constraint), we can compute the p value as

$\displaystyle p(Q) = \int_Q^\infty \! \chi^2_{k=9}(q) \, \mathrm{d}q.$

For brevity, I must leave it to you to wikipedia chi-squared distribution to see what this distribution looks like and get and idea of what this integral means. Suffice it to say it requires numerical evaluation, but any statistics package or library will have routines do this so just pick your favourite. I am a Python guy so I use the scipy.stats package.

So now I will skip to the punchline, which is that according to this test, with the null hypothesis as described and using the data I stole from the linked website, I compute

$\displaystyle p = 0.33$

This means that there is a $33\%$ chance of seeing digit occurrences with larger average (roughly speaking) deviations from the predicted mean in the idealised “bag-drawn” random number, than we observed in the first 3 billion digits of pi (and $67\%$ chance of seeing smaller such deviations). This is not very surprising at all if the bag-drawn model is valid, so the classical interpretation is that this data contains no evidence for rejecting the null hypothesis. It does not let us say that indeed the digits of pi are randomly sampled in this way, only that this test provides no evidence that they are not.

The final warning is that this test considers only the value of $Q$. There are many ways that digits could appear in pi that would be extremely un-random looking, yet would give identical values of $Q$ to a set of digits that really was random, i.e. $Q$ is global property of the data, and will tell you nothing about local patterns. To look into these more intricate details we would have to devise yet more tests!