Prime Numbers and the Benford’s Law

Today, I read a news article from the Physorg.com about the new pattern found in the Prime Numbers, the article talks about the new discovery by Bartolo Luque and Lucas Lacasa:

In a recent study, Bartolo Luque and Lucas Lacasa of the Universidad Politécnica de Madrid in Spain have discovered a new pattern in primes that has surprisingly gone unnoticed until now. They found that the distribution of the leading digit in the prime number sequence can be described by a generalization of Benford’s law.

I was very surprised by the fact that nobody have noticed that before and after read the original paper (if you are interested, read it) describing the new patterns discovered, I was very impressed and impatient to see it in pratice !

The new pattern discovered is based on the so-called GBL (Generalized Benford’s Law), which you can see in the paper at the Eq 3.1:

gbl

Where the P(d) means the probability of appearance of the leading digit d. The alpha is the exponent of the original power law distribution (for alpha = 1, the GBL reduces to the Benford’s law).

The authors says that for a given integer interval of [1,N], there exists a particular value alpha(N) for which the GBL fits with extremely good accuracy the first digit distribution of the primes appearing in that interval and showing the functional relation between alpha and N in the Eq 3.2:

functional

Where a = 1.10 +- 0.05 for large values of N. They also cite a GBL extension, but I’ll use just these formulae to plot our distributions.

So I have implemented these formulae into the simple pybenford module as follows:

def gbl(alpha, digit):
   return 1/(10**(1-alpha)-1)*((digit + 1)**(1-alpha)-digit**(1-alpha))

def calc_alpha(n, a=1.10):
   return 1/(math.log(n)-a)

def gbenford_law(alpha):
   return [gbl(alpha, digit)*100.0 for digit in xrange(1,10)]

For the reason that we are using an infinite integer sequence, we must always pick the sequence interval [1, N] where N = 10^D  (see the  Natural Density section of the paper for more information).

The next step is to create a list of prime numbers between an arbitrary interval of D=8, or [1,10^8]. In this step I used the Sieve (see more information) utility to create a file with the generated prime numbers in the cited interval, I used the follow command to get this file output:

sieve2310.exe -s 1 -e 100000000  >>sieve_n8.txt

The sieve is very fast, this will create the file “sieve_n8.txt” with nearly 66MB (don’t worry, it’s a very fast generation, it took 8 seconds for me using a Intel Core 2 Duo 2GHz).

And we are ready to use Python and pybenford to read the prime numbers, calculate the leading digits frequency and plot our result ! Here is the code I created:

import pybenford

sieve_file = open("sieve_n8.txt", "r")
prime_list = [int(prime) for prime in sieve_file]
sieve_file.close()

alpha              = pybenford.calc_alpha(10**8)
benford_law        = pybenford.gbenford_law(alpha)
prime_distribution = pybenford.calc_firstdigit(prime_list)
pybenford.plot_comparative(prime_distribution, benford_law, "Prime Numbers")

And voilà, here is the output plot showing an extremely good accuracy claimed by paper authors (click on the image to enlarge):

prime_plot

The plotting of the distributions (click to enlarge)

If you are interested on Benford’s law, there are some posts about it here and here.

I hope you liked this =)

UPDATE 10/05: Mike Loukides did a good work generalizing for other bases, thank you for sharing your experiment Mike.

UPDATE 08/08 (lol): There are many more comments about this post on Reddit, see here.

38 thoughts on “Prime Numbers and the Benford’s Law

  1. Great posts on Benford’s Law. I’ve just finished Marcus du Sautoy’s book on ‘the music of the primes’ so your timing is excellent 😉

  2. So, as far as Benfold’s law is concerned, you’re saying that the pattern of prime numbers is indistinguishable from a random set?

  3. Ok, now make the graph for other bases. There is nothing special about the base 10 notation..

  4. @Jasper: what the base has to do with this ? I do not understand your question. The authors of the paper have discovered an unnoticed feature of prime numbers distribution that matches almost exactly with the Generalized Benford’s Law.

    @i_am_ed: No, they are saying that the prime numbers distribution follows the GBL. There are some random sets that follows the Benford’s Law, from many sources like natural events, but it doesn’t implies that if a set follows the GBL distribution, it will be indistinguishable from a random set, it’s an argument, but is not enough. For centuries people are trying to understand how prime numbers are distributed, this is a step in this direction.

  5. Jasper is probably referring to the fact that mathematics does not “select” a base 10 system that uses digits 0-9. What does Benford’s law predict in base 12 or base 16? It would probably be a simple extension of the results to generalize the formula in the post to an arbitrary base.

  6. Great!….now we know for sure that Gd is not a
    cheater….hahhha
    or do we???
    it is/would not hard to manipulate data to conform to Benford’s law if one already has ‘knowledge’
    of Benford’s law… i suspect this is ‘fools gold’
    so to speak.
    Gd 1 Humans 0

  7. Nice. I like it.

    To address the other posters here regarding a random set: An even weighted random set would result in a flat line, not the curve seen here. Any number in the range is equally likely in a unbiased random set.

    The fact that the distribution follows Benford’s indicates that the distribution of prime numbers is proportional to the size of the prime number, which makes sense. As prime numbers get larger, the average distance between the primes also gets larger. This will cause the distribution of prime numbers to fall into Benford’s law.

    As for the bases (base 10 vs base X), Benford’s law holds for any base. The higher the base, the flatter the curve you get because percentages are spread across more digits so each digit has a smaller percentage. However, Benford’s curve still exists.

    Thanks for posting Christian. Interesting work.

  8. Agree w/Jake and Jasper, but am too lazy/busy to look at it myself. I’d actually be interested in bases < 10, like … 3. 2 is a little too obvious :-)

  9. I agree, I’m interested in whether this works in other bases. Here’s some Mathematica for counting initial digits for an abitrary number of n-ades for base n <=36:

    1
    2
    3
    4
    5
    fcharnum[n_, b_] :=  First[Characters[ToString[BaseForm[n, b]]]]
    With[{base = 32, ades = 4},
     Table[{d,
       Count[Table[ fcharnum[Prime[n], base], {n, PrimePi[base^ades]}],
        fcharnum[d, base]]}, {d, 1, base - 1}]]

    The resulting data “looks like” it fits your graph, but “looks like” clearly doesn’t count.

  10. Nice write up. But I’m trying to understand what this implies. I understand it tells us the distribution of prime numbers follows GBL. But the prime number theorem has given us an approximation of the distribution for some time. Do you see this new discovery affecting the significance of the prime number theorem?

  11. I’ve played around with it some more in Mathematica–both counting the primes and computing the GBL. Here’s the code:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    With[{base = 32, ades = 5, a = 1.1}, nprimes := PrimePi[base^ades];  
     fcharnum[n_, b_] :=  First[Characters[ToString[BaseForm[n, b]]]];
     alpha := 1/(Log[base^ades] - a);
     gbl[ d_] := ((d + 1)^(1 - alpha) -
         d^(1 - alpha))/(base^(1 - alpha) - 1);  
     ListPlot[{Table[
        100*gbl[digit], {digit, 1, base - 1}] , (100/nprimes)*
        Sort[Tally[Table[ fcharnum[Prime[n], base], {n, nprimes}]]][[All,
          2]]}] ]

    I’m surprised–the plots line up quite well, regardless of the base. So this doesn’t appear to be assuming that the base is 10. (base = 32, 5 32-ades is nice, for primes up to about 33M)

  12. Are The Prime Numbers Randomly Distributed? Part 1
    By: Johan G. van der Galiën (M.Sc.)
    (For comments e-mail: galien8@zonnet.nl)
    Version 1.5. April 23, 2006 (version 1.1. from August 27, 2002)
    The prime numbers up to 109 were transformed to Benford numbers and the digital distribution results were subjected to a Chi-square analysis. If the prime numbers are truly uniform random then, for an infinite amount of them, their Benford counterparts should be evenly distributed among the digits 1 to 9.
    But the Chi-square values increased with the amount of prime numbers tested, far beyond the point of reasonable doubt, with a confidence level << 10-10. Thus falsifying the null-hypothesis …

  13. @Bill: I don’t think so, im my humble opinion of a philosophy lover, my intuition says that this discovery is pessimist in the mathematical sense. Why ? Well, from many “random” sources that we can’t foresee, we have the Benf. law, so what we see now, is that we have this law on the prime numbers too, more one argument saying that we may never find a way to foresee them too.

    @MikeL: thank you for sharing your experiment source Mike, very interesting.

  14. MikeL – wtf are you thinking? a prime number is prime no matter what base its in. and benfords law is base invariant. why would the base matter in the slightest?

  15. Hi friends..!

    This is a very exciting discovery indeed..!

    @Christian.Perone, Very nifty work indeed…i really wish we could have such lucid illustrations to accompany Scientific papers…!

    @MikeL , Thanks a lot…in fact, I was jussst reading the article, & was thinking of suggesting a bit of Mathematica–(L) analysis/illustration…! :)

    We Never know HOW many amazing truths are patiently waiting for us to “Open our eyes”–smiling @ us silently..aren’t they…? 😉

    Cheers…! :)

  16. What is so new about this? If you google search
    ‘Are The Prime Numbers Randomly Distributed? Part 1
    By: Johan G. van der Galiën (M.Sc.)’.

    This was written over 7years ago and also relates
    the first trillion prime numbers to Benford’s Law.
    Shouldn’t Mr. Galien recieve some credit and respect;
    unless he is amongst thiefs

  17. This is indeed an amazing result, first observed 200 years ago and proved 113 years ago. It is called the prime number theorem. Also see http://stevekrenzel.com/prime-numbers-and-benfords-law.
    Here is some python code:

    from math import log
    
    def pp(x,N,a=1):
        return sum([10**k*x/(log(10**k * x) - a) for k in xrange(1,N)])
    
    def p(N,d,a=1):
       return (pp(d + 1, N, a) - pp(d, N, a)) * (log(10**N) - a) / 10**N
    

    The prime number theorem asserts that asymptotically the number of primes up to x is x/(log(x)). It can be shown that the function x/(log(x) – a), which is asymptotically equal to it for any a, is closest to it for a = 1, but as legendre observed in 1798 (also by curve fitting), for low values of x a better value is 1.08366, which I suspect is related to the 1.1 +/- 0.05 of the article.
    In any case, the function p(N,d,a) approximates the fraction of primes up to 10^N that start with the digit d if the total number of primes up to x is given by x/(log(x) – a).

    >>> for d in range(1,10):
    ...     print d,p(10,d,1)
    ...
    1 0.117271903286
    2 0.114399057762
    3 0.112610088948
    4 0.111316205436
    5 0.110306372907
    6 0.109480516779
    7 0.108783318417
    8 0.108181014445
    9 0.10765150511
    
  18. Christian
    I am having a hard time discerning an ‘obvious’
    difference betweenst the papers. Is it not about
    prime number patterns and the lack of randomness.
    Please explain to me in simple terms where i am
    in error and also explain ‘what does random really
    mean’.
    thanks for the articles and your patience…

  19. @Dox: thank you for the comment !

    @Descartes: if we are reading the same papers, on the paper of von der Galiën he says: “To come back to the formula that inspired me, the Law of Benford (% of digit = Ln(1+1/digit)*100/Ln(10)), the primes DO NOT OBEY this law.”, and in this paper, they are saying that prime numbers obey the GBL.
    But your question about “what does random really mean” is more complex, because, I think, it depends of your philosophical beliefs.
    A man who believes in determinism will argue that the “true randomness” doesn’t exist, because all things in our world are governed by cause and effect, it may say: “randomness is when we don’t know all the variables to predict or understand”. But a man who believes in free will, will argue that the randomness exist and there are some things that we will never understand, even knowing all variables, because something must break the cause and effect laws, or some like that. Can you understand that the definition of this concept depends of a belief and not science ?

  20. Hi Christian, thank you very much for your clear explanations… I’m trying to get the impliances of prime numbers obeying GBL. I thought that if something obey a law, it becomes predictable. But it seems to be just the opposite, obeying GBL is being random, and then unpredictable… I can’t get it!! Please, I’d thank you very much if you could bring me some light about this.

  21. Hello, obeying GBL or Benford’s Law doesn’t imply a predictable behavior, because it doesn’t says what will be the next term, it simple says that the statistical distribution of the first digit follows a pattern.

  22. I have a plot I made in 2002 noticing exactly the same. Now, I was so surprised of me discovering that so easily, that I just thought “sure somebody found it before”…
    😀
    Now, some bibliographic research have to be done before give credit to above mentioned scientists Luca and Lacasa.

    I didn’t care too much because prime numbers is just a hobby, I do research in an completely different topic (coastal ocean physics).

    Greetings
    Isaac

    1. Isaac, did you have generalized the Benford’s Law ? Because I’ve done such plot before and it doesn’t fitted. Some people had done this before, but not using the GBL, that’s the point.

  23. Yes, and it’s not exactly logaritmic but a power law distribution in size. I’m gonna repeat my experiment to have more details, I remember I computed the alfa too but all those details were handwritten.
    Due to tech limitations, I worked with first billion numbers only, I saw now they are on the trillion order.
    I’ve been fascinated by prime numbers but I do work in oceanography, so not too much time left to keep the pace of full-time hard working mathematicians :)
    I know also the Institute I work gave a prize to an Indian mathematician for a new algorithm to find prime numbers,
    See the news here:
    http://pio.ictp.it/words/news/2004/news_2004_Jan_00.news/

    But it’s still not what I want 😉 I want a deterministic formula 😀

    Greetings

    Isaac

    1. Nice, I’ve done some like this before but not using the GBL, that was my problem hehe.
      I’m very fascinated by prime numbers too, I’ve tried so many many ways to understand it, but I always sadly give up with the failures. Primes are very strange, Ulam spiral is one of those effects; as Euler said: “… we have reason to believe that it is a mystery into which the mind will never penetrate.”
      But my personal opinion about the BGL and primes, is that this is bad news for us, the main events of the nature which follows Benford’s Law are non-deterministic (unpredictable for us), and if primes do follow it, it’s more an argument for the unpredictable behaviour of primes.

  24. Let’s not give up. I always said that having a distribution doesn’t make them random. I think this of all datasets I get, but specially of prime numbers…

    1. I’ve figured something sometime ago, if there is a deterministic way to find prime numbers, it is not with our math methods.

  25. Hi,

    This is really interesting, I am currently trying to write my dissertation on the implications of this. I would like to redo the research myself but am struggling to make a table showing the probability of each digit (1-9) for 10 to power 2, 3,4 and 5 so that I can make a comparison to the actual results manually counted. Any tips/ do you know if this has already been published anywhere to save me time

    Many thanks

    1. listen all you “Mathematics potsherds”, learn simplicity. we have discovered by a new mathematics of -1 inverse tangent, and placed 2 billion Pure consecutive Prime numbers, with simplicity. Each prime number has its own rythm , and mathematics is at -1 tangent , at 19 1:3,1:6

      Link to our web site, see for your self

      http://hoperesearch.web.officelive.com/default.aspx

  26. I’m also looking into this. Though not complete yet, what we found is not quite the same. First, we conjecture Prime Numbers are not so special, and many other sequence behaves just similar to this. That is beside the point.

    Also, what happens is as n-> infty, Q_b^d(b^n) -> GBL_d(0)=1/(b-1) is obvious. Of course you can make them as close to each other as possible with 1/(LogN – a) — because 1/(LogN – a) goes to 0 and |Q_b^d(n) – GBL_d(alpha(n))| goes to 0 as well!!! We think it by no means make it a good approximation. The two graphs get close simply because they go to the same point.’

    (My English is not very good, so if you are confused at what I mean, then just reply to it.)

    @jsh Benford’s law IS base variant. Given any natural number b >= 2, rewrite any given natural number n as n = s . b^j where j is “maximal”. i.e. b^j <= n < b^(j+1). s is the significand of m, and the floor of s is the first leading digit of n BASE b. Let us denote it as D_b(n). Then, let P_b^i(n) = (number of numbers m Log((i+1)/i) / Log(b). See the Log(b) there? 2^n is only Benford base b where Log(b)/Log(2) is irrational for instance.

  27. “…a new pattern in primes that has surprisingly gone unnoticed until now.” On the flip side, you could say it’s surprising that the fact that it HAD been noticed before had somehow gone unnoticed.

    It goes back as least as far as 1973, when Springer-Verlag published Jean-Pierre Serre’s book, “A Course in Arithmetic.” On page 76 there is the following paragraph:

    “One can prove that, if A has natural density k, the analytic density of A
    exists and is equal to k. On the other hand, there exist sets having an analytic
    density but no natural density. It is the case, for example, of the set P1 of
    prime numbers whose first digit (in the decimal system, say) is equal to 1.
    One sees easily, using the prime number theorem, that P1 does not have a
    natural density and on the other hand BOMBIERI has shown me a proof that
    the analytic density of P1 exists (it is equal to log10(2) = 0.301029995…).”

    Serre’s book is available at http://www.math.purdue.edu/~lipman/MA598/Serre-Course%20in%20Arithmetic.pdf.

    Then, in 1984, the Journal of Number Theory published an article by Cohen et al called “Prime Numbers and the First Digit Phenomenon” (Volume 18, pages 261-268), which demonstrated that Benford’s Law would hold for any generalization / extension of the natural density. The article by Cohen didn’t use the term “Benford’s Law,” which is perhaps why it went unnoticed by Luque and Lacasa, and those who commented on Luque and Lacasa’s work.

    The paper by Cohen et al is available at http://ac.els-cdn.com/0022314X84900611/1-s2.0-0022314X84900611-main.pdf?_tid=a67c33c6-f264-11e4-8a1f-00000aacb35f&acdnat=1430747663_e638c6857350d3f4388bb15b62837e86
    (see http://www.sciencedirect.com/science/article/pii/0022314X84900611 )

Leave a Reply

Your email address will not be published.