A Brief Introduction to the Concepts of Entropy and Randomness

What is a truly random number?
Random numbers are numbers that can't be guessed or predicted by any algorithm. Producing a random number requires a measurement of something that is naturally random. Such numbers are also described as true, truly random, hardware, non-deterministic or quantum.
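For concreteness, here is a minimal sketch in Python of how such numbers are typically obtained in software: most operating systems gather entropy from hardware noise sources into a pool and expose it through a system interface, which Python's os.urandom reads. Which physical sources feed the pool depends on the platform and is not specified here.

```python
# A minimal sketch: read hardware-derived entropy from the OS pool.
import os

raw = os.urandom(16)                 # 16 bytes from the OS entropy pool
number = int.from_bytes(raw, "big")  # interpret as a 128-bit integer
print(f"128-bit random number: {number}")
```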

What are some things that are naturally random?
- Vibrations of electrons in a resistor (thermal noise)
- Cascades of electrons passing through a semiconductor junction (breakdown or avalanche noise)
- Bunching of electrons passing through a diode junction (shot noise)
- Bunching of photons in a light beam (photon shot noise)
- Charges moving across a thin insulator (quantum tunneling)
- Photon polarization (polarization analyzer or polarizing beam splitter)
- A photon passing through a beam splitter
- Timing of nuclear decay

Some people consider certain sources, such as the vibration of electrons in a resistor (thermal, Johnson or Johnson-Nyquist noise), to be merely chaotic rather than inherently random. In this context, chaotic means the process is extremely complex but, given sufficient information about initial conditions and enough computational power, future bits could be predicted. This assertion is not correct: the necessary measurements of the positions and momenta of the electrons are not just astronomically complex, they are theoretically precluded by Heisenberg's uncertainty principle.

What is a pseudorandom number?
Pseudorandom numbers, meaning artificial or random in appearance only, are produced by a computer program to simulate unpredictability. They are often used in place of true random numbers because they are easier to produce and can be made very hard to predict. Every pseudorandom generator has a finite period, meaning its output sequence will begin to repeat after all of its possible states have been produced. High-quality pseudorandom number generators have been notoriously difficult to design, and there have been a number of monumental flops involving what turned out to be very bad generators. One of the best modern generators is the Mersenne Twister, although it must be seeded very carefully, including discarding an initial runoff of numbers before its output is reliable. Even then it will begin to fail very long-term statistical tests unless its output is further whitened (post-processed to remove residual patterns).
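As an illustration of the seeding precaution just described, the sketch below uses Python's built-in generator (random.Random, an implementation of the Mersenne Twister MT19937), seeds it from the OS entropy pool, and discards an initial runoff of outputs. The 32-byte seed and the 10,000-value runoff are illustrative choices, not figures prescribed above.

```python
# A minimal sketch of careful Mersenne Twister seeding, assuming the
# illustrative seed width and runoff length noted above.
import os
import random

seed = int.from_bytes(os.urandom(32), "big")  # wide, unpredictable seed
rng = random.Random(seed)                     # CPython's Random is MT19937

# Discard an initial runoff so the internal state is well mixed before
# any output is actually used.
for _ in range(10_000):
    rng.getrandbits(32)

print([rng.getrandbits(32) for _ in range(4)])
```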

What is entropy?
Entropy is a measure of the unpredictability of the physical property or process that is measured to produce random numbers. On a scale from 0 to 1, it indicates how unpredictable each generated bit is. True random numbers have an entropy of 1.0, while pseudorandom numbers have a real entropy of 0.0. Entropy has different definitions in other technical fields, but here we use the definition of information entropy derived by Claude Shannon. For binary bits, the equation is

H = –(p(1)Log p(1) + p(0)Log p(0))

where p(1) and p(0) are the probabilities of a one and a zero occurring, and the Log functions are base 2. Because p(0) = 1 – p(1), the two probabilities can be replaced by a single variable, the predictability P (the probability of the more likely bit), and the entropy equation becomes

Hp = –(P Log P + (1–P)Log (1–P))

where 0.5 ≤ P ≤ 1.0. Predictability can also be calculated from a measured entropy as P = Hp⁻¹(H), where Hp⁻¹ is the mathematical inverse of the Hp equation; since no closed-form inverse exists, it must be evaluated numerically.
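The two calculations can be sketched in a few lines of Python: hp evaluates the Hp equation directly, and predictability_from_entropy inverts it by bisection, which works because Hp falls monotonically from 1.0 to 0.0 as P rises from 0.5 to 1.0. The function names and the tolerance are illustrative.

```python
# A minimal sketch of the binary entropy equation and its numerical inverse.
from math import log2

def hp(p):
    """Hp = -(P Log P + (1 - P) Log (1 - P)), logs base 2."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1.0 - p) * log2(1.0 - p))

def predictability_from_entropy(h, tol=1e-12):
    """Numerically invert Hp on 0.5 <= P <= 1.0 by bisection."""
    lo, hi = 0.5, 1.0               # Hp(0.5) = 1.0, Hp(1.0) = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if hp(mid) > h:
            lo = mid                # entropy still too high: P must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0

print(hp(0.6))                             # ~0.971 bits per bit
print(predictability_from_entropy(0.971))  # recovers P ~ 0.6
```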

Why do we need random numbers?
Random numbers are used to make the codes that allow private messages or money to be transferred without danger of eavesdropping or theft. They are also used to make “random” selections, such as lottery numbers and all sorts of drawings. In addition, random numbers are used in a wide variety of other applications, including simulation (such as Monte Carlo simulation), randomized computation, statistical testing, neural networks, and even music and art. True random number generators have also been employed in the field of Mind-Matter Interaction research.

How do we tell how good the random numbers are?
The quality of random numbers is determined both from understanding the theory of how they are made and by applying statistical testing algorithms. Statistical tests look for patterns in a string of numbers: the more pronounced any patterns are found to be, the more predictable (less random) the numbers are. Statistical tests must be very carefully selected or designed to reveal the subtler patterns in longer sequences of numbers. It used to be enough to say a generator passed Marsaglia's Diehard tests, for example, for it to be considered a good generator. Now tests must be run on longer sequences, and collections of tests must be combined into meta-tests to see whether there are patterns in the collections of results.
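As one concrete example, here is a sketch of the simplest such test, the monobit (frequency) test, which checks whether ones and zeros occur in roughly equal numbers; the p-value follows the normal approximation used in NIST SP 800-22. A single test like this is only illustrative; suites such as Diehard or the NIST suite apply many tests at once.

```python
# A minimal sketch of the monobit (frequency) test.
from math import erfc, sqrt
import random

def monobit_p_value(bits):
    """Small p-values indicate a biased (less random) sequence."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)  # +1 per one, -1 per zero
    return erfc(abs(s) / sqrt(2.0 * n))

bits = [random.getrandbits(1) for _ in range(100_000)]
print(monobit_p_value(bits))  # usually well above 0.01 for unbiased input
```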

Tests in a test suite should be substantially independent, or they may give a biased picture of the results. As an example, the statistical test suite developed by NIST to test random sequences includes two tests (Approximate Entropy and Serial) that produce virtually identical results if the block-length index of one is incremented relative to the other. These tests should not be used together in a test suite.
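The overlap is easy to see in code: both tests are built from the frequencies of overlapping m-bit blocks of the same sequence, taken with wraparound, evaluated at adjacent block lengths. The sketch below computes those shared block counts; the block length m = 3 is an illustrative choice.

```python
# A minimal sketch of the block counting shared by the Approximate
# Entropy and Serial tests (overlapping blocks with wraparound, per
# NIST SP 800-22).
from collections import Counter
import random

def overlapping_block_counts(bits, m):
    """Count overlapping m-bit blocks, wrapping around the end."""
    extended = bits + bits[:m - 1]  # wraparound extension
    return Counter(tuple(extended[i:i + m]) for i in range(len(bits)))

bits = [random.getrandbits(1) for _ in range(10_000)]
print(overlapping_block_counts(bits, 3).most_common(2))  # length m
print(overlapping_block_counts(bits, 4).most_common(2))  # length m + 1
```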
