Increasing MMI Effect Size by LFSR Processing of MMI Bits
MMI bits are usually processed to remove statistical defects, primarily bias (imbalance between 1s and 0s) and autocorrelation (relationship between bits in the sequence). The processing can consume many bits to produce a single output bit with the necessary statistical quality. The processing described here can return statistically acceptable bits while using only one input bit for each output bit. In addition, MMI bits produced by a true random generator or MMI generator can be extended or “stretched” to provide more bits than come from the generator.
One might assume if each bit with entropy HB is extended to N bits, the resulting entropy in each extended bit will be HB /N bits. This is untrue because the result of recombining all the entropy in the extended bits is always less than the entropy in the original bit, HB. Assume entropy is conserved in a reversable transformation: that is, recombining all the extended bits results in a single bit with the original entropy, HB. To make this true, the relative predictability, PR, of each extended bit is the relative predictability of the original bit raised to the power 1/N. When the bits are recombined, the relative predictability of the recombined bit is the product of the relative predictabilities of the N bits. This obviously gives a power of (1/N)N = 1, restoring the original PR and hence the original entropy, HB.
How are bits extended for use in MMI?
Bits are commonly extended for use in encrypting information or messages. A key or seed, which is usually only 128 or 256 bits long, is transmitted from one party to another to enable sharing secret information. The key is used as a seed for a pseudorandom generator, which then generates a large number of bits – hundreds to thousands as many bits as were in the seed. The seed itself should be a high-quality true random number, although the true entropy of the seed is rarely known since entropy is so little understood.
The following example illustrates the relationship between original or input bits and extended bits. The entropy of the original bit(s), HB, must be known. Assume HB = 0.999999. What is the entropy of the corresponding PR1/N? Predictability, PB = 0.5005887049 and PRB = 0.0011774099. If the input bit is extended by a factor of 10 (10 bits out for each bit in), the PR10 = 0.50943968 and P10 = 0.75471984. This gives H10 = 0.80371129/bit. In each case, 10 is appended to the subscript to indicate the variables are for bits extended by a factor of 10.
As the input entropy gets closer to 1.0, the resulting extended-bit entropy also increases. A high-quality random generator without deterministic postprocessing may produce bits with H = 1.0 - ε, where ε = 10-50. PR10 = 0.003214347 and H10 = 0.999992547. As the number of extended bits increases, the extended-bit entropy decreases. With N = 100, the extended-bit entropy, H100 = 0.75718. Note, the total amount of extended-bit entropy also increases to over 75 bits.
MMI is a very different type of application than encryption, which is purely algorithmic. MMI applications cannot be simulated, so only real-world testing can confirm methods that actually work. Two variations for extending bits for MMI use have been tried. They use a set of linear-feedback shift registers (LFSRs) constantly fed by a source of entropic bits. Entropic bits are generated by sampling a property of a physical entropy source, typically represented by a voltage. The figure shows an example layout (from US pat. 6,862,605, Wilber, fig. 5)
The registers in the LFSRs must have lengths relatively prime to each other. That means their lengths can have no common factors. Lengths that are all prime numbers are automatically relatively prime. The relative predictability of the output bits can be calculated as the generator runs through a number of self-seeding initialization cycles. The PR can be estimated for the example generator. After 330 bits have been input, and assuming the entropy of each input bit is 0.25746 bits. Predictability at the input is, P = 0.956647, PR = 0.913294. The PR at the point labeled output is roughly 0.913294**96 = 0.0001654314, P = 0.5000827157 and finally the entropy is H = 1 – ε, where ε = 1.9741521 x 10-8. Note, these are only estimated results.
The calculations assume no output bits are taken during the self-seeding period. When bits are taken at the output, the effect is to split or extend the bitstream into two streams; one is the output stream and the other is fed as usual into the LFSRs. For clarity take the output from line 131, which also feeds the LFSRs. Initially the output bits and the bits fed into register 128 have relative predictability of the square root of 0.913294 (from each input bit) times 0.0001654314 (initial value of the unsplit bits) = .0122918. While register 129 is being filled, each output bit and bits fed into the register have PR = 0.00507654. While register 130 is being filled, each output bit and bits fed into the register have PR = 0.00161539. Note, for subtle reasons of calculating the entropy, the longer register should be filled first, progressing to the shortest.
UPDATE NOTE: more rigorous calculations suggest these estimates are incorrect so results in the previous two paragraph are provisional. The LFSR approach does work to produce statistically high quality output sequences, and it can be used to split or extend the number of input bits if their entropy is very close to 1.0. However, the exact modeling of output entropy, which is quite complex, suggest output bits will have the same entropy as the input bits when one output is taken for each input. More study is required.