MMI Index Generator for Very Large Index Numbers
I designed an index generator for a very large index to demonstrate that such a generator is possible and to test its statistical quality. My target was 3.464 billion lines, to correspond with the suggested MMI Mindfind project.
The best selection, at least from a statistical perspective, is to use:
nl = 9
N = 5185, that is (8 * nl)^2 + 1, where the +1 makes the number odd
in a 10-stage generator (9^10 = 3,486,784,401: just above the 3.464 billion target resolution).
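The arithmetic behind these choices can be checked in a few lines. This is only a sketch: the variable names `stages`, `raw_range` and `target` are illustrative, and reading nl as the number of levels resolved per stage is an assumption, since the text does not spell it out.

```python
nl = 9                      # levels per stage (assumed meaning of nl)
stages = 10                 # number of generator stages
N = (8 * nl) ** 2 + 1       # random-walk length; the +1 makes it odd
raw_range = nl ** stages    # number of distinct raw index values
target = 3_464_000_000      # 3.464 billion lines, the target in the text

print(N)                    # 5185
print(raw_range)            # 3486784401
print(raw_range - target)   # 22784401 raw values beyond the target
```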
Writing out the full equation:
index = (nl^9) * Floor[nl * p1] + (nl^8) * Floor[nl * p2] + (nl^7) * Floor[nl * p3] +
(nl^6) * Floor[nl * p4] + (nl^5) * Floor[nl * p5] + (nl^4) * Floor[nl * p6] +
(nl^3) * Floor[nl * p7] + (nl^2) * Floor[nl * p8] + nl * Floor[nl * p9] + Floor[nl * p10]
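The written-out sum is a 10-digit number in base nl, so it can be sketched as a short loop. The function name `raw_index` is hypothetical, and the clamping step is an added safeguard that is not part of the equation above.

```python
import math

def raw_index(probs, nl=9):
    """Combine probabilities into one base-nl index.

    Each probability contributes one base-nl digit, Floor[nl * p];
    the first probability supplies the most significant digit.
    """
    index = 0
    for p in probs:
        digit = math.floor(nl * p)
        # Assumption (not in the original equation): clamp the digit so
        # that p == 1.0 cannot produce an out-of-range digit equal to nl.
        digit = min(max(digit, 0), nl - 1)
        index = index * nl + digit   # Horner form of the written-out sum
    return index
```

With ten probabilities and nl = 9, the result lies between 0 and 9^10 - 1.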
p1 to p10 are the 10 linearized probabilities produced from 10 random walks of 5185 steps each, for a total of 51,850 MMI bits (0.519 seconds of generation time at 100 Kbps).
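A rough sketch of how the ten probabilities might be produced follows. Two things here are assumptions rather than the method described: "linearized" is taken to mean passing the walk's z-score through the standard normal CDF, and Python's pseudorandom generator stands in for the MMI bit source. The function names are hypothetical.

```python
import math
import random

def linearized_probability(bits):
    """Map one random walk to a value in [0, 1].

    Assumption: "linearized" means passing the walk's z-score through
    the standard normal CDF; the original text does not define it.
    """
    n = len(bits)
    ones = sum(bits)
    displacement = 2 * ones - n          # +1 per one bit, -1 per zero bit
    z = displacement / math.sqrt(n)      # unbiased walk has std dev sqrt(n)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ten_probabilities(rng, n_steps=5185, stages=10):
    """Draw one walk per stage; rng is a pseudorandom stand-in for MMI bits."""
    return [
        linearized_probability([rng.getrandbits(1) for _ in range(n_steps)])
        for _ in range(stages)
    ]

probs = ten_probabilities(random.Random(1))
total_bits = 10 * 5185                   # 51,850 bits per index
seconds = total_bits / 100_000           # 0.5185 s at 100 Kbps
```

With an odd number of steps the displacement can never be exactly zero, so under this assumed mapping no walk lands exactly on 0.5.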
The raw index must be interpolated to reach the exact target number:
interpolated index = Floor[3.464 x 10^9 * index / (nl^10)]
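The interpolation step can be sketched with integer arithmetic, which avoids any floating-point rounding at this scale. The function name `interpolate_index` and its default arguments are illustrative, not part of the original design.

```python
def interpolate_index(raw, target=3_464_000_000, nl=9, stages=10):
    """Rescale a raw index in [0, nl**stages) to [0, target)."""
    raw_range = nl ** stages
    if not 0 <= raw < raw_range:
        raise ValueError("raw index out of range")
    # Integer floor division gives an exact Floor[target * raw / raw_range].
    return (target * raw) // raw_range
```

Because 9^10 exceeds the target by 22,784,401, that many output values receive two raw indexes each and all the others receive exactly one.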
The number of unique index outputs is 3.464 x 10^9, in the range of 0 to (3.464 x 10^9) - 1. The exact interpolation range depends on the exact number of lines in the immediate count of sites. If that number exceeds the maximum of 9^10, the design of the index generator would have to be changed to a 10-stage generator with nl = 10, which has a potential range of 10 billion lines.
The KS test for 100,000 samples of the generator gave KS+ = 0.974 and KS- = 0.9993, very close to linear. The graphical output of the test data (the cumulative distribution function) is:
Note the small kink in the data between 0.4 and 0.6. This nonlinearity means that close to the middle of the index range, when a very large number of indexes are drawn, there will be a slight shift toward higher numbers. This shift in probability is a tiny fraction of a percent, but given the huge size of the index resolution, it would at times be significant in terms of hitting an exact target number.
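For readers who want to repeat this kind of check, the one-sided KS statistics against a uniform distribution can be sketched as below. How the KS+ and KS- figures quoted above were computed is not stated, so this is not a reproduction of that test; the function names are hypothetical, and the tail formula is the standard large-sample approximation.

```python
import math

def ks_one_sided(samples):
    """Return (D_plus, D_minus) against the uniform distribution on [0, 1]."""
    xs = sorted(samples)
    n = len(xs)
    # D+ is the largest amount by which the empirical CDF exceeds the line y = x.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    # D- is the largest amount by which the empirical CDF falls below it.
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return d_plus, d_minus

def asymptotic_tail(d, n):
    """Large-n approximation to P(D > d) for a one-sided statistic."""
    return math.exp(-2.0 * n * d * d)
```

Index values would first be scaled to [0, 1], for example by dividing each interpolated index by the total number of lines.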
Because MMI seems to inherently compensate for internal complexity of the generation function, the nonlinearity may not be significant. However, given the number of degrees of freedom – in the billions – it seems highly unlikely an MMI application will be able to hit an exact target in one try. This is especially true if the order of the list is random, as it would seem to be in the suggested application. If the list were arranged according to a progression of semantic meaning, a close hit from the index generator would actually be close in meaning as well. These last thoughts are a bit speculative since this type of MMI index generator has never been tried before.
As with any type of Internet search, a block of nearby index numbers should probably be returned, allowing the user to pick the one that seems to best match their intention. This is how one would explore any real-world environment. We look around at hundreds of details and our mind makes sense of what we see in the context of our experience, then we focus on what catches our attention.
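Returning a block of nearby index numbers could look something like the sketch below. The function name `neighbor_block` and the `half_width` parameter are hypothetical; the text does not say how large such a block should be.

```python
def neighbor_block(index, total=3_464_000_000, half_width=5):
    """Return the index numbers around `index`, clipped to the list bounds."""
    if not 0 <= index < total:
        raise ValueError("index out of range")
    start = max(0, index - half_width)
    stop = min(total, index + half_width + 1)
    return range(start, stop)
```

Near either end of the list the block simply gets shorter instead of wrapping around.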