An index can be generated using random walks produced from the output of an MMI generator. The index may be used for finding a location in one or multiple dimensions. A duplicate index generator is used for each dimension, though all the numbers may be drawn from a single MMI generator. To find a particular line in a list, use a 1D generator. For geographic locations, use a 2D generator. I suggest drawing numbers from the MMI generator approximately at the same time by alternately using small block of data to increment each random walk total until all the totals are complete. This is meant to keep the MMI interaction time the same for each index generator stage and each dimension.
The use of multi-stage generators makes it possible to achieve very high resolution without having to use what would otherwise require a random walk with an enormous number of input bits or steps, N. To achieve good linearity of output of a single-stage random walk index with a number of unique lines, nl, takes about (2 nl)^2 to (8 nl)^2 steps. Note, the up arrow, ^, indicates a power – in this case, the value in the parentheses squared. For 1000 possible lines the number of steps would be 4 to 16 million steps. This number of bits is too large for a practical MMI system. A two-stage 1D index generator with 32 internal lines would require about 32768 total steps, half in each stage. A three-stage generator with 10 internal lines in each stage would require a total of only 19200 bits.
Achieving best MMI responsivity (highest effect size) requires optimizing the parameters in the index generator. An optimum mental interaction time is assumed to be in the range of 0.2 to 0.4 seconds. Using an MMI generator with 100Kbps generation rate would provide 20,000 to 40,000 bits to be used in the index generator. The parameters chosen depend on the number of dimensions and the resolution (number of unique index lines).
In general, choose the simplest design and use the largest number of steps available for each random walk. Testing indicates the number of bits required is highest at the lowest resolution and decreases toward the highest resolution. The number of lines is roughly in the range of 10 to 100, but these are not absolute limitations. For 10-20 lines, use N ≥ (8 nl)^2; for 21-48 lines, use N ≥ (4 nl)^2 bits for 49-100 lines, use N ≥ (2 nl)^2 bits. These are intended as general guidelines, and specific systems should be tested thoroughly to verify performance. Note, it is always strongly recommended that N be an odd number to prevent ties in the random walk. When the terminal position (number of steps away from the starting point) is 0, the result is somewhat indeterminate for some applications. Requiring N to be an odd number prevents that from happening.
The following list includes a number of examples to illustrate the RW index generator design. index is the output for each call of the index generator. The output is always a whole number or integer, depending on how it is cast in the language being used. The N values shown are the minimum, not necessarily the optimum as defined above. The p values are calculated by taking the random walk terminal values, dividing by the standard deviation (SD = square root of N) to produce z – scores, and then converting the z – scores to p values (evaluate the normal cumulative distribution function at z or use the approximation curve fit). Note, the z – score is also (2 * total number of 1s in the RW data – N)/square root of N
nl = 10. (Number of Lines multiplier – must be a whole number.)
N ≥ 6400. is the number of steps in each random walk. 12800 total for each dimension.
p1 is the linearized probability from random walk 1.
p2 is the linearized probability from random walk 2.
index = nl * Floor[nl * p1] + Floor[nl * p2]
The number of unique outputs in the index is nl^2 = 100 (0 to 99)
nl = 20. (Number of Lines multiplier – must be a whole number.)
N ≥ 25600. is the number of steps in each random walk. 51200 total for each dimension.
index = nl * Floor[nl * p1] + Floor[nl * p2]
The number of unique outputs in the index is nl^2 = 400 (0 to 399)
nl = 100. (Number of Lines multiplier – must be a whole number.)
N ≥ 40000. is the number of steps in each random walk. 80000 total for each dimension.
index = nl * Floor[nl * p1] + Floor[nl * p2]
The number of unique outputs in the index is nl^2 = 10000 (0 to 9999)
nl = 10. (Number of Lines multiplier – must be a whole number.) Three-stage index generator.
N ≥ 6400 is the number of steps in each random walk. 19200 total for each dimension.
P3 is the linearized probability from random walk 3.
index = (nl^2) * Floor[nl * p1] + nl * Floor[nl * p2] + Floor[nl * p3]
The number of unique outputs in the index is nl^3= 1000 (0 to 999)
Indices with resolutions up to billions of lines can be generated using 10-12 stage generators with resolutions of 8-20 lines each. Since the generator nls are integers, it may be necessary to use an interpolation algorithm to get the exact resolution desired, designated target resolution. First design a generator that has a resolution just above the target resolution. Then use the following equation to produce the interpolation:
Interpolated index = Floor[target resolution * (index/resolution of index)]
The number of lines in the interpolated index is nl’ and it ranges from 0 to target resolution – 1.
Use interpolation only when necessary to achieve a specific resolution.
How to Tell if the Random Walk Index Generator is Working.
In highly technical algorithms, it is always necessary to have a strong test to help in the design process and to confirm the results are as expected. The Kolmorogov-Smirnov (KS) test is used to measure the properties of a large number of test indexes produced by the algorithm. The KS test is a powerful statistical test that can give the probabilities that the test sequence is drawn from a reference distribution. In this case, the test is made against the linear distribution. The output of the KS test is two probabilities, the first is the probability that the test data is above the theoretical perfect line at the place where their difference is greatest, designated KS+. The second is the probability the test data deviates the greatest amount below the perfect line, KS-. For a “good” statistical result, both these probabilities should be between 0.01 and 0.99, and the nominal probabilities are 0.5.
The Random Walk Index generator in this test design is:
nl = 32. (Number of Lines multiplier – must be a whole number.)
N ≥ 16384. is the number of steps in each random walk. 32768 total.
p1 is the linearized probability from random walk 1. Use a pseudorandom number [0, 1) for the reference test. Note, [0, 1) means the number goes from exactly 0 to just under 1 (1 less 1 LSB), the usual format for pseudorandom generators.
p2 is the linearized probability from random walk 2. Pseudorandom number for the reference.
index = nl * Floor[nl * p1] + Floor[nl * p2]
The number of unique outputs in the index is nl^2 = 1024 (0 to 1023)
Graphically, the cumulative distribution function of the linear distribution is just a line running from {0, 0} at the lower left to {1, 1} at the upper right of the plot. Figure 1 shows a reference plot using a pseudorandom generator to produce the two probabilities, p1 and p2. 100,000 index values are generated for this test.
KS+ and KS- were 0.822 and 0.174 respectively, and visually the line shows no obvious nonlinearities. This result shows two things: one, the algorithm does produce linearly distributed index values; and two, a test of 100,000 index values (a large sampling) is not significantly different from the test distribution, meaning the algorithm is effectively perfect at that level of testing.
A second test was performed the same way, but using linearized random walk terminal values for p1 and p2 to produce each index. The graphical result is shown in figure 2 below:
For a sampling of 100,000 values, KS+ and KS- were 0.99995 and 0.152. There is a slight “kink” noticeable in the middle of the plot. These results indicate there is a small imperfection in the random-walk-generated values of p1 and p2. When only 10,000 samples are used in the KS test, the results are clearly within the nominal range, but the kink remains. This kind of imperfection is to be expected since the conversion of random walk terminal values to linearized probabilities is limited by the number of steps in each walk. This imperfection is reduced as the number of steps is increased, but as a practical matter the number of steps is set by the MMI generator rate as well as processing overhead. For this example, the rate is 100Kbps and the MMI interaction time is chosen to be 0.2 to 0.4 seconds, giving a total of 20,000 to 40,000 bits. The total number of steps for this test was 32,768 or 0.328 seconds.
Increasing the number of steps by a factor of 4 to a total of 131,072 (1.3 seconds of MMI generator output) makes a notable difference in the graphical output, and the KS probabilities become, 0.994 and 0.516 – only slightly on the edge. A tradeoff is made between a slight statistical nonlinearity in index values and the excessive requirement for more bits in the random walks to reduce the nonlinearity. I believe the values chosen will work in most MMI applications, since the deviation is very slight. However, this method of index generation has never been tested, and algorithmic functioning is no guarantee how it will work in MMI systems.