Multi-Stage Index Generation from Random Walks

ScottWilber · June 18, 2021, 8:57pm

An index can be generated using random walks produced from the output of an MMI generator. The index may be used for finding a location in one or multiple dimensions. A duplicate index generator is used for each dimension, though all the numbers may be drawn from a single MMI generator. To find a particular line in a list, use a 1D generator. For geographic locations, use a 2D generator. I suggest drawing numbers from the MMI generator approximately at the same time by alternately using small block of data to increment each random walk total until all the totals are complete. This is meant to keep the MMI interaction time the same for each index generator stage and each dimension.

The use of multi-stage generators makes it possible to achieve very high resolution without having to use what would otherwise require a random walk with an enormous number of input bits or steps, N. To achieve good linearity of output of a single-stage random walk index with a number of unique lines, nl, takes about (2 nl)^2 to (8 nl)^2 steps. Note, the up arrow, ^, indicates a power – in this case, the value in the parentheses squared. For 1000 possible lines the number of steps would be 4 to 16 million steps. This number of bits is too large for a practical MMI system. A two-stage 1D index generator with 32 internal lines would require about 32768 total steps, half in each stage. A three-stage generator with 10 internal lines in each stage would require a total of only 19200 bits.

Achieving best MMI responsivity (highest effect size) requires optimizing the parameters in the index generator. An optimum mental interaction time is assumed to be in the range of 0.2 to 0.4 seconds. Using an MMI generator with 100Kbps generation rate would provide 20,000 to 40,000 bits to be used in the index generator. The parameters chosen depend on the number of dimensions and the resolution (number of unique index lines).

In general, choose the simplest design and use the largest number of steps available for each random walk. Testing indicates the number of bits required is highest at the lowest resolution and decreases toward the highest resolution. The number of lines is roughly in the range of 10 to 100, but these are not absolute limitations. For 10-20 lines, use N ≥ (8 nl)^2; for 21-48 lines, use N ≥ (4 nl)^2 bits for 49-100 lines, use N ≥ (2 nl)^2 bits. These are intended as general guidelines, and specific systems should be tested thoroughly to verify performance. Note, it is always strongly recommended that N be an odd number to prevent ties in the random walk. When the terminal position (number of steps away from the starting point) is 0, the result is somewhat indeterminate for some applications. Requiring N to be an odd number prevents that from happening.

The following list includes a number of examples to illustrate the RW index generator design. index is the output for each call of the index generator. The output is always a whole number or integer, depending on how it is cast in the language being used. The N values shown are the minimum, not necessarily the optimum as defined above. The p values are calculated by taking the random walk terminal values, dividing by the standard deviation (SD = square root of N) to produce z – scores, and then converting the z – scores to p values (evaluate the normal cumulative distribution function at z or use the approximation curve fit). Note, the z – score is also (2 * total number of 1s in the RW data – N)/square root of N

nl = 10. (Number of Lines multiplier – must be a whole number.)
N ≥ 6400. is the number of steps in each random walk. 12800 total for each dimension.
p1 is the linearized probability from random walk 1.
p2 is the linearized probability from random walk 2.
index = nl * Floor[nl * p1] + Floor[nl * p2]
The number of unique outputs in the index is nl^2 = 100 (0 to 99)

nl = 20. (Number of Lines multiplier – must be a whole number.)
N ≥ 25600. is the number of steps in each random walk. 51200 total for each dimension.
index = nl * Floor[nl * p1] + Floor[nl * p2]
The number of unique outputs in the index is nl^2 = 400 (0 to 399)

nl = 100. (Number of Lines multiplier – must be a whole number.)
N ≥ 40000. is the number of steps in each random walk. 80000 total for each dimension.
index = nl * Floor[nl * p1] + Floor[nl * p2]
The number of unique outputs in the index is nl^2 = 10000 (0 to 9999)

nl = 10. (Number of Lines multiplier – must be a whole number.) Three-stage index generator.
N ≥ 6400 is the number of steps in each random walk. 19200 total for each dimension.
P3 is the linearized probability from random walk 3.
index = (nl^2) * Floor[nl * p1] + nl * Floor[nl * p2] + Floor[nl * p3]
The number of unique outputs in the index is nl^3= 1000 (0 to 999)

Indices with resolutions up to billions of lines can be generated using 10-12 stage generators with resolutions of 8-20 lines each. Since the generator nls are integers, it may be necessary to use an interpolation algorithm to get the exact resolution desired, designated target resolution. First design a generator that has a resolution just above the target resolution. Then use the following equation to produce the interpolation:

Interpolated index = Floor[target resolution * (index/resolution of index)]
The number of lines in the interpolated index is nl’ and it ranges from 0 to target resolution – 1.
Use interpolation only when necessary to achieve a specific resolution.

How to Tell if the Random Walk Index Generator is Working.

In highly technical algorithms, it is always necessary to have a strong test to help in the design process and to confirm the results are as expected. The Kolmorogov-Smirnov (KS) test is used to measure the properties of a large number of test indexes produced by the algorithm. The KS test is a powerful statistical test that can give the probabilities that the test sequence is drawn from a reference distribution. In this case, the test is made against the linear distribution. The output of the KS test is two probabilities, the first is the probability that the test data is above the theoretical perfect line at the place where their difference is greatest, designated KS+. The second is the probability the test data deviates the greatest amount below the perfect line, KS-. For a “good” statistical result, both these probabilities should be between 0.01 and 0.99, and the nominal probabilities are 0.5.

The Random Walk Index generator in this test design is:
nl = 32. (Number of Lines multiplier – must be a whole number.)
N ≥ 16384. is the number of steps in each random walk. 32768 total.
p1 is the linearized probability from random walk 1. Use a pseudorandom number [0, 1) for the reference test. Note, [0, 1) means the number goes from exactly 0 to just under 1 (1 less 1 LSB), the usual format for pseudorandom generators.
p2 is the linearized probability from random walk 2. Pseudorandom number for the reference.
index = nl * Floor[nl * p1] + Floor[nl * p2]
The number of unique outputs in the index is nl^2 = 1024 (0 to 1023)

Graphically, the cumulative distribution function of the linear distribution is just a line running from {0, 0} at the lower left to {1, 1} at the upper right of the plot. Figure 1 shows a reference plot using a pseudorandom generator to produce the two probabilities, p1 and p2. 100,000 index values are generated for this test.

KS+ and KS- were 0.822 and 0.174 respectively, and visually the line shows no obvious nonlinearities. This result shows two things: one, the algorithm does produce linearly distributed index values; and two, a test of 100,000 index values (a large sampling) is not significantly different from the test distribution, meaning the algorithm is effectively perfect at that level of testing.

A second test was performed the same way, but using linearized random walk terminal values for p1 and p2 to produce each index. The graphical result is shown in figure 2 below:

For a sampling of 100,000 values, KS+ and KS- were 0.99995 and 0.152. There is a slight “kink” noticeable in the middle of the plot. These results indicate there is a small imperfection in the random-walk-generated values of p1 and p2. When only 10,000 samples are used in the KS test, the results are clearly within the nominal range, but the kink remains. This kind of imperfection is to be expected since the conversion of random walk terminal values to linearized probabilities is limited by the number of steps in each walk. This imperfection is reduced as the number of steps is increased, but as a practical matter the number of steps is set by the MMI generator rate as well as processing overhead. For this example, the rate is 100Kbps and the MMI interaction time is chosen to be 0.2 to 0.4 seconds, giving a total of 20,000 to 40,000 bits. The total number of steps for this test was 32,768 or 0.328 seconds.

Increasing the number of steps by a factor of 4 to a total of 131,072 (1.3 seconds of MMI generator output) makes a notable difference in the graphical output, and the KS probabilities become, 0.994 and 0.516 – only slightly on the edge. A tradeoff is made between a slight statistical nonlinearity in index values and the excessive requirement for more bits in the random walks to reduce the nonlinearity. I believe the values chosen will work in most MMI applications, since the deviation is very slight. However, this method of index generation has never been tested, and algorithmic functioning is no guarantee how it will work in MMI systems.

WanderingIshiki · June 20, 2021, 3:10am

Thank you! I’ll go over this.

WanderingIshiki · June 27, 2021, 1:47am

Would you suggest actual n-size blocks for each stage’s entropy or would bit-by-bit presumably work the same?

ScottWilber · June 27, 2021, 3:49pm

I haven’t done the extensive testing that would be required to give a certain answer, but some logic and experience can be a guide. When adding MMI bits in blocks, I suggest using at least 1 word at a time to minimize overhead. Bit-wise handling is the most time consuming, while words can be used with a lookup table to return a number of bits to increment a counter. I think the important point is the amount of time involved during a cycle of adding bits. For a 10-stage RW index generator and a 100Kbps MMI generator, word-wise data accumulation takes 0.8ms to complete one 10-stage update, while bit-wise accumulations would take 0.1ms per update. These two times are not significantly different compared to a 200ms trial period or to how fast a user’s mental focus or intention might change. Even an 8 byte per stage update would only take 6.4ms to complete an update cycle – likely still okay.

I should note, if it turns out to be equal or even better (measured by closeness of hits or effect size) to fully update each stage in sequence, I expect it would be better to update the most significant stage first and progress to the least significant stage. The most significant stage is the one with the highest multiplier (nl^max). That would mean the first and effectively the largest or “coarsest” update would be made closest to the initiation of a trial, presumably at the peak of the user’s intention. This would be true for either the Multi-Stage generator or the Weighted Binary method.

Simurgh · July 20, 2021, 6:31pm

I still don’t think, that it’s good idea without index pre-processing. I think, vector space embedding will work fine.
I believe, that MMI is effect, which connects soul and brain. And brain pretty much does information processing. So, probably, good MMI interface will resemble brain to some degree at least in terms of information processing.

ScottWilber · July 20, 2021, 8:32pm

Preprocessing the space over which MMI is supposed to select a result does seem important so that close hits are actually close. Otherwise, an index value that is off by 1 can be random or completely unrelated to the exact hit. Since MMI is not “exact,” but probabilistic, out best hope is to get a close hit, especially when the mental effort in small.

I always assumed the best way to design an MMI system, either the hardware or hardware and software/firmware, is to follow as closely as possible the functionality of the brain. This is based on the belief that the brain/mind has been highly optimized by evolution for MMI-like functions, which increase survival advantages. This design can only be taken so far because we don’t yet understand the exact way quantum mechanics comes into the picture.

As we gain more experience with MMI, it becomes more apparent that principles of quantum mechanics are involved. I believe there is a superposition of possible outcomes that is measured by an observation, that is, seeing the result manifest. The probability of a certain outcome – the intended one – is made more probable by the act of visualizing/intending it. The quantum mechanical connection of visualization or intending that allows the superposition to be affected can arise from a shared entangled system. That is something that presumably exists in the brain/mind that connects the information of the visualization and the observed outcome. This is the most vague part of the theory at this point. There are some suggestions how entanglement can exist for long periods in the brain, but we still know so very little about the possibilities.

Simurgh · July 20, 2021, 9:32pm

MMI more likely tell us, that brain isn’t developed to hold entangled states, but macroscopic entangled states exist in world. Brain only learns how to interact with them. Otherwise, we wouldn’t be able to influence REGs.