Weighted Binary Index Generation from MMI Generator

This message describes another method of generating an index number or coordinate using MMI Generator bits. The algorithm assembles a binary word with the more significant bits receiving a higher weight than the less significant bits. Each bit is generated by performing a majority vote (MV) of an odd number of MMI bits.

The MV for this algorithm is two times the count of 1s in the bits used to generate each output bit, minus N, the total number of bits used for that bit: MV = 2 × (count of 1s) − N. The value of each resulting bit is 1 if the MV is positive, else it is 0. Put simply, if there are more 1s in the sequence, a majority of 1s, the result of the MV is 1; otherwise there is a majority of 0s and the result is 0.
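The majority-vote rule above can be sketched in a few lines (the function name is illustrative):

```python
def majority_vote(bits):
    """Majority vote over an odd number of 0/1 bits.

    MV = 2 * (count of 1s) - N; the output bit is 1 when MV is
    positive (a majority of 1s), otherwise 0.
    """
    n = len(bits)
    assert n % 2 == 1, "N must be odd to prevent ties"
    mv = 2 * sum(bits) - n
    return 1 if mv > 0 else 0
```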

Each bit’s weight is determined by the number of MMI bits, N, used to produce that bit; the weight is proportional to the square root of N. For this version of the algorithm, N is doubled for each more significant bit in the generated index. Note that N is made odd to prevent ties in the MV; this is done by adding 1 to each N value.

Assume there are 32,768 bits available to generate the desired index value, which is about 0.33 seconds of output from a 100 Kbps MMI generator. These bits will be used to generate a 10-bit index word in the range 0 to 1023.

MV0 through MV9 represent the results of the majority voting for the least significant bit (LSB) through the most significant bit (MSB). N0 through N9 are the numbers of bits used to produce MV0 through MV9. Choose N0 = 32 + 1, or 33 bits. Each subsequent N value is two times the previous base value plus 1, so N1 = 65 and so on up through N9 = 16,385 bits. The total number of MMI bits used is 32,746. The full equation is:

index = MV0 + 2 * MV1 + 4 * MV2 + 8 * MV3 + 16 * MV4 + 32 * MV5 + 64 * MV6 + 128 * MV7 + 256 * MV8 + 512 * MV9

This is equal to a 10-bit integer shown with the LSB on the left and the MSB on the right (the reverse of many systems’ conventions); the multipliers would not be needed if it were represented directly as a binary integer.
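Putting the pieces together, a minimal sketch of the whole generator (function and parameter names are illustrative; seeded pseudo-random bits stand in for actual MMI bits):

```python
import random

def weighted_binary_index(bits, num_index_bits=10, base=32):
    """Build an index word from MMI bits via weighted majority votes.

    Bit k (k = 0 is the LSB) uses N_k = base * 2**k + 1 input bits,
    so each more significant bit gets twice as many votes.
    """
    index = 0
    pos = 0
    for k in range(num_index_bits):
        n_k = base * (2 ** k) + 1          # odd, so no ties in the MV
        chunk = bits[pos:pos + n_k]
        pos += n_k
        mv = 2 * sum(chunk) - n_k          # majority vote
        bit = 1 if mv > 0 else 0
        index += bit << k                  # binary weight = 2**k
    return index

# Unbiased pseudo-random input bits as a stand-in for MMI bits:
rng = random.Random(1)
bits = [rng.getrandbits(1) for _ in range(32_746)]
idx = weighted_binary_index(bits)
```

With the base of 33 bits for the LSB and doubling per bit, the ten stages consume 32,746 bits in total, matching the figure above.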

This index generator has pros and cons versus the random walk (RW) index generation method. The binary generator produces a theoretically perfectly linear result (see the results of the KS testing below), while the RW algorithm can never be perfectly linear. The binary approach is mathematically simpler, but it has a clear limitation in the number of bits of resolution that are possible with a reasonable number of MMI bits available: the weight doubling for each more significant bit limits the word to about 14 bits, and even then the LSBs may use too few MMI bits to produce reliable MV results. Both types of index generator assume the MMI bits are unbiased or have insignificant biases; actual MMI generator bias will add additional errors to the results of the index generation.

As with the RW index generator, the weighted binary index generator has not yet been tested in real MMI applications. The results of the KS testing confirm the expected linearity and functioning of the algorithm. For 100,000 trial indexes, KS+ was 0.835 and KS- was 0.340, both in the nominal range. Consistent with the KS statistics, the graphical display of the cumulative distribution function of the test data showed no indication of nonlinearity.
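The one-sided KS statistics quoted here appear to follow the sqrt(n)-scaled convention (as in Knuth, TAOCP vol. 2). A minimal sketch of how such values can be computed, with seeded pseudo-random uniform samples standing in for scaled index values:

```python
import math
import random

def ks_one_sided(samples, cdf):
    """sqrt(n)-scaled one-sided KS statistics: KS+ measures where the
    empirical CDF exceeds the model CDF, KS- where it falls below."""
    xs = sorted(samples)
    n = len(xs)
    root_n = math.sqrt(n)
    ks_plus = max(root_n * (i / n - cdf(x)) for i, x in enumerate(xs, 1))
    ks_minus = max(root_n * (cdf(x) - (i - 1) / n) for i, x in enumerate(xs, 1))
    return ks_plus, ks_minus

# Uniform stand-in data; a real test would map generated indexes into [0, 1).
rng = random.Random(7)
data = [rng.random() for _ in range(10_000)]
ks_plus, ks_minus = ks_one_sided(data, lambda x: x)
```

For uniform data, both statistics should land in the nominal range (roughly below 1.5 most of the time).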

I would love to test this out in Randonautica as soon as we have a development cycle to dedicate to this algorithm. Obviously, latency is a problem we have for applications and algorithms that require real-time operation. Would that problem apply here as well?

This is an algorithm for getting a linear index or location from MMI data. Any requirement for real-time generation/use in an MMI application would still remain. One possible benefit is this algorithm may require less computational overhead, which could help a little in reducing latency. Though if the MMI data is being processed efficiently in a fast CPU, this may not provide a meaningful advantage.

There are a number of techniques and a bit of art that allow providing a real-time system over the Internet or by other communication channels. Some of these I have mentioned in other threads, but finding an optimum solution depends on the specific application and how it is built. Indeed, having a goal of a real-time system will likely influence the basic design of the application.

I think a 14-bit integer will not be enough to produce a geo-coordinate. We’re using 8 bytes to produce one, so we need at least a 32-bit integer per axis.

The only way we can fit in 14 bits is using a coordinate system limited to the research area with 1 m accuracy. Then 14 bits will be enough to cover an area with an 8 km radius.
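The 8 km figure follows directly from the word size and cell size; a quick check of the arithmetic:

```python
bits_per_axis = 14
cell_m = 1.0                             # 1 m accuracy per cell
span_m = (2 ** bits_per_axis) * cell_m   # 16,384 m of total span per axis
radius_m = span_m / 2                    # 8,192 m, roughly an 8 km radius
```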

Entropy consumption is still high, though. We will need something like 63 Kb to produce one integer, which means 126 Kb per point. And the attractor-search library requirement for such an area is somewhere near 10,000 points, which is more than 1 Gb of entropy.

1 Gb of MMI bits is not a realistic number for an online system, but that also depends on how and where the bits are processed. Even my model MED100Kx8 produces 1 Gbps internally. To be sure, those bits are processed within the chip where they are generated, and intermediate results (bias-amplified bits) are transferred to the connected device at 100 Kbps.

My thought would be to examine the whole geo-location algorithm to see how it could be made more efficient from an entropy consumption perspective. If the goal is to provide an application with maximum MMI response, a number of other factors must also be considered beyond algorithmic efficiency.

The multi-stage RW index generator can produce higher resolution with fewer MMI bits, so it may be a better fit for certain geo-location applications. However, one must look at the system as a whole to choose the best design given the available options.

I think 14-bit integer will not be enough to produce a geo-coordinate.

We actually use 8 bytes (equivalent to two 32-bit integers, for x and y) to produce one, with an error of about 1 cm.

I like this algorithm because it’s able to counter the higher bits’ error margin and make the output bits equal in terms of reliability. It’s definitely cheaper than RW, but still a bit hungry for entropy input. I have an idea for an algorithm that serves the same purpose:

  1. Read the input bitstream with an offset of k bits, where k = 0…N−1 and N = 32 for a 32-bit integer.
  2. As an output we have N arrays of (N-bit) integers correlated with each other, but the whole output possesses the property that each input bit contributes to the result exactly N times: one time as the MSB, one time as the LSB, and every position in between. This does not eradicate the error margin for the higher bits per se (no miracle here; if we want less error we need more entropy, and the Weighted Binary Index is probably the best we can have), but it makes the contribution of each bit to the error equal.
  3. We can address the correlation problem by hashing each of the integers. Observations with ANU and from MMI experiments indicate that a hash function doesn’t block the MMI effect, but it somewhat guarantees that correlation between the integer sets is removed.
  4. Given that we set the ratio of input bits to output bits for the hash to one (32 bytes go in, 32 come out), we’ll still have N entropy bits from each hash, but multiplied N times (this is the point that concerns me the most, it sounds too good :smiley: )
    The downsides are:
  • auto-correlation; if we address it with a hash, we may hinder MMI (questionable, but worth mentioning)
  • only one (also questionable; can MMI produce overlapping signals?) of the N inputs is truly the MMI output we’re looking for, and the other N−1 are noise. Here’s the point where we plot these on a canvas and geometrically compute where the point density is uneven.
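One way steps 1–3 might look in code. The function names are illustrative, and SHA-256 (truncated back to N bits) stands in for the unspecified hash function:

```python
import hashlib

def bits_to_int(bits):
    """Pack a list of 0/1 bits, MSB first, into an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def offset_streams(bitstream, n=32):
    """Steps 1-2: for each offset k = 0..n-1, chop the stream (starting
    at bit k) into n-bit words. Across the n streams, each input bit
    appears once in every position from MSB to LSB."""
    streams = []
    for k in range(n):
        words = []
        i = k
        while i + n <= len(bitstream):
            words.append(bits_to_int(bitstream[i:i + n]))
            i += n
        streams.append(words)
    return streams

def decorrelate(word, n=32):
    """Step 3: hash each word to break the correlation between streams.
    SHA-256, truncated to n bits, is only a stand-in here."""
    digest = hashlib.sha256(word.to_bytes((n + 7) // 8, "big")).digest()
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - n)

# Tiny demonstration with a 64-bit stream and n = 32:
stream = [i % 2 for i in range(64)]
streams = offset_streams(stream)
hashed = [[decorrelate(w) for w in s] for s in streams]
```

Note that with this chopping, only offset 0 yields two full words from 64 input bits; the other offsets each yield one, so in practice the stream would be much longer than N² bits.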

This algo has the same entropy consumption as the non-weighted binary production, but it removes (or seems to remove) the “bit-inequality” we’re struggling with. @ScottWilber I would appreciate any criticism or opinion on this as a whole, and on the hashing step that produces a semi-uniform distribution from a non-uniform one while retaining MMI properties.

PS: I apologize for rarely participating in the discussion directly, but I follow the posts and D0ubleZer0 sometimes relays our common thoughts.

I would start my analysis by determining the actual resolution needed for the application. 32 bits represents a resolution of about 1 cm over the circumference of the earth. Let’s just say a resolution of 10 cm is reasonable for any game or application; as a reference, that’s around the width of an adult hand. Over 100 km there are 1 million of these 10 cm blocks, so about 20 binary bits would seem to be enough for a ±50 km area of interest. If 1000 km is desired to be resolved, the number of bits would be up to 24 (3 bytes in each dimension). These choices depend on the intended application, but let’s stay with the 20-bit resolution.
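The bit counts above follow from a simple base-2 logarithm of the ratio of area extent to cell size; a quick check (the helper name is illustrative):

```python
import math

def bits_needed(extent_m, resolution_m):
    """Bits per axis needed to resolve extent_m meters in
    resolution_m-sized steps."""
    return math.ceil(math.log2(extent_m / resolution_m))
```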

I would have to do a full simulation, but I think it is possible to make a multi-stage generator by combining weighted bit generators. It would be possible to make a 2D, 2^20-resolution generator in this way using only about 16,384 bits if I push everything.

To repeat, these index generators have never been tested in real MMI applications. It does seem at first glance that one cannot depend on getting reliable MMI results with so many bits of information content from a single trial; it’s asking for a rather significant result with very little user mental effort.

With respect to your number 4: regardless of how the input entropy is transformed, each time it is reused, the amount of total entropy does not increase. I have seemingly been able to get more accurate results (higher effect size) by processing the input entropy with a particular algorithm. It essentially increases the possibility of revealing the “correct” answer already contained in the original entropy by looking at it in many different ways. This perhaps sounds similar to your repeated hash function, but I was only trying to get a single result with that approach, versus extending the bits of information returned.

I was indeed able to design and test a Multi-Stage Weighted Binary index generator that provided 2^20 unique index values (1,048,576) that is fully linear, both theoretically and by KS test of 100,000 test trials. (KS+ = 0.233, KS- = 0.886) The test design used about 8195 MMI bits (16392 for a 2D geo-coordinate).

I also tested a low-MMI-bit design using about 4172 bits (for 2D) to show it is still linear. (KS+ = 0.506, KS- = 0.884)

This test design is not necessarily optimal – either to minimize entropy requirement or to maximize MMI responsivity – but it clearly demonstrates the algorithm is successful.

This design can be used to make an index generator with very large resolution. A 3-stage generator of this type would produce 2^30 states (1.07 billion) and use about 24588 MMI bits.
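The motivation for a multi-stage design can be seen from the single-stage cost formula. Assuming the base-32, doubling-N scheme of the 10-bit example above, a single stage’s consumption grows exponentially with resolution, while the multi-stage figures quoted here (about 8195 bits for 2^20, about 24,588 for 2^30) grow far more slowly:

```python
def single_stage_cost(index_bits, base=32):
    """MMI bits consumed by one weighted binary generator:
    the sum of (base * 2**k + 1) for k = 0..index_bits-1."""
    return base * (2 ** index_bits - 1) + index_bits

ten_bit = single_stage_cost(10)     # 32,746 bits, the 10-bit example above
twenty_bit = single_stage_cost(20)  # over 33 million bits for 2^20 states
```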