Problem Description
This program eventually tests our Random Number Generators (actually Delphi's
for 32-bit and a local implementation for 64-bit numbers) to determine if the
numbers generated are at least uniformly distributed across a range. . There are
many algorithms to test randomness; this
program explores the Chi-Squared statistic which measures the deviation of
random samples from the expected distribution. There are three pages presented
here presented in the order I wrote them.
Background & Techniques
Chi-Squared from "Numerical Recipes in Pascal"
The first experiment replicates the Demo program which came with Chi-Square 1
procedure, ChsOne, (single random variable vs. one of known
distribution). The demo program was implemented to see that this
Chi-Squared procedure was successfully converted from old "Numerical Recipes for
Pascal" code to use "Extended" variable type and to run under Delphi as
ChsOneExt. As yet I have no idea what it does or whether the
results match although "Numerical Recipes in Pascal" book is on order so more
information may be available soon. I did note that the test "fudges"
degrees of freedom to match the number of bins (categories) rather than (bins -
1) which is normally correct. More about this and the "Numerical
Recipes" code later.
Donald Knuth's Random Number Testing
The second experiment in the program is based on Chapter 3 of Donald Knuth's
classic work "The Art of Computer Programming" which dedicates 170 pages to
generating and testing random numbers. Virtually all software Random Number
Generators (RNGs) are called "Pseudo" generators because after the first
number, each value returned is based on the last value returned. Since there are
only a finite number of values, eventually a returned number will repeat and a
new identical cycle begins.
This suggests that one good test would be to determine if the RNG generates all
values within its range of values, but I have not found or figured how to
conduct such a test. Almost all tests I've seen are statistical sampling tests.
The Chi-Squared is the first of many found in in Knuth's chapter. I've
duplicated it here to validate my implementation of the method.
The "Type 1" Chi-Squared test compares an observed distribution of
experimental results for a single variable with a theoretical distribution.
Knuth threw 2 standard dice 144 times (says it got too boring to do more), and
compares the number of occurrences of each sum to the theoretical count of sums
(from 2 (rolling two 1's) to 12 (rolling two 6's). For 144 throws the
theoretical distribution of the 11 possible sums from 2 through 12 are (4, 8,
12, 16, 20, 24, 20, 16, 12, 8, 4) and the observed counts for those sums was (2,
4, 10, 12, 22, 29, 21, 15, 4, 9, 6). The Chi-Squared statistic is
calculated as the sum of the quotient of the squared differences of each
(observed count - expected count) divided by the expected count. In this case
Chi-Squared = (2-4)2/4 + (4 -8)2/8 + (19-12)2/12
+ .... + (6-4)2/4 = 7.1458.
The significance of this number is based on the degrees of freedom (number of
sum bins -1). Knuth has a lookup table for several degrees of freedom values
which allow estimating the probability that the differences from drawing a
random sample from the theoretical distribution will be greater than or equal to
the observed differences. Knuth argues convincingly that for a single test we
should expect the probabilities to fall in the 0.25 to 0.75 range. If the value
is high there very little chance that the true distribution would produce a
Chi-Sq value this high.. On the other hand if all values are close to expected,
Chi-Squared is low and our trial does reflect the true distribution of sums, but
but it's highly unlikely than a random sample would bee this close to expected
counts across the board. Such a result in single trial makes it
likely that the "books were cooked" (e.g. the dice thrower could control the sum
thrown and deliberately tried to match the theoretical distribution) and this is
not a true random sample. Knuth says that probability results
between .25 and .75 (Chi-Sq between 6.7 and 12.6) which should occur half
the time and indicates that our dice results are probably random.
I call this a weak test for confirming randomness because Chi-Sq values would
have to be < 3.9 or > 18.5 for us to be 95% sure that the dice were loaded after
144 throws. Knuth's .25 to .75 probability range will occur only half the time
even if the observed sample is from a random distribution. In my test,
we'll take advantage of the computer to generate many more samples and multiple
trials for our Random Number Generators.
My Random Number Tests
This experiment was the motivation for entire program while looking for a way
to validate the "randomness" of the 64 bit random number generator recently
(November 2013) implemented in our Mathslib unit. On-line searches for ways to
verify the quality of an RNG led me to Knuth which led me to the Chi-Squared
test.
The Chi-Squared test is not a very strong indicator of randomness For a given
set of numbers placed into bins based on their value, the statistic is applied
to differences between the observed frequency and the expected frequency for the
assumed distribution of the
population. Random numbers have a theoretical uniform distribution so each bin
should have the same value. One test is to count the number of occurrences of
each of the digits 0 through 9. when generating random numbe in that range.
Another partitioning is implemented with a "Quantile" button which divides
the total range of random numbers for each length nto a specified number of
bins.
Any individual trial can produce a wide range of Chi-Squared values but
Chi-Squared results should be centered about the 50% critical value. Averaging
many trials should produce a mean Chi-Squared value near this 50% critical
value. I implemented a "Number of Trials" field to test this and in fact it
seems that running 1,000 trials with 1,000 samples per trial ("only" 1 million
samples in total) usually produces results in the 49% to 51% range. - good
evidence that our random numbers for both 32 and 64-bit integers are at least
uniformly distributed. Good enough for now.
Delphi 7 Programmers Note:
Be aware that the Random64 procedure when compiled under Delphi 7 or
earlier uses the Random procure from our Big Integers unit with numbers
limited to Int64 size. This runs much slower than the Delphi XE
version so running millions of samples may take minutes rather than seconds.
Running/Exploring the Program