
I'm following along with a description of a test for pseudo-random number generators, and attempting to implement the test in C. There's one thing I'm hung up on though. The text in question is as follows:

Applies a correlation test on the Hamming weights of successive blocks of L bits. Let X_j be the Hamming weight (the number of bits equal to 1) of the j-th block, for j = 1, ..., n. The test computes the empirical correlation between the successive X_j's,

ρ̂ = (4 / (L * (n - 1))) * Σ_{j=1}^{n-1} (X_j - L/2) * (X_{j+1} - L/2)

Under H0, as n → ∞, ρ̂ * sqrt(n - 1) asymptotically has the standard normal distribution. This is what is used in the test. The test is valid only for large n.
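For concreteness, here is a minimal C sketch of how ρ̂ * sqrt(n - 1) could be computed from a raw byte buffer. The buffer layout, the bit-counting helper, and the assumption that L is a multiple of 8 are simplifications of mine, not part of the test description above.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Hamming weight (popcount) of a single byte. */
static unsigned popcount8(uint8_t b)
{
    unsigned w = 0;
    while (b) { w += b & 1u; b >>= 1; }
    return w;
}

/* bits must hold at least n blocks of L bits each; L is assumed to be a
 * multiple of 8 so that each block is a whole number of bytes. */
double hamming_corr_statistic(const uint8_t *bits, size_t n, size_t L)
{
    size_t bytes_per_block = L / 8;
    double half = (double)L / 2.0;   /* E[X_j] under H0 */
    double sum = 0.0;
    double prev = 0.0;

    for (size_t j = 0; j < n; j++) {
        unsigned w = 0;
        for (size_t k = 0; k < bytes_per_block; k++)
            w += popcount8(bits[j * bytes_per_block + k]);

        double x = (double)w - half;     /* X_j - L/2 */
        if (j > 0)
            sum += prev * x;             /* (X_{j-1} - L/2)(X_j - L/2) */
        prev = x;
    }

    /* rho_hat = 4 / (L * (n - 1)) * (sum of lag-1 products) */
    double rho_hat = 4.0 * sum / ((double)L * (double)(n - 1));

    return rho_hat * sqrt((double)(n - 1));  /* ~ N(0, 1) under H0, large n */
}
```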

Now, my plan is to compute this test statistic and perform a goodness-of-fit test against the normal distribution using the Anderson–Darling test. However, I'm a bit confused as to how you get a distribution from this single test statistic. From my understanding, for my full set of n blocks I'll get just one ρ̂, and so just one test statistic ρ̂ * sqrt(n - 1). How am I supposed to compare this to the normal distribution? Would the idea be to break up my dataset into multiple chunks, each with its own n blocks, compute a test statistic for each, and then compare this distribution to the standard normal? I just want to make sure I'm understanding the calculation of ρ̂ correctly.


2 Answers


Frequentist hypothesis testing involves determining the likelihood of observing a test statistic value under the assumption that the null hypothesis is true. If the test statistic value is highly likely, the null hypothesis is not rejected. If the test statistic value is 'not very likely', the null hypothesis is rejected. The meaning of 'not very likely' is specified by the significance level of the test, α.

According to your text, under the null hypothesis T = ρ̂ * sqrt(n - 1) is asymptotically distributed as a standard normal, T ~ N(0, 1). So to conduct a test of the two hypotheses:

Null: T = 0
Alternate: T ≠ 0

then with your single observed value:

  1. Compute t = ρ̂ * sqrt(n - 1).
  2. Compute p = P(|T| > |t|), i.e. the two-tailed probability for N(0, 1) at the value |t|.
  3. If p is less than your significance level α, reject the null hypothesis in favor of the alternate hypothesis.

As an example, suppose you generated a sequence of n = 10001 blocks and, based on that sequence, you computed ρ̂ = 0.025. To determine the significance of that value at an α = 0.05 significance level:

  1. Compute t = ρ̂ * sqrt(n - 1) = 0.025 * sqrt(10001 - 1) = 2.5
  2. Compute p = P(|T| > |t|) = P(|T| > 2.5) = 0.01242
  3. Since p < α, the evidence supports rejecting the null hypothesis.
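To tie the arithmetic together, here is a small illustrative C snippet (not from any particular test suite) that computes the two-sided p-value with erfc from <math.h>; it is only a sketch, but plugging in the example numbers above reproduces p ≈ 0.01242.

```c
#include <math.h>
#include <stdio.h>

/* P(|T| > |t|) for T ~ N(0, 1): 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)). */
static double two_sided_p(double t)
{
    return erfc(fabs(t) / sqrt(2.0));
}

int main(void)
{
    double rho_hat = 0.025;            /* example value from above */
    double n = 10001.0;
    double t = rho_hat * sqrt(n - 1);  /* = 2.5 */
    double p = two_sided_p(t);

    printf("t = %.4f, p = %.5f\n", t, p);  /* p is about 0.01242 */
    if (p < 0.05)
        printf("reject H0 at alpha = 0.05\n");
    return 0;
}
```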
answered 2017-10-06T21:40:09.930

If you want to perform a goodness-of-fit test against the normal distribution, you need many sampled Gaussian values. Since ρ̂ * sqrt(n - 1) is asymptotically N(0, 1), a single test run produces a single number. So if you have a software-based RNG to test, you just run it again: generate another n blocks, compute another (supposedly) N(0, 1) value, and so on. If you already have N numbers from, say, some hardware device, you have to split them into chunks and run the test on each chunk; from each chunk you'll get one value supposedly drawn from N(0, 1), and then you run the distribution test on the collected values, as in the sketch below.
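A rough sketch of that chunking loop in C might look like the following; it reuses the hypothetical hamming_corr_statistic() from the sketch earlier in the question, and the names and buffer layout are purely illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Defined in the earlier sketch: returns rho_hat * sqrt(n - 1) for one chunk. */
double hamming_corr_statistic(const uint8_t *bits, size_t n, size_t L);

/* data holds m chunks, each of n_per_chunk blocks of L bits (L a multiple
 * of 8); out receives one statistic per chunk. */
void collect_chunk_statistics(const uint8_t *data, size_t m,
                              size_t n_per_chunk, size_t L, double *out)
{
    size_t bytes_per_chunk = n_per_chunk * (L / 8);

    for (size_t i = 0; i < m; i++)
        out[i] = hamming_corr_statistic(data + i * bytes_per_chunk,
                                        n_per_chunk, L);

    /* Under H0, out[0..m-1] should look like an iid N(0, 1) sample;
     * feed it to your Anderson-Darling (or other goodness-of-fit) test. */
}
```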

Paper: Pierre L'Ecuyer and Richard Simard, "Beware of Linear Congruential Generators with Multipliers of the Form a = ±2^q ± 2^r", ACM Transactions on Mathematical Software, 1999.

I have a copy if you need it

answered 2017-10-06T22:03:27.503