A numeric
sequence
In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is calle ...
is said to be statistically random when it contains no recognizable
pattern
A pattern is a regularity in the world, in human-made design, or in abstract ideas. As such, the elements of a pattern repeat in a predictable manner. A geometric pattern is a kind of pattern formed of geometric shapes and typically repeated l ...
s or regularities; sequences such as the results of an ideal
dice roll or the digits of
π exhibit statistical randomness.
Statistical randomness does not necessarily imply "true"
randomness
In common usage, randomness is the apparent or actual lack of pattern or predictability in events. A random sequence of events, symbols or steps often has no order and does not follow an intelligible pattern or combination. Individual rand ...
, i.e., objective
unpredictability
Predictability is the degree to which a correct prediction or forecast of a system's state can be made, either qualitatively or quantitatively.
Predictability and causality
Causal determinism has a strong relationship with predictability. Perfe ...
.
Pseudorandomness
A pseudorandom sequence of numbers is one that appears to be statistically random, despite having been produced by a completely deterministic and repeatable process.
Background
The generation of random numbers has many uses, such as for random ...
is sufficient for many uses, such as statistics, hence the name ''statistical'' randomness.
''Global randomness'' and ''local randomness'' are different. Most philosophical conceptions of randomness are global—because they are based on the idea that "in the long run" a sequence looks truly random, even if certain sub-sequences would ''not'' look random. In a "truly" random sequence of numbers of sufficient length, for example, it is probable there would be long sequences of nothing but repeating numbers, though on the whole the sequence might be random. ''Local'' randomness refers to the idea that there can be minimum sequence lengths in which random distributions are approximated. Long stretches of the same numbers, even those generated by "truly" random processes, would diminish the "local randomness" of a sample (it might only be locally random for sequences of 10,000 numbers; taking sequences of less than 1,000 might not appear random at all, for example).
A sequence exhibiting a pattern is not thereby proved not statistically random. According to principles of
Ramsey theory
Ramsey theory, named after the British mathematician and philosopher Frank P. Ramsey, is a branch of mathematics that focuses on the appearance of order in a substructure given a structure of a known size. Problems in Ramsey theory typically ask a ...
, sufficiently large objects must necessarily contain a given substructure ("complete disorder is impossible").
Legislation concerning
gambling
Gambling (also known as betting or gaming) is the wagering of something of value ("the stakes") on a random event with the intent of winning something else of value, where instances of strategy are discounted. Gambling thus requires three el ...
imposes certain standards of statistical randomness to
slot machine
A slot machine (American English), fruit machine (British English) or poker machine (Australian English and New Zealand English) is a gambling machine that creates a game of chance for its customers. Slot machines are also known pejoratively a ...
s.
Tests
The first tests for random numbers were published by
M.G. Kendall and
Bernard Babington Smith
Bernard Babington Smith, OBE (1905-1993) was a British academic, wartime intelligence officer and amateur athlete.
Early life and education
He was born on 26 October 1905 at 29 Hyde Park Gate, London, the son of Sir Henry Babington Smith and ...
in the ''
Journal of the Royal Statistical Society
The ''Journal of the Royal Statistical Society'' is a peer-reviewed scientific journal of statistics. It comprises three series and is published by Wiley for the Royal Statistical Society.
History
The Statistical Society of London was founded ...
'' in 1938. They were built on statistical tools such as
Pearson's chi-squared test
Pearson's chi-squared test (\chi^2) is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance. It is the most widely used of many chi-squared tests (e.g., ...
that were developed to distinguish whether experimental phenomena matched their theoretical probabilities. Pearson developed his test originally by showing that a number of dice experiments by
W.F.R. Weldon
Walter Frank Raphael Weldon FRS (15 March 1860 – 13 April 1906), was an English evolutionary biologist and a founder of biometry. He was the joint founding editor of ''Biometrika'', with Francis Galton and Karl Pearson.
Family
Weldon was the ...
did not display "random" behavior.
Kendall and Smith's original four tests were
hypothesis tests, which took as their
null hypothesis
In scientific research, the null hypothesis (often denoted ''H''0) is the claim that no difference or relationship exists between two sets of data or variables being analyzed. The null hypothesis is that any experimentally observed difference is d ...
the idea that each number in a given random sequence had an equal chance of occurring, and that various other patterns in the data should be also distributed equiprobably.
* The frequency test, was very basic: checking to make sure that there were roughly the same number of 0s, 1s, 2s, 3s, etc.
* The serial test, did the same thing but for sequences of two digits at a time (00, 01, 02, etc.), comparing their observed frequencies with their hypothetical predictions were they equally distributed.
* The poker test, tested for certain sequences of five numbers at a time (AAAAA, AAAAB, AAABB, etc.) based on hands in the game
poker
Poker is a family of comparing card games in which players wager over which hand is best according to that specific game's rules. It is played worldwide, however in some places the rules may vary. While the earliest known form of the game w ...
.
* The gap test, looked at the distances between zeroes (00 would be a distance of 0, 030 would be a distance of 1, 02250 would be a distance of 3, etc.).
If a given sequence was able to pass all of these tests within a given degree of significance (generally 5%), then it was judged to be, in their words "locally random". Kendall and Smith differentiated "local randomness" from "true randomness" in that many sequences generated with truly random ''methods'' might not display "local randomness" to a given degree — ''very'' large sequences might contain many rows of a single digit. This might be "random" on the scale of the entire sequence, but in a smaller block it would not be "random" (it would not pass their tests), and would be useless for a number of statistical applications.
As random number sets became more and more common, more tests, of increasing sophistication were used. Some modern tests plot random digits as points on a three-dimensional plane, which can then be rotated to look for hidden patterns. In 1995, the statistician
George Marsaglia
George Marsaglia (March 12, 1924 – February 15, 2011) was an American mathematician and computer scientist. He is best known for creating the diehard tests, a suite of software for measuring statistical randomness.
Research on random numbers ...
created a set of tests known as the
diehard tests The diehard tests are a battery of statistical tests for measuring the quality of a random number generator. They were developed by George Marsaglia over several years and first published in 1995 on a CD-ROM of random numbers.
Test overview
; Bi ...
, which he distributes with a
CD-ROM
A CD-ROM (, compact disc read-only memory) is a type of read-only memory consisting of a pre-pressed optical compact disc that contains data. Computers can read—but not write or erase—CD-ROMs. Some CDs, called enhanced CDs, hold both comput ...
of 5 billion
pseudorandom
A pseudorandom sequence of numbers is one that appears to be statistically random, despite having been produced by a completely deterministic
Determinism is a philosophical view, where all events are determined completely by previously exi ...
numbers. In 2015,
Yongge Wang Yongge Wang (born 1967) is a computer science professor at the University of North Carolina at Charlotte specialized in algorithmic complexity and cryptography. He is the inventor of IEEE P1363 cryptographic standards SRP5 and WANG-KE and has cont ...
distributed a Java software package for statistically distance based randomness testing.
Pseudorandom number generator
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generate ...
s require tests as exclusive verifications for their "randomness," as they are decidedly ''not'' produced by "truly random" processes, but rather by deterministic algorithms. Over the history of random number generation, many sources of numbers thought to appear "random" under testing have later been discovered to be very non-random when subjected to certain types of tests. The notion of
quasi-random In mathematics, a low-discrepancy sequence is a sequence with the property that for all values of ''N'', its subsequence ''x''1, ..., ''x'N'' has a low discrepancy.
Roughly speaking, the discrepancy of a sequence is low if the proportion of po ...
numbers was developed to circumvent some of these problems, though pseudorandom number generators are still extensively used in many applications (even ones known to be extremely "non-random"), as they are "good enough" for most applications.
Other tests:
* The
Monobit test treats each output bit of the random number generator as a coin flip test, and determine if the observed number of heads and tails are close to the expected 50% frequency. The number of heads in a coin flip trail forms a
binomial distribution
In probability theory and statistics, the binomial distribution with parameters ''n'' and ''p'' is the discrete probability distribution of the number of successes in a sequence of ''n'' independent experiments, each asking a yes–no quest ...
.
* The
Wald–Wolfowitz runs test tests for the number of bit transitions between 0 bits, and 1 bits, comparing the observed frequencies with expected frequency of a random bit sequence.
*
Information entropy
In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable X, which takes values in the alphabet \ ...
*
Autocorrelation
Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations of a random variable ...
test
*
Kolmogorov–Smirnov test
In statistics, the Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous (or discontinuous, see Section 2.2), one-dimensional probability distributions that can be used to compare a sample wit ...
* Statistically distance based randomness test.
Yongge Wang Yongge Wang (born 1967) is a computer science professor at the University of North Carolina at Charlotte specialized in algorithmic complexity and cryptography. He is the inventor of IEEE P1363 cryptographic standards SRP5 and WANG-KE and has cont ...
showed that NIST SP800-22 testing standards are not sufficient to detect some weakness in randomness generators and proposed statistically distance based randomness test.
*
Spectral Density Estimation - performing a Fourier transform on a "random" signal transforms it into a sum of periodic functions in order to detect non random repetitive trends
* Maurer's Universal Statistical Test
* The
Diehard tests The diehard tests are a battery of statistical tests for measuring the quality of a random number generator. They were developed by George Marsaglia over several years and first published in 1995 on a CD-ROM of random numbers.
Test overview
; Bi ...
See also
*
Algorithmic randomness
Intuitively, an algorithmically random sequence (or random sequence) is a sequence of binary digits that appears random to any algorithm running on a (prefix-free or not) universal Turing machine. The notion can be applied analogously to sequenc ...
*
Check
Check or cheque, may refer to:
Places
* Check, Virginia
Arts, entertainment, and media
* ''Check'' (film), a 2021 Indian Telugu-language film
* ''The Checks'' (episode), a 1996 TV episode of ''Seinfeld''
Games and sports
* Check (chess), a thr ...
ing
*
Complete spatial randomness
Complete spatial randomness (CSR) describes a point process whereby point events occur within a given study area in a completely random fashion. It is synonymous with a ''homogeneous spatial Poisson process''.O. Maimon, L. Rokach, ''Data Mining and ...
*
Normal number
In mathematics, a real number is said to be simply normal in an integer base b if its infinite sequence of digits is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b. A number is said to b ...
*
One-time pad
In cryptography, the one-time pad (OTP) is an encryption technique that cannot be cracked, but requires the use of a single-use pre-shared key that is not smaller than the message being sent. In this technique, a plaintext is paired with a ran ...
*
Random error
Observational error (or measurement error) is the difference between a measured value of a quantity and its true value.Dodge, Y. (2003) ''The Oxford Dictionary of Statistical Terms'', OUP. In statistics, an error is not necessarily a "mistake" ...
*
Randomness
In common usage, randomness is the apparent or actual lack of pattern or predictability in events. A random sequence of events, symbols or steps often has no order and does not follow an intelligible pattern or combination. Individual rand ...
*
Randomness tests
A randomness test (or test for randomness), in data evaluation, is a test used to analyze the distribution of a set of data to see if it can be described as random (patternless). In stochastic modeling, as in some computer simulations, the hoped-f ...
*
Statistical hypothesis testing
A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis.
Hypothesis testing allows us to make probabilistic statements about population parameters.
...
*
Seven states of randomness
The seven states of randomness in probability theory, fractals and risk analysis are extensions of the concept of randomness as modeled by the normal distribution. These seven states were first introduced by Benoît Mandelbrot in his 1997 boo ...
*
TestU01
TestU01 is a software library, implemented in the ANSI C language, that offers a collection of utilities for the empirical randomness testing of random number generators (RNGs).
References
External links
DieHarder A free (
GPL
The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the four freedoms to run, study, share, and modify the software. The license was the first copyleft for general u ...
)
C Random Number Test Suite.
Generating Normal Distributed Random Numbers
{{DEFAULTSORT:Statistical Randomness