In
theoretical computer science
Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory.
It is difficult to circumsc ...
, a small-bias sample space (also known as
-biased sample space,
-biased generator, or small-bias probability space) is a
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomeno ...
that fools
parity function In Boolean algebra, a parity function is a Boolean function whose value is one if and only if the input vector has an odd number of ones. The parity function of two inputs is also known as the XOR function.
The parity function is notable for its ...
s.
In other words, no parity function can distinguish between a small-bias sample space and the uniform distribution with high probability, and hence, small-bias sample spaces naturally give rise to
pseudorandom generator
In theoretical computer science and cryptography, a pseudorandom generator (PRG) for a class of statistical tests is a deterministic procedure that maps a random seed to a longer pseudorandom string such that no statistical test in the class c ...
s for parity functions.
The main useful property of small-bias sample spaces is that they need far fewer truly random bits than the uniform distribution to fool parities. Efficient constructions of small-bias sample spaces have found many applications in computer science, some of which are
derandomization
A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic or procedure. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performan ...
,
error-correcting code
In computing, telecommunication, information theory, and coding theory, an error correction code, sometimes error correcting code, (ECC) is used for controlling errors in data over unreliable or noisy communication channels. The central idea i ...
s, and
probabilistically checkable proofs
In computational complexity theory, a probabilistically checkable proof (PCP) is a type of proof that can be checked by a randomized algorithm using a bounded amount of randomness and reading a bounded number of bits of the proof. The algorithm i ...
.
The connection with
error-correcting code
In computing, telecommunication, information theory, and coding theory, an error correction code, sometimes error correcting code, (ECC) is used for controlling errors in data over unreliable or noisy communication channels. The central idea i ...
s is in fact very strong since
-biased sample spaces are ''equivalent'' to
-balanced error-correcting codes.
Definition
Bias
Let
be a
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomeno ...
over
.
The ''bias'' of
with respect to a set of indices
is defined as
:
where the sum is taken over
, the
finite field
In mathematics, a finite field or Galois field (so-named in honor of Évariste Galois) is a field that contains a finite number of elements. As with any field, a finite field is a set on which the operations of multiplication, addition, subt ...
with two elements. In other words, the sum
equals
if the number of ones in the sample
at the positions defined by
is even, and otherwise, the sum equals
.
For
, the empty sum is defined to be zero, and hence
.
ϵ-biased sample space
A probability distribution
over
is called an ''
-biased sample space'' if
holds for all non-empty subsets
.
ϵ-biased set
An
-biased sample space
that is generated by picking a uniform element from a
multiset
In mathematics, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the multiplicity of that ...
is called ''
-biased set''.
The ''size''
of an
-biased set
is the size of the multiset that generates the sample space.
ϵ-biased generator
An
-biased generator
is a function that maps strings of length
to strings of length
such that the multiset
is an
-biased set. The ''seed length'' of the generator is the number
and is related to the size of the
-biased set
via the equation
.
Connection with epsilon-balanced error-correcting codes
There is a close connection between
-biased sets and ''
-balanced''
linear error-correcting codes.
A linear code
of
message length and
block length
In coding theory, block codes are a large and important family of error-correcting codes that encode data in blocks.
There is a vast number of examples for block codes, many of which have a wide range of practical applications. The abstract defin ...
is
''
-balanced'' if the
Hamming weight
The Hamming weight of a string is the number of symbols that are different from the zero-symbol of the alphabet used. It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string o ...
of every nonzero codeword
is between
and
.
Since
is a linear code, its
generator matrix
In coding theory, a generator matrix is a matrix whose rows form a basis for a linear code. The codewords are all of the linear combinations of the rows of this matrix, that is, the linear code is the row space of its generator matrix.
Terminol ...
is an
-matrix
over
with
.
Then it holds that a multiset
is
-biased if and only if the linear code
, whose columns are exactly elements of
, is
-balanced.
[cf., e.g., p. 2 of ]
Constructions of small epsilon-biased sets
Usually the goal is to find
-biased sets that have a small size
relative to the parameters
and
.
This is because a smaller size
means that the amount of randomness needed to pick a random element from the set is smaller, and so the set can be used to fool parities using few random bits.
Theoretical bounds
The probabilistic method gives a non-explicit construction that achieves size
.
The construction is non-explicit in the sense that finding the
-biased set requires a lot of true randomness, which does not help towards the goal of reducing the overall randomness.
However, this non-explicit construction is useful because it shows that these efficient codes exist.
On the other hand, the best known lower bound for the size of
-biased sets is
, that is, in order for a set to be
-biased, it must be at least that big.
Explicit constructions
There are many explicit, i.e., deterministic constructions of
-biased sets with various parameter settings:
* achieve
. The construction makes use of
Justesen code
In coding theory, Justesen codes form a class of error-correcting codes that have a constant rate, constant relative distance, and a constant alphabet size.
Before the Justesen error correction code was discovered, no error correction code was kn ...
s (which is a concatenation of
Reed–Solomon codes with the
Wozencraft ensemble) as well as
expander walk sampling.
* achieve
. One of their constructions is the concatenation of
Reed–Solomon codes with the
Hadamard code
The Hadamard code is an error-correcting code named after Jacques Hadamard that is used for error detection and correction when transmitting messages over very noisy or unreliable channels. In 1971, the code was used to transmit photos of Mar ...
; this concatenation turns out to be an
-balanced code, which gives rise to an
-biased sample space via the connection mentioned above.
* Concatenating
Algebraic geometric codes with the
Hadamard code
The Hadamard code is an error-correcting code named after Jacques Hadamard that is used for error detection and correction when transmitting messages over very noisy or unreliable channels. In 1971, the code was used to transmit photos of Mar ...
gives an
-balanced code with
.
* achieves
.
* achieves
which is almost optimal because of the lower bound.
These bounds are mutually incomparable. In particular, none of these constructions yields the smallest
-biased sets for all settings of
and
.
Application: almost k-wise independence
An important application of small-bias sets lies in the construction of almost k-wise independent sample spaces.
k-wise independent spaces
A random variable
over
is a ''k-wise independent space'' if, for all index sets
of size
, the
marginal distribution
In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the variables ...
is exactly equal to the
uniform distribution over
.
That is, for all such
and all strings
, the distribution
satisfies
.
Constructions and bounds
k-wise independent spaces are fairly well understood.
* A simple construction by achieves size
.
* construct a k-wise independent space whose size is
.
* prove that no k-wise independent space can be significantly smaller than
.
Joffe's construction
constructs a
-wise independent space
over the
finite field
In mathematics, a finite field or Galois field (so-named in honor of Évariste Galois) is a field that contains a finite number of elements. As with any field, a finite field is a set on which the operations of multiplication, addition, subt ...
with some prime number
of elements, i.e.,
is a distribution over
. The initial
marginals of the distribution are drawn independently and uniformly at random:
:
.
For each
with
, the marginal distribution of
is then defined as
:
where the calculation is done in
.
proves that the distribution
constructed in this way is
-wise independent as a distribution over
.
The distribution
is uniform on its support, and hence, the support of
forms a ''
-wise independent set''.
It contains all
strings in
that have been extended to strings of length
using the deterministic rule above.
Almost k-wise independent spaces
A random variable
over
is a ''
-almost k-wise independent space'' if, for all index sets
of size
, the restricted distribution
and the uniform distribution
on
are
-close in
1-norm, i.e.,
.
Constructions
give a general framework for combining small k-wise independent spaces with small
-biased spaces to obtain
-almost k-wise independent spaces of even smaller size.
In particular, let
be a
linear mapping
In mathematics, and more specifically in linear algebra, a linear map (also called a linear mapping, linear transformation, vector space homomorphism, or in some contexts linear function) is a mapping V \to W between two vector spaces that pr ...
that generates a k-wise independent space and let
be a generator of an
-biased set over
.
That is, when given a uniformly random input, the output of
is a k-wise independent space, and the output of
is
-biased.
Then
with
is a generator of an
-almost
-wise independent space, where
.
[Section 4 in ]
As mentioned above, construct a generator
with
, and construct a generator
with
.
Hence, the concatenation
of
and
has seed length
.
In order for
to yield a
-almost k-wise independent space, we need to set
, which leads to a seed length of
and a sample space of total size
.
Notes
References
*
*
*
*
*
*
*
* {{Citation
, last1 = Ta-Shma
, first1 = Amnon
, title = Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing
, chapter = Explicit, almost optimal, epsilon-balanced codes
, date = 2017
, year = 2017
, pages=238–251
, doi=10.1145/3055399.3055408
, isbn = 9781450345286, s2cid = 5648543
Pseudorandomness
Theoretical computer science