In
statistics
Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
, completeness is a property of a
statistic
A statistic (singular) or sample statistic is any quantity computed from values in a sample which is considered for a statistical purpose. Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypot ...
computed on a
sample dataset in relation to a parametric model of the dataset. It is opposed to the concept of an
ancillary statistic. While an ancillary statistic contains no information about the model parameters, a complete statistic contains only information about the parameters, and no ancillary information. It is closely related to the concept of a
sufficient statistic
In statistics, sufficiency is a property of a statistic computed on a sample dataset in relation to a parametric model of the dataset. A sufficient statistic contains all of the information that the dataset provides about the model parameters. It ...
which contains all of the information that the dataset provides about the parameters.
[
]
Definition
Consider a
random variable
A random variable (also called random quantity, aleatory variable, or stochastic variable) is a Mathematics, mathematical formalization of a quantity or object which depends on randomness, random events. The term 'random variable' in its mathema ...
''X'' whose probability distribution belongs to a
parametric model
In statistics, a parametric model or parametric family or finite-dimensional model is a particular class of statistical models. Specifically, a parametric model is a family of probability distributions that has a finite number of parameters.
Defi ...
''P''
''θ'' parametrized by ''θ''.
Say ''T'' is a
statistic
A statistic (singular) or sample statistic is any quantity computed from values in a sample which is considered for a statistical purpose. Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypot ...
; that is, the composition of a
measurable function
In mathematics, and in particular measure theory, a measurable function is a function between the underlying sets of two measurable spaces that preserves the structure of the spaces: the preimage of any measurable set is measurable. This is in ...
with a random sample ''X''
1,...,''X''
n.
The statistic ''T'' is said to be complete for the distribution of ''X'' if, for every measurable function ''g,''
:
The statistic ''T'' is said to be boundedly complete for the distribution of ''X'' if this implication holds for every measurable function ''g'' that is also bounded.
Examples
Bernoulli model
The Bernoulli model admits a complete statistic.
Let ''X'' be a
random sample
In this statistics, quality assurance, and survey methodology, sampling is the selection of a subset or a statistical sample (termed sample for short) of individuals from within a statistical population to estimate characteristics of the whole ...
of size ''n'' such that each ''X''
''i'' has the same
Bernoulli distribution
In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with pro ...
with parameter ''p''. Let ''T'' be the number of 1s observed in the sample, i.e.
. ''T'' is a statistic of ''X'' which has a
binomial distribution
In probability theory and statistics, the binomial distribution with parameters and is the discrete probability distribution of the number of successes in a sequence of statistical independence, independent experiment (probability theory) ...
with parameters (''n'',''p''). If the parameter space for ''p'' is (0,1), then ''T'' is a complete statistic. To see this, note that
:
Observe also that neither ''p'' nor 1 − ''p'' can be 0. Hence
if and only if:
:
On denoting ''p''/(1 − ''p'') by ''r'', one gets:
:
First, observe that the range of ''r'' is the
positive reals
Positive is a property of positivity and may refer to:
Mathematics and science
* Positive formula, a logical formula not containing negation
* Positive number, a number that is greater than 0
* Plus sign, the sign "+" used to indicate a posit ...
. Also, E(''g''(''T'')) is a
polynomial
In mathematics, a polynomial is a Expression (mathematics), mathematical expression consisting of indeterminate (variable), indeterminates (also called variable (mathematics), variables) and coefficients, that involves only the operations of addit ...
in ''r'' and, therefore, can only be identical to 0 if all coefficients are 0, that is, ''g''(''t'') = 0 for all ''t''.
It is important to notice that the result that all coefficients must be 0 was obtained because of the range of ''r''. Had the parameter space been finite and with a number of elements less than or equal to ''n'', it might be possible to solve the linear equations in ''g''(''t'') obtained by substituting the values of ''r'' and get solutions different from 0. For example, if ''n'' = 1 and the parameter space is , a single observation and a single parameter value, ''T'' is not complete. Observe that, with the definition:
:
then, E(''g''(''T'')) = 0 although ''g''(''t'') is not 0 for ''t'' = 0 nor for ''t'' = 1.
Gaussian model with fixed variance
This example will show that, in a sample ''X''
1, ''X''
2 of size 2 from a
normal distribution
In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
f(x) = \frac ...
with known variance, the statistic ''X''
1 + ''X''
2 is complete and sufficient. Suppose ''X''
1, ''X''
2 are
independent
Independent or Independents may refer to:
Arts, entertainment, and media Artist groups
* Independents (artist group), a group of modernist painters based in Pennsylvania, United States
* Independentes (English: Independents), a Portuguese artist ...
, identically distributed random variables,
normally distributed
In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real number, real-valued random variable. The general form of its probability density function is
f(x ...
with expectation ''θ'' and variance 1.
The sum
:
is a complete statistic for ''θ''.
To show this, it is sufficient to demonstrate that there is no non-zero function
such that the expectation of
:
remains zero regardless of the value of ''θ''.
That fact may be seen as follows. The probability distribution of ''X''
1 + ''X''
2 is normal with expectation 2''θ'' and variance 2. Its probability density function in
is therefore proportional to
:
The expectation of ''g'' above would therefore be a constant times
:
A bit of algebra reduces this to
:
where ''k''(''θ'') is nowhere zero and
:
As a function of ''θ'' this is a two-sided
Laplace transform
In mathematics, the Laplace transform, named after Pierre-Simon Laplace (), is an integral transform that converts a Function (mathematics), function of a Real number, real Variable (mathematics), variable (usually t, in the ''time domain'') to a f ...
of ''h'', and cannot be identically zero unless ''h'' is zero almost everywhere.
The exponential is not zero, so this can only happen if ''g'' is zero almost everywhere.
By contrast, the statistic
is sufficient but not complete. It admits a non-zero unbiased estimator of zero, namely
.
Sufficiency does not imply completeness
Most parametric models have a
sufficient statistic
In statistics, sufficiency is a property of a statistic computed on a sample dataset in relation to a parametric model of the dataset. A sufficient statistic contains all of the information that the dataset provides about the model parameters. It ...
which is not complete. This is important because the
Lehmann–Scheffé theorem cannot be applied to such models. Galili and Meilijson 2016
propose the following didactic example.
Consider
independent samples from the uniform distribution:
:
is a known design parameter. This model is a ''scale family'' (a specific case of
a location-scale family) model: scaling the samples by a multiplier
multiplies the parameter
.
Galili and Meilijson show that the minimum and maximum of the samples are together a sufficient statistic:
(using the usual notation for
order statistics
In statistics, the ''k''th order statistic of a statistical sample is equal to its ''k''th-smallest value. Together with rank statistics, order statistics are among the most fundamental tools in non-parametric statistics and inference.
Important ...
). Indeed, conditional on these two values, the distribution of the rest of the sample is simply uniform on the range they define: