In statistics, completeness is a property of a statistic computed on a sample dataset in relation to a parametric model of the dataset. It is opposed to the concept of an ancillary statistic. While an ancillary statistic contains no information about the model parameters, a complete statistic contains only information about the parameters, and no ancillary information. It is closely related to the concept of a sufficient statistic, which contains all of the information that the dataset provides about the parameters.

Definition

Consider a random variable ''X'' whose probability distribution belongs to a parametric model ''P''_''θ'' parametrized by ''θ''. Say ''T'' is a statistic; that is, the composition of a measurable function with a random sample ''X''₁,...,''X''_''n''. The statistic ''T'' is said to be complete for the distribution of ''X'' if, for every measurable function ''g'',

\text{if }\operatorname{E}_\theta(g(T))=0\text{ for all }\theta\text{ then }\mathbf{P}_\theta(g(T)=0)=1\text{ for all }\theta.

The statistic ''T'' is said to be boundedly complete for the distribution of ''X'' if this implication holds for every measurable function ''g'' that is also bounded.

Examples

Bernoulli model

The Bernoulli model admits a complete statistic. Let ''X'' be a random sample of size ''n'' such that each ''X''_''i'' has the same Bernoulli distribution with parameter ''p''. Let ''T'' be the number of 1s observed in the sample, i.e. T = \sum_{i=1}^n X_i. ''T'' is a statistic of ''X'' which has a binomial distribution with parameters (''n'',''p''). If the parameter space for ''p'' is (0,1), then ''T'' is a complete statistic. To see this, note that

\operatorname{E}_p(g(T)) = \sum_{t=0}^n g(t)\binom{n}{t}p^t(1-p)^{n-t} = (1-p)^n \sum_{t=0}^n g(t)\binom{n}{t}\left(\frac{p}{1-p}\right)^t .

Observe also that neither ''p'' nor 1 − ''p'' can be 0. Hence E_p(g(T)) = 0 if and only if:

\sum_{t=0}^n g(t)\binom{n}{t}\left(\frac{p}{1-p}\right)^t = 0.

On denoting ''p''/(1 − ''p'') by ''r'', one gets:

\sum_{t=0}^n g(t)\binom{n}{t}r^t = 0 .

First, observe that the range of ''r'' is the positive reals. Also, E(''g''(''T'')) is, up to the nonzero factor (1 − ''p'')^''n'', a polynomial in ''r'' and, therefore, can only be identical to 0 if all coefficients are 0, that is, ''g''(''t'') = 0 for all ''t''.

It is important to notice that the result that all coefficients must be 0 was obtained because of the range of ''r''. Had the parameter space been finite and with a number of elements less than or equal to ''n'', it would be possible to solve the linear equations in ''g''(''t'') obtained by substituting the values of ''r'' and get solutions different from 0. For example, if ''n'' = 1 and the parameter space is {0.5}, a single observation and a single parameter value, ''T'' is not complete. Observe that, with the definition

g(t) = 2(t-0.5),

E(''g''(''T'')) = 0 although ''g''(''t'') is not 0 for ''t'' = 0 nor for ''t'' = 1.
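The counterexample above can be checked with exact arithmetic. The following is a minimal sketch, not part of the original argument: the helper name `expected_g` is hypothetical, and the second parameter value 1/3 is chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def expected_g(g, n, p):
    """E_p[g(T)] for T ~ Binomial(n, p), with g given as a list g[0..n]."""
    return sum(g[t] * comb(n, t) * p**t * (1 - p)**(n - t) for t in range(n + 1))

# n = 1 and g(t) = 2(t - 0.5), so g(0) = -1 and g(1) = 1 (never zero).
g = [2 * (Fraction(t) - Fraction(1, 2)) for t in (0, 1)]

# On the one-point parameter space {0.5} the expectation vanishes.
print(expected_g(g, 1, Fraction(1, 2)))   # 0
# At another parameter value (1/3, an illustrative choice) it does not.
print(expected_g(g, 1, Fraction(1, 3)))   # -1/3
```

The second value illustrates why the same ''g'' does not contradict completeness once the parameter space is enlarged to the whole interval (0,1).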

Gaussian model with fixed variance

This example will show that, in a sample ''X''₁, ''X''₂ of size 2 from a normal distribution with known variance, the statistic ''X''₁ + ''X''₂ is complete and sufficient. Suppose ''X''₁, ''X''₂ are independent, identically distributed random variables, normally distributed with expectation ''θ'' and variance 1. The sum

s((X_1, X_2)) = X_1 + X_2

is a complete statistic for ''θ''. To show this, it is sufficient to demonstrate that there is no non-zero function ''g'' such that the expectation of

g(s(X_1, X_2)) = g(X_1+X_2)

remains zero regardless of the value of ''θ''. That fact may be seen as follows. The probability distribution of ''X''₁ + ''X''₂ is normal with expectation 2''θ'' and variance 2. Its probability density function in ''x'' is therefore proportional to

\exp\left(-(x-2\theta)^2/4\right).

The expectation of ''g'' above would therefore be a constant times

\int_{-\infty}^\infty g(x)\exp\left(-(x-2\theta)^2/4\right)\,dx.

A bit of algebra reduces this to

k(\theta) \int_{-\infty}^\infty h(x)e^{x\theta}\,dx,

where ''k''(''θ'') is nowhere zero and

h(x)=g(x)e^{-x^2/4}.

As a function of ''θ'' this is a two-sided Laplace transform of ''h'', and cannot be identically zero unless ''h'' is zero almost everywhere. The exponential is not zero, so this can only happen if ''g'' is zero almost everywhere.

By contrast, the statistic (''X''₁, ''X''₂) is sufficient but not complete. It admits a non-zero unbiased estimator of zero, namely ''X''₁ − ''X''₂.
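The "bit of algebra" in the completeness argument amounts to expanding the square in the exponent. As a hedged numerical check, the sketch below assumes the factorization with ''k''(''θ'') = exp(−''θ''²), which is one choice consistent with the derivation; the function names and the test grid are hypothetical.

```python
import math

def density_kernel(x, theta):
    """Unnormalized density of X1 + X2 ~ N(2*theta, 2)."""
    return math.exp(-(x - 2 * theta) ** 2 / 4)

def factored_kernel(x, theta):
    """Same kernel written as k(theta) * exp(-x^2/4) * exp(x*theta)."""
    k = math.exp(-theta ** 2)          # assumed k(theta); strictly positive
    weight = math.exp(-x ** 2 / 4)     # the factor absorbed into h(x)
    return k * weight * math.exp(x * theta)

# Compare both forms on a small, arbitrary grid of (x, theta) pairs.
grid = [(x, th) for x in (-3.0, 0.0, 1.5, 4.0) for th in (-2.0, 0.0, 0.7)]
all_match = all(
    math.isclose(density_kernel(x, th), factored_kernel(x, th), rel_tol=1e-9)
    for x, th in grid
)
print(all_match)
```

Because the factor exp(−''x''²/4) is never zero, ''h'' vanishes almost everywhere exactly when ''g'' does, which is the last step of the argument.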

Sufficiency does not imply completeness

Most parametric models have a sufficient statistic which is not complete. This is important because the Lehmann–Scheffé theorem cannot be applied to such models. Galili and Meilijson (2016) propose the following didactic example. Consider ''n'' independent samples from the uniform distribution:

X_i \sim U \big( (1-k) \theta , (1+k)\theta \big) \qquad\qquad 0 < k < 1

where ''k'' is a known design parameter. This model is a ''scale family'' (a specific case of a location-scale family) model: scaling the samples by a multiplier ''c'' multiplies the parameter ''θ'' by ''c''. Galili and Meilijson show that the minimum and maximum of the samples are together a sufficient statistic:

X_{(1)}, X_{(n)}

(using the usual notation for order statistics). Indeed, conditional on these two values, the distribution of the rest of the sample is simply uniform on the range they define:

\left[X_{(1)}, X_{(n)}\right].

However, their ratio has a distribution which does not depend on ''θ''. This follows from the fact that this is a scale family: any change of scale impacts both variables identically. Subtracting the mean ''m'' of that distribution, we obtain:

\mathbb E \left[ \frac{X_{(n)}}{X_{(1)}} \right] - m = 0

We have thus shown that there exists a function g\left(X_{(1)}, X_{(n)}\right) which is not 0 everywhere but which has expectation 0 for every ''θ''. The pair is thus not complete.
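The scale-invariance of the ratio can be illustrated by simulation. This is a rough sketch under stated assumptions rather than the construction used by Galili and Meilijson: the function names, the values ''k'' = 0.5 and ''n'' = 5, the two values of ''θ'', and the number of repetitions are all hypothetical. Reusing the same seeds for both values of ''θ'' couples the samples so that one is a rescaled copy of the other.

```python
import math
import random

def max_min_ratio(theta, k, n, seed):
    """Ratio X_(n) / X_(1) for n draws from U((1-k)*theta, (1+k)*theta)."""
    rng = random.Random(seed)  # same seed -> same underlying uniform draws
    lo, hi = (1 - k) * theta, (1 + k) * theta
    sample = [rng.uniform(lo, hi) for _ in range(n)]
    return max(sample) / min(sample)

def mean_ratio(theta, k=0.5, n=5, reps=2000):
    """Monte Carlo estimate of m = E[X_(n) / X_(1)] over seeds 0..reps-1."""
    return sum(max_min_ratio(theta, k, n, seed) for seed in range(reps)) / reps

m_small = mean_ratio(theta=1.0)
m_large = mean_ratio(theta=250.0)

# The two estimates agree up to floating-point rounding, whatever theta is.
print(math.isclose(m_small, m_large, rel_tol=1e-9))
```

Since the estimated mean is the same for both scales, subtracting it from the ratio gives a statistic whose expectation is zero regardless of ''θ'', even though the statistic itself is not identically zero.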

Importance of completeness

The notion of completeness has many applications in statistics, particularly in the following theorems of mathematical statistics.

Lehmann–Scheffé theorem

Completeness occurs in the Lehmann–Scheffé theorem, which states that if a statistic is unbiased, complete and sufficient for some parameter ''θ'', then it is the best mean-unbiased estimator for ''θ''. In other words, this statistic has a smaller expected loss for any convex loss function; in many practical applications with the squared loss function, it has a smaller mean squared error than any other estimator with the same expected value.

Examples exist in which the minimal sufficient statistic is not complete; in that case several alternative statistics exist for unbiased estimation of ''θ'', and some of them have lower variance than others. See also minimum-variance unbiased estimator.

Basu's theorem

Bounded completeness occurs in Basu's theorem, which states that a statistic that is both boundedly complete and sufficient is independent of any ancillary statistic.

Bahadur's theorem

Bounded completeness also occurs in Bahadur's theorem. In the case where there exists at least one minimal sufficient statistic, a statistic which is sufficient and boundedly complete is necessarily minimal sufficient.

