In statistics, completeness is a property of a statistic in relation to a model for a set of observed data. In essence, it ensures that the distributions corresponding to different values of the parameters are distinct. It is closely related to the idea of identifiability, but in statistical theory it is often found as a condition imposed on a sufficient statistic from which certain optimality results are derived.


Definition

Consider a random variable ''X'' whose probability distribution belongs to a parametric model ''P''''θ'' parametrized by ''θ''. Say ''T'' is a statistic; that is, the composition of a measurable function with a random sample ''X''1,...,''X''n.

The statistic ''T'' is said to be complete for the distribution of ''X'' if, for every measurable function ''g'',

: \operatorname{E}_\theta(g(T)) = 0 \text{ for all } \theta \quad \Rightarrow \quad \mathbf{P}_\theta(g(T) = 0) = 1 \text{ for all } \theta .

The statistic ''T'' is said to be boundedly complete for the distribution of ''X'' if this implication holds for every measurable function ''g'' that is also bounded.
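
Equivalently, completeness can be read as saying that ''T'' admits no non-trivial unbiased estimator of zero: the only measurable functions ''g'' satisfying \operatorname{E}_\theta(g(T)) = 0 for every ''θ'' are those with ''g''(''T'') = 0 almost surely under every ''P''''θ''. This is the form in which completeness is used in the Lehmann–Scheffé theorem below.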


Example 1: Bernoulli model

The Bernoulli model admits a complete statistic. Let ''X'' be a random sample of size ''n'' such that each ''X''''i'' has the same Bernoulli distribution with parameter ''p''. Let ''T'' be the number of 1s observed in the sample, i.e. \textstyle T = \sum_{i=1}^n X_i. ''T'' is a statistic of ''X'' which has a binomial distribution with parameters (''n'', ''p''). If the parameter space for ''p'' is (0,1), then ''T'' is a complete statistic. To see this, note that

: \operatorname{E}_p(g(T)) = \sum_{t=0}^n g(t)\binom{n}{t}p^t(1-p)^{n-t} = (1-p)^n \sum_{t=0}^n g(t)\binom{n}{t}\left(\frac{p}{1-p}\right)^t .

Observe also that neither ''p'' nor 1 − ''p'' can be 0. Hence \operatorname{E}_p(g(T)) = 0 if and only if

: \sum_{t=0}^n g(t)\binom{n}{t}\left(\frac{p}{1-p}\right)^t = 0.

On denoting ''p''/(1 − ''p'') by ''r'', one gets

: \sum_{t=0}^n g(t)\binom{n}{t} r^t = 0.

First, observe that the range of ''r'' is the positive reals. Also, E(''g''(''T'')) is a polynomial in ''r'' and, therefore, it can be identically 0 only if all of its coefficients are 0, that is, only if ''g''(''t'') = 0 for all ''t''.

It is important to notice that the result that all coefficients must be 0 was obtained because of the range of ''r''. Had the parameter space been finite, with a number of elements less than or equal to ''n'', it might be possible to solve the linear equations in ''g''(''t'') obtained by substituting the values of ''r'' and obtain solutions different from 0. For example, if ''n'' = 1 and the parameter space is the single point {0.5} (a single observation and a single parameter value), ''T'' is not complete. Observe that, with the definition

: g(t) = 2(t - 0.5),

one has E(''g''(''T'')) = 0 although ''g''(''t'') is not 0 for ''t'' = 0 nor for ''t'' = 1.
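
This can be checked numerically. The Python sketch below (the function name expected_g and the particular parameter values are illustrative choices, not part of the article) computes the exact expectation E_p[g(T)] under the binomial distribution of ''T'' and shows that g(t) = 2(t − 0.5) is an unbiased estimator of zero when the parameter space is restricted to the single value p = 0.5, but not once ''p'' is allowed to vary over (0, 1).

from math import comb

def expected_g(g, n, p):
    # Exact E_p[g(T)] for T ~ Binomial(n, p).
    return sum(g(t) * comb(n, t) * p ** t * (1 - p) ** (n - t) for t in range(n + 1))

g = lambda t: 2 * (t - 0.5)

print(expected_g(g, n=1, p=0.5))  # 0.0: an unbiased estimator of zero, yet g is not the zero function
print(expected_g(g, n=1, p=0.3))  # approximately -0.4: no longer zero once p varies over (0, 1)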


Relation to sufficient statistics

For some parametric families, a complete sufficient statistic does not exist (for example, see Galili and Meilijson 2016). For instance, if a sample of size ''n'' > 2 is taken from a ''N''(θ, θ²) distribution, then

: \left(\sum_{i=1}^n X_i, \sum_{i=1}^n X_i^2\right)

is a minimal sufficient statistic and is a function of any other minimal sufficient statistic, but

: 2\left(\sum_{i=1}^n X_i\right)^2 - (n+1)\sum_{i=1}^n X_i^2

has an expectation of 0 for all ''θ'', so there cannot be a complete sufficient statistic.

If there is a minimal sufficient statistic then any complete sufficient statistic is also minimal sufficient. But there are pathological cases where a minimal sufficient statistic does not exist even if a complete statistic does.
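
The zero expectation follows from E[(ΣXᵢ)²] = nθ² + n²θ² and E[ΣXᵢ²] = 2nθ² under ''N''(θ, θ²), so that 2(nθ² + n²θ²) − (n+1)(2nθ²) = 0. The following Monte Carlo check is a Python sketch only (the seed, sample size ''n'' and value of θ are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
n, theta, reps = 5, 2.0, 200_000                          # arbitrary illustrative choices
x = rng.normal(loc=theta, scale=theta, size=(reps, n))    # X_i ~ N(theta, theta^2)

s1 = x.sum(axis=1)                          # sum of the X_i
s2 = (x ** 2).sum(axis=1)                   # sum of the X_i^2
stat = 2 * s1 ** 2 - (n + 1) * s2           # function of the minimal sufficient statistic above

print(stat.mean())                          # close to 0 (up to Monte Carlo error) for any theta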


Importance of completeness

The notion of completeness has many applications in statistics, particularly in the following theorems of mathematical statistics.


Lehmann–Scheffé theorem

Completeness occurs in the Lehmann–Scheffé theorem, which states that if a statistic is unbiased, complete and sufficient for some parameter ''θ'', then it is the best mean-unbiased estimator for ''θ''. In other words, this statistic has the smallest expected loss for any convex loss function; in many practical applications with the squared loss function, it has the smallest mean squared error among all estimators with the same expected value. There are examples in which the minimal sufficient statistic is not complete and several alternative statistics exist for unbiased estimation of ''θ'', some of them having lower variance than others. See also minimum-variance unbiased estimator.
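
As an illustration in the Bernoulli model of Example 1 (a Python sketch; the parameter values are arbitrary choices): ''T'' = ΣXᵢ is complete and sufficient, so by the Lehmann–Scheffé theorem the unbiased estimator ''T''/''n'' is the minimum-variance unbiased estimator of ''p''; the single observation ''X''1 is also unbiased for ''p'' but has larger variance.

import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 10, 0.3, 100_000               # illustrative choices
x = rng.binomial(1, p, size=(reps, n))

umvue = x.mean(axis=1)                      # T/n, a function of the complete sufficient statistic T
naive = x[:, 0]                             # the first observation: also unbiased for p, but ignores T

print(umvue.mean(), naive.mean())           # both approximately 0.3, i.e. both unbiased
print(umvue.var(), naive.var())             # approximately p(1-p)/n = 0.021 versus p(1-p) = 0.21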


Basu's theorem

Bounded completeness occurs in Basu's theorem, which states that a statistic that is both boundedly complete and sufficient is independent of any ancillary statistic.
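
A standard illustration, sketched here in Python (this example and its parameter choices are not taken from the article), is the normal location model with known variance: with ''X''ᵢ ~ ''N''(μ, 1), the sample mean is a boundedly complete sufficient statistic for μ and the sample variance is ancillary, so Basu's theorem implies the two are independent.

import numpy as np

rng = np.random.default_rng(2)
n, mu, reps = 8, 1.5, 100_000               # illustrative choices; the variance is known and equal to 1
x = rng.normal(loc=mu, scale=1.0, size=(reps, n))

xbar = x.mean(axis=1)                       # sample mean: complete and sufficient for mu
s2 = x.var(axis=1, ddof=1)                  # sample variance: ancillary (its distribution is free of mu)

print(np.corrcoef(xbar, s2)[0, 1])          # near 0, consistent with the independence asserted by Basu's theorem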


Bahadur's theorem

Bounded completeness also occurs in Bahadur's theorem: in the case where there exists at least one minimal sufficient statistic, a statistic which is sufficient and boundedly complete is necessarily minimal sufficient. Another form of Bahadur's theorem states that any sufficient and boundedly complete statistic over a finite-dimensional coordinate space is also minimal sufficient.


References

* Galili, Tal; Meilijson, Isaac (2016). "An Example of an Improvable Rao–Blackwell Improvement, Inefficient Maximum Likelihood Estimator, and Unbiased Generalized Bayes Estimator". ''The American Statistician'', 70 (1): 108–113.