U-statistic

In statistical theory, U-statistics are a class of statistics that is especially important in estimation theory; the letter "U" stands for unbiased. In elementary statistics, U-statistics arise naturally in producing minimum-variance unbiased estimators.

The theory of U-statistics allows a minimum-variance unbiased estimator to be derived from each unbiased estimator of an ''estimable parameter'' (alternatively, ''statistical functional'') for large classes of probability distributions. An estimable parameter is a measurable function of the population's cumulative probability distribution: for example, for every probability distribution, the population median is an estimable parameter. The theory of U-statistics applies to general classes of probability distributions.


History

Many statistics originally derived for particular parametric families have been recognized as U-statistics for general distributions. In non-parametric statistics, the theory of U-statistics is used to establish the asymptotic normality and the finite-sample variance of statistical procedures such as estimators and tests. The theory has been used to study more general statistics as well as stochastic processes, such as random graphs.

Suppose that a problem involves independent and identically distributed random variables and that estimation of a certain parameter is required. Suppose that a simple unbiased estimate can be constructed based on only a few observations: this defines the basic estimator based on a given number of observations. For example, a single observation is itself an unbiased estimate of the mean, and a pair of observations can be used to derive an unbiased estimate of the variance. The U-statistic based on this estimator is defined as the average (across all combinatorial selections of the given size from the full set of observations) of the basic estimator applied to the sub-samples.

Sen (1992) provides a review of the paper by Wassily Hoeffding (1948), which introduced U-statistics and set out the theory relating to them, and in doing so outlines the importance of U-statistics in statistical theory. Sen says, “The impact of Hoeffding (1948) is overwhelming at the present time and is very likely to continue in the years to come.” Note that the theory of U-statistics is not limited to the case of independent and identically distributed random variables or to scalar random variables: Borovskikh's last chapter discusses U-statistics for exchangeable random elements taking values in a vector space (a separable Banach space).
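
The claim above that a pair of observations yields an unbiased estimate of the variance can be checked in one line. As a short worked step (the symbols \mu and \sigma^2 for the population mean and variance are notation introduced here for illustration), take independent X_1, X_2 with common mean \mu and finite variance \sigma^2, so that E[X_i^2] = \sigma^2 + \mu^2:
:E\left[\tfrac{1}{2}(X_1 - X_2)^2\right] = \tfrac{1}{2}\left(E[X_1^2] - 2\,E[X_1]\,E[X_2] + E[X_2^2]\right) = \tfrac{1}{2}\left((\sigma^2 + \mu^2) - 2\mu^2 + (\sigma^2 + \mu^2)\right) = \sigma^2.
So (x_1 - x_2)^2/2 is one possible basic estimator of the variance based on two observations.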


Definition

The term U-statistic, due to Hoeffding (1948), is defined as follows. Let K be either the real or complex numbers, and let f\colon (K^d)^r \to K be a K-valued function of r d-dimensional variables. For each n \ge r the associated U-statistic f_n\colon (K^d)^n \to K is defined to be the average of the values f(x_{i_1}, \dotsc, x_{i_r}) over the set I_{r,n} of r-tuples of indices from \{1, 2, \dotsc, n\} with distinct entries. Formally,
:f_n(x_1, \dotsc, x_n) = \frac{1}{\prod_{i=0}^{r-1}(n-i)} \sum_{(i_1, \dotsc, i_r) \in I_{r,n}} f(x_{i_1}, \dotsc, x_{i_r}).
In particular, if f is symmetric the above simplifies to
:f_n(x_1, \dotsc, x_n) = \frac{1}{\binom{n}{r}} \sum_{(i_1, \dotsc, i_r) \in J_{r,n}} f(x_{i_1}, \dotsc, x_{i_r}),
where now J_{r,n} denotes the subset of I_{r,n} of ''increasing'' tuples. Each U-statistic f_n is necessarily a symmetric function.

U-statistics are very natural in statistical work, particularly in Hoeffding's context of independent and identically distributed random variables, or more generally for exchangeable sequences, such as in simple random sampling from a finite population, where the defining property is termed ‘inheritance on the average’. Fisher's ''k''-statistics and Tukey's polykays are examples of homogeneous polynomial U-statistics (Fisher, 1929; Tukey, 1950). For a simple random sample \varphi of size n taken from a population of size N, the U-statistic has the property that the average of the sample values f_n(x_\varphi) over all possible samples is exactly equal to the population value f_N(x).
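
The definition and the ‘inheritance on the average’ property can be illustrated with a brute-force sketch in Python. This is only an illustration under stated assumptions: the names `u_statistic`, `variance_kernel`, and `population`, as well as the toy data, are hypothetical and not part of the source, and enumerating every index tuple is practical only for small n.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, perm


def u_statistic(kernel, r, xs, symmetric=True):
    """Average `kernel` over r-tuples of distinct indices of `xs`.

    With symmetric=True only increasing tuples (the set J_{r,n}) are used;
    otherwise all tuples with distinct entries (the set I_{r,n}) are used.
    """
    n = len(xs)
    if n < r:
        raise ValueError("a U-statistic of order r needs at least r observations")
    if symmetric:
        index_tuples = combinations(range(n), r)
        count = comb(n, r)   # number of increasing tuples
    else:
        index_tuples = permutations(range(n), r)
        count = perm(n, r)   # n (n-1) ... (n-r+1) distinct tuples
    total = sum(kernel(*(xs[i] for i in idx)) for idx in index_tuples)
    return total / count


def variance_kernel(a, b):
    # Hypothetical example kernel: (a - b)^2 / 2, kept exact with Fraction.
    return Fraction((a - b) ** 2, 2)


# Toy finite population (assumed data) of size N = 6; samples of size n = 4.
population = [1, 4, 4, 7, 10, 13]
n = 4
samples = list(combinations(population, n))

# Average of f_n over every simple random sample, and the population value f_N.
sample_average = sum(u_statistic(variance_kernel, 2, s) for s in samples) / len(samples)
population_value = u_statistic(variance_kernel, 2, population)

print(sample_average == population_value)
```

Exact rational arithmetic is used so that the two quantities can be compared with `==` rather than a floating-point tolerance. For a symmetric kernel, calling `u_statistic(..., symmetric=False)` gives the same value, since each increasing tuple is then counted r! times in both the sum and the divisor.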


Examples

Some examples:
* If f(x) = x, the U-statistic f_n(x) = \bar x_n = (x_1 + \cdots + x_n)/n is the sample mean.
* If f(x_1, x_2) = |x_1 - x_2|, the U-statistic is the mean pairwise deviation f_n(x_1, \ldots, x_n) = 2/(n(n-1)) \sum_{i<j} |x_i - x_j|, defined for n \ge 2.
* If f(x_1, x_2) = (x_1 - x_2)^2/2, the U-statistic is the sample variance f_n(x) = \sum(x_i - \bar x_n)^2/(n-1) with divisor n-1, defined for n \ge 2.
* The third k-statistic k_{3,n}(x) = \sum(x_i - \bar x_n)^3 n/((n-1)(n-2)), the sample skewness defined for n \ge 3, is a U-statistic.

The following case highlights an important point. If f(x_1, x_2, x_3) is the median of three values, f_n(x_1, \ldots, x_n) is not the median of n values. However, it is a minimum variance unbiased estimate of the expected value of the median of three values, not the median of the population. Similar estimates play a central role where the parameters of a family of probability distributions are being estimated by probability weighted moments or L-moments.
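
The examples above can be checked numerically. The following Python sketch is an illustration only: the helper `u_stat`, the kernel `k3_kernel`, and the data values are assumptions introduced here, and the order-3 kernel used for the third k-statistic is one symmetric choice that is unbiased for the third central moment, not a form given in the source.

```python
from itertools import combinations
from math import comb, isclose
from statistics import mean, median, variance


def u_stat(kernel, r, xs):
    """U-statistic of a symmetric kernel: average over all r-subsets of xs."""
    return sum(kernel(*c) for c in combinations(xs, r)) / comb(len(xs), r)


def k3_kernel(x, y, z):
    # Symmetric kernel with expectation E[X^3] - 3 E[X^2] E[X] + 2 E[X]^3.
    cubes = (x**3 + y**3 + z**3) / 3
    mixed = (x*x*y + x*x*z + y*y*x + y*y*z + z*z*x + z*z*y) / 2
    return cubes - mixed + 2 * x * y * z


data = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0]   # assumed toy data
n = len(data)
xbar = mean(data)

# Order 1: f(x) = x gives the sample mean.
u_mean = u_stat(lambda x: x, 1, data)

# Order 2: f(x1, x2) = |x1 - x2| gives the mean pairwise deviation.
u_mpd = u_stat(lambda a, b: abs(a - b), 2, data)
mpd_direct = 2 / (n * (n - 1)) * sum(
    abs(data[i] - data[j]) for i in range(n) for j in range(i + 1, n)
)

# Order 2: f(x1, x2) = (x1 - x2)^2 / 2 gives the sample variance (divisor n - 1).
u_var = u_stat(lambda a, b: (a - b) ** 2 / 2, 2, data)

# Order 3: the third k-statistic, compared with its closed form.
u_k3 = u_stat(k3_kernel, 3, data)
k3_direct = n / ((n - 1) * (n - 2)) * sum((x - xbar) ** 3 for x in data)

# Order 3: the median-of-three kernel does not reproduce the sample median.
u_med3 = u_stat(lambda a, b, c: median([a, b, c]), 3, data)

print(isclose(u_mean, xbar), isclose(u_var, variance(data)))
print(isclose(u_mpd, mpd_direct), isclose(u_k3, k3_direct))
print(u_med3, median(data))
```

The last line prints two different numbers for this data set, which is the point made above: averaging the median of three over all triples estimates the expected median of three observations, not the population median.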


See also

* V-statistic


References

* Cox, D. R., Hinkley, D. V. (1974) ''Theoretical statistics''. Chapman and Hall.
* Fisher, R. A. (1929) Moments and product moments of sampling distributions. ''Proceedings of the London Mathematical Society'', Series 2, 30:199–238.
* Hoeffding, W. (1948) A class of statistics with asymptotically normal distribution. ''Annals of Mathematical Statistics'', 19:293–325. (Partially reprinted in: Kotz, S., Johnson, N. L. (1992) ''Breakthroughs in Statistics'', Vol I, pp 308–334. Springer-Verlag.)
* Lee, A. J. (1990) ''U-Statistics: Theory and Practice''. Marcel Dekker, New York. 320 pp.
* Sen, P. K. (1992) Introduction to Hoeffding (1948) A Class of Statistics with Asymptotically Normal Distribution. In: Kotz, S., Johnson, N. L. ''Breakthroughs in Statistics'', Vol I, pp 299–307. Springer-Verlag.