In statistics, efficiency is a measure of quality of an estimator, of an experimental design, or of a hypothesis testing procedure. Essentially, a more efficient estimator needs fewer input data or observations than a less efficient one to achieve the Cramér–Rao bound. An ''efficient estimator'' is characterized by having the smallest possible variance, indicating that there is a small deviance between the estimated value and the "true" value in the L2 norm sense.

The relative efficiency of two procedures is the ratio of their efficiencies, although often this concept is used where the comparison is made between a given procedure and a notional "best possible" procedure. The efficiencies and the relative efficiency of two procedures theoretically depend on the sample size available for the given procedure, but it is often possible to use the asymptotic relative efficiency (defined as the limit of the relative efficiencies as the sample size grows) as the principal comparison measure.

Estimators

The efficiency of an unbiased estimator, ''T'', of a parameter ''θ'' is defined as:

e(T) = \frac{1/\mathcal{I}(\theta)}{\operatorname{var}(T)}

where \mathcal{I}(\theta) is the Fisher information of the sample. Thus ''e''(''T'') is the minimum possible variance for an unbiased estimator divided by its actual variance. The Cramér–Rao bound can be used to prove that ''e''(''T'') ≤ 1.

Efficient estimators

An efficient estimator is an estimator that estimates the quantity of interest in some “best possible” manner. The notion of “best possible” relies upon the choice of a particular loss function, the function which quantifies the relative degree of undesirability of estimation errors of different magnitudes. The most common choice of the loss function is quadratic, resulting in the mean squared error criterion of optimality.

In general, the spread of an estimator around the parameter ''θ'' is a measure of estimator efficiency and performance. This performance can be calculated by finding the mean squared error. More formally, let ''T'' be an estimator for the parameter ''θ''. The mean squared error of ''T'' is the value MSE(''T'') = E[(''T'' − ''θ'')²], which can be decomposed as the sum of its variance and its squared bias:

: \begin{aligned}
\operatorname{MSE}(T) & = \operatorname{E}[(T-\theta)^2] = \operatorname{E}[(T-\operatorname{E}[T]+\operatorname{E}[T]-\theta)^2] \\[5pt]
& = \operatorname{E}[(T-\operatorname{E}[T])^2] + 2\operatorname{E}[T-\operatorname{E}[T]](\operatorname{E}[T]-\theta) + (\operatorname{E}[T]-\theta)^2 \\[5pt]
& = \operatorname{var}(T) + (\operatorname{E}[T]-\theta)^2
\end{aligned}

The cross term vanishes because E[''T'' − E[''T'']] = 0.

An estimator ''T''₁ performs better than an estimator ''T''₂ if MSE(''T''₁) < MSE(''T''₂). For a more specific case, if ''T''₁ and ''T''₂ are two unbiased estimators for the same parameter ''θ'', then their variances can be compared to determine performance. In this case, ''T''₂ is ''more efficient'' than ''T''₁ if the variance of ''T''₂ is ''smaller'' than the variance of ''T''₁, i.e. var(''T''₁) > var(''T''₂) for all values of ''θ''. This relationship can be determined by simplifying the more general case above for mean squared error: the expected value of an unbiased estimator is equal to the parameter value, E[''T''] = ''θ''. Therefore, for an unbiased estimator, MSE(''T'') = var(''T''), as the (E[''T''] − ''θ'')² term drops out for being equal to 0.

If an unbiased estimator of a parameter ''θ'' attains ''e''(''T'') = 1 for all values of the parameter, then the estimator is called efficient. Equivalently, the estimator achieves equality in the Cramér–Rao inequality for all ''θ''. The Cramér–Rao lower bound is a lower bound of the variance of an unbiased estimator, representing the "best" an unbiased estimator can be.

An efficient estimator is also the minimum variance unbiased estimator (MVUE). This is because an efficient estimator maintains equality on the Cramér–Rao inequality for all parameter values, which means it attains the minimum variance for all parameters (the definition of the MVUE). The MVUE, even if it exists, is not necessarily efficient, because "minimum" does not mean equality holds on the Cramér–Rao inequality. Thus an efficient estimator need not exist, but if it does, it is the MVUE.
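The decomposition of the mean squared error into variance plus squared bias can be checked numerically. The following is a minimal sketch, not a reference implementation: the helper names `mse_decomposition` and `shrunk_mean`, the choice of a N(θ, 1) model, and the shrinkage factor 0.9 are all assumptions made for illustration.

```python
import random
import statistics


def mse_decomposition(estimator, theta, n, trials=5000, seed=0):
    """Simulate `estimator` on N(theta, 1) samples of size n.

    Returns the empirical (mse, variance, bias) of the estimates.
    """
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(theta, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    mean_estimate = statistics.fmean(estimates)
    mse = statistics.fmean([(t - theta) ** 2 for t in estimates])
    # Population-style variance (divide by trials) so the identity is exact.
    variance = statistics.fmean([(t - mean_estimate) ** 2 for t in estimates])
    bias = mean_estimate - theta
    return mse, variance, bias


def shrunk_mean(sample):
    """A deliberately biased estimator (hypothetical): 0.9 times the sample mean."""
    return 0.9 * statistics.fmean(sample)


mse, variance, bias = mse_decomposition(shrunk_mean, theta=2.0, n=10)
print(f"MSE = {mse:.4f}, var + bias^2 = {variance + bias ** 2:.4f}")
```

For the empirical moments the two printed numbers agree up to floating-point rounding, and the bias is close to 0.9·2 − 2 = −0.2, so the squared-bias term is visibly nonzero for this estimator.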

Finite-sample efficiency

Suppose \{ P_\theta \mid \theta \in \Theta \} is a parametric model and X = (X_1, \ldots, X_n) are the data sampled from this model. Let T = T(X) be an estimator for the parameter ''θ''. If this estimator is unbiased (that is, E[''T''] = ''θ''), then the Cramér–Rao inequality states that the variance of this estimator is bounded from below:

\operatorname{var}[T]\ \geq\ \mathcal{I}_\theta^{-1},

where \mathcal{I}_\theta is the Fisher information matrix of the model at point ''θ''. Generally, the variance measures the degree of dispersion of a random variable around its mean. Thus estimators with small variances are more concentrated: they estimate the parameters more precisely. We say that the estimator is a finite-sample efficient estimator (in the class of unbiased estimators) if it reaches the lower bound in the Cramér–Rao inequality above, for all ''θ'' ∈ Θ. Efficient estimators are always minimum variance unbiased estimators. However, the converse is false: there exist point-estimation problems for which the minimum-variance mean-unbiased estimator is inefficient.

Historically, finite-sample efficiency was an early optimality criterion. However, this criterion has some limitations:
* Finite-sample efficient estimators are extremely rare. In fact, it was proved that efficient estimation is possible only in an exponential family, and only for the natural parameters of that family.
* This notion of efficiency is sometimes restricted to the class of unbiased estimators. (Often it isn't.) Since there are no good theoretical reasons to require that estimators are unbiased, this restriction is inconvenient. In fact, if we use mean squared error as a selection criterion, many biased estimators will slightly outperform the “best” unbiased ones. For example, in multivariate statistics for dimension three or more, the mean-unbiased estimator, the sample mean, is inadmissible: whatever the true parameter value, its mean squared error is higher than that of, for example, the James–Stein estimator.
* Finite-sample efficiency is based on the variance as the criterion according to which the estimators are judged. A more general approach is to use loss functions other than quadratic ones, in which case the finite-sample efficiency can no longer be formulated.

As an example, among the models encountered in practice, efficient estimators exist for: the mean ''μ'' of the normal distribution (but not the variance ''σ''²), the parameter ''λ'' of the Poisson distribution, and the probability ''p'' in the binomial or multinomial distribution.

Consider the model of a normal distribution with unknown mean but known variance: \{ P_\theta = \mathcal{N}(\theta, \sigma^2) \mid \theta \in \mathbb{R} \}. The data consist of ''n'' independent and identically distributed observations from this model: X = (x_1, \ldots, x_n). We estimate the parameter ''θ'' using the sample mean of all observations:

T(X) = \frac{1}{n} \sum_{i=1}^n x_i .

This estimator has mean ''θ'' and variance ''σ''²/''n'', which is equal to the reciprocal of the Fisher information from the sample. Thus, the sample mean is a finite-sample efficient estimator for the mean of the normal distribution.
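The claim that the sample mean attains the Cramér–Rao bound in the normal model can be illustrated by simulation. The sketch below assumes the known-variance normal model, for which the Fisher information of ''n'' observations is ''n''/''σ''²; the function name `sample_mean_variance` and the particular values of ''θ'', ''σ'' and ''n'' are illustrative choices only.

```python
import random
import statistics


def sample_mean_variance(theta, sigma, n, trials=5000, seed=1):
    """Empirical variance of the sample mean over repeated N(theta, sigma^2) samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(theta, sigma) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.pvariance(means)


sigma, n = 2.0, 20
fisher_information = n / sigma ** 2           # information in n i.i.d. N(theta, sigma^2) draws
cramer_rao_bound = 1.0 / fisher_information   # equals sigma^2 / n
simulated_variance = sample_mean_variance(theta=5.0, sigma=sigma, n=n)
efficiency = cramer_rao_bound / simulated_variance

print(f"bound = {cramer_rao_bound:.4f}, simulated variance = {simulated_variance:.4f}")
print(f"estimated efficiency = {efficiency:.3f}")
```

Because the variance is itself estimated from a finite number of trials, the estimated efficiency fluctuates around 1 and can land slightly above it; that reflects simulation noise, not a violation of the bound.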

Asymptotic efficiency

Asymptotic efficiency requires consistency, an asymptotically normal distribution of the estimator, and an asymptotic variance-covariance matrix no worse than that of any other estimator.

Example: Median

Consider a sample of size ''N'' drawn from a normal distribution of mean ''μ'' and unit variance, i.e., X_n \sim \mathcal{N}(\mu, 1).

The sample mean \overline{X} of the sample X_1, X_2, \ldots, X_N is defined as:

\overline{X} = \frac{1}{N} \sum_{n=1}^{N} X_n \sim \mathcal{N}\left(\mu, \frac{1}{N}\right).

The variance of the mean, 1/''N'' (the square of the standard error), is equal to the reciprocal of the Fisher information from the sample and thus, by the Cramér–Rao inequality, the sample mean is efficient in the sense that its efficiency is unity (100%).

Now consider the sample median, \widetilde{X}. This is an unbiased and consistent estimator for ''μ''. For large ''N'' the sample median is approximately normally distributed with mean ''μ'' and variance π/(2''N''):

\widetilde{X} \sim \mathcal{N}\left(\mu, \frac{\pi}{2N}\right).

The efficiency of the median for large ''N'' is thus:

e\left(\widetilde{X}\right) = \left(\frac{1}{N}\right) \left(\frac{\pi}{2N}\right)^{-1} = 2/\pi \approx 0.64.

In other words, the relative variance of the median will be π/2 ≈ 1.57, or 57% greater than the variance of the mean; the standard error of the median will be 25% greater than that of the mean, since √1.57 ≈ 1.25.

Note that this is the asymptotic efficiency, that is, the efficiency in the limit as the sample size ''N'' tends to infinity. For finite values of ''N'' the efficiency is higher than this (for example, a sample size of 3 gives an efficiency of about 74%).

The sample mean is thus more efficient than the sample median in this example. However, there may be measures by which the median performs better. For example, the median is far more robust to outliers, so that if the Gaussian model is questionable or approximate, there may be advantages to using the median (see Robust statistics).
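The efficiency of the sample median under the unit-variance normal model can be estimated by Monte Carlo. This is a rough sketch under stated assumptions: `median_efficiency` is a hypothetical helper, ''μ'' is set to 0 without loss of generality, and the Cramér–Rao bound 1/''N'' is taken from the text.

```python
import math
import random
import statistics


def median_efficiency(n, trials=4000, seed=2):
    """Estimate e(median) = (1/n) / var(median) for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    medians = [
        statistics.median([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return (1.0 / n) / statistics.pvariance(medians)


eff_large = median_efficiency(101)   # should be near the asymptotic value 2/pi
eff_small = median_efficiency(3)     # finite-sample efficiency is higher

print(f"N = 101: {eff_large:.3f}  (2/pi = {2 / math.pi:.3f})")
print(f"N = 3:   {eff_small:.3f}")
```

With a few thousand trials the estimates carry an uncertainty of a few percent, so they should be read as consistent with the quoted figures (about 0.64 asymptotically and about 0.74 for a sample of 3), not as precise reproductions of them.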

Dominant estimators

If ''T''₁ and ''T''₂ are estimators for the parameter ''θ'', then ''T''₁ is said to dominate ''T''₂ if:
# its mean squared error (MSE) is smaller for at least some value of ''θ''
# the MSE does not exceed that of ''T''₂ for any value of ''θ''.

Formally, ''T''₁ dominates ''T''₂ if:

\operatorname{E}[(T_1 - \theta)^2] \leq \operatorname{E}[(T_2 - \theta)^2]

holds for all ''θ'', with strict inequality holding somewhere.
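The two conditions for dominance translate directly into a check over a grid of parameter values. The sketch below is illustrative only: `dominates` is a hypothetical helper, the three closed-form MSE functions assume ''n'' = 10 observations with variance 1, and a finite grid can refute dominance or suggest it but cannot prove it for all ''θ''.

```python
def dominates(mse_1, mse_2, thetas, tol=1e-12):
    """True if mse_1 <= mse_2 at every grid point and mse_1 < mse_2 at some point."""
    pairs = [(mse_1(theta), mse_2(theta)) for theta in thetas]
    never_worse = all(a <= b + tol for a, b in pairs)
    strictly_better = any(a < b - tol for a, b in pairs)
    return never_worse and strictly_better


N_OBS, SIGMA2 = 10, 1.0  # assumed sample size and variance


def mse_mean(theta):
    """Sample mean: unbiased, so MSE = variance = sigma^2 / n."""
    return SIGMA2 / N_OBS


def mse_first_observation(theta):
    """Using only X_1: unbiased, MSE = sigma^2."""
    return SIGMA2


def mse_shrunk_mean(theta):
    """0.9 * sample mean: variance 0.81 * sigma^2 / n plus squared bias (0.1 * theta)^2."""
    return 0.81 * SIGMA2 / N_OBS + (0.1 * theta) ** 2


grid = [x / 10 for x in range(-50, 51)]
mean_beats_single = dominates(mse_mean, mse_first_observation, grid)
shrunk_beats_mean = dominates(mse_shrunk_mean, mse_mean, grid)
mean_beats_shrunk = dominates(mse_mean, mse_shrunk_mean, grid)
print(mean_beats_single, shrunk_beats_mean, mean_beats_shrunk)
```

The sample mean dominates the single observation on this grid, while the shrunk mean and the sample mean do not dominate each other: shrinkage wins near ''θ'' = 0 and loses for large |''θ''|.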

Relative efficiency

The relative efficiency of two unbiased estimators is defined as:

e(T_1,T_2) = \frac{\operatorname{E}[(T_2-\theta)^2]}{\operatorname{E}[(T_1-\theta)^2]} = \frac{\operatorname{var}(T_2)}{\operatorname{var}(T_1)}

Although ''e'' is in general a function of ''θ'', in many cases the dependence drops out; if this is so, ''e'' being greater than one would indicate that ''T''₁ is preferable, regardless of the true value of ''θ''.

An alternative to relative efficiency for comparing estimators is the Pitman closeness criterion. This replaces the comparison of mean squared errors with a comparison of how often one estimator produces estimates closer to the true value than another estimator.
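The Pitman closeness criterion can be approximated by simulation: draw many samples and count how often one estimator lands closer to the true value than the other. The sketch below assumes a N(''θ'', 1) model and compares the sample mean with the sample median; `pitman_closeness` is a hypothetical helper name, and the sample size and trial count are arbitrary.

```python
import random
import statistics


def pitman_closeness(est_1, est_2, theta, n, trials=2000, seed=3):
    """Estimate P(|est_1 - theta| < |est_2 - theta|) on N(theta, 1) samples of size n."""
    rng = random.Random(seed)
    closer = 0
    for _ in range(trials):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        if abs(est_1(sample) - theta) < abs(est_2(sample) - theta):
            closer += 1
    return closer / trials


p_mean_closer = pitman_closeness(statistics.fmean, statistics.median, theta=0.0, n=25)
print(f"P(mean closer than median) ~ {p_mean_closer:.3f}")
```

A value above one half indicates that, under this model, the sample mean is the Pitman-closer of the two estimators, which agrees with the ranking given by relative efficiency in this particular case. The two criteria need not agree in general.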

Estimators of the mean of u.i.d. variables

In estimating the mean of uncorrelated, identically distributed variables we can take advantage of the fact that the variance of the sum is the sum of the variances. In this case efficiency can be defined as the square of the coefficient of variation, i.e.:

e \equiv \left(\frac{\sigma}{\mu}\right)^2

Relative efficiency of two such estimators can thus be interpreted as the relative sample size of one required to achieve the certainty of the other. Proof:

\frac{e_1}{e_2} = \frac{s_1^2}{s_2^2}.

Now because s_1^2 = n_1 \sigma^2 and s_2^2 = n_2 \sigma^2, we have

\frac{e_1}{e_2} = \frac{n_1}{n_2},

so the relative efficiency expresses the relative sample size of the first estimator needed to match the variance of the second.
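The sample-size reading of relative efficiency can be made concrete with a small calculation. The sketch assumes that each estimator's variance scales as 1/''n'', so that an estimator with efficiency ''e'' relative to a reference needs about ''n''/''e'' observations to match the reference at ''n'' observations. The helper name `equivalent_sample_size` is hypothetical, and the value 2/π is the asymptotic efficiency of the median relative to the mean under a normal model.

```python
import math


def equivalent_sample_size(n_reference, relative_efficiency):
    """Smallest sample size at which an estimator whose efficiency relative to a
    reference estimator is `relative_efficiency` matches the reference's variance
    at `n_reference` observations, assuming variance proportional to 1/n."""
    if relative_efficiency <= 0:
        raise ValueError("relative_efficiency must be positive")
    return math.ceil(n_reference / relative_efficiency)


# Median versus mean for normal data: asymptotic relative efficiency 2/pi.
n_median = equivalent_sample_size(100, 2 / math.pi)
print(n_median)  # 158
```

In other words, under these assumptions roughly 158 observations are needed for the median to be as precise as the mean computed from 100.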

Robustness

The efficiency of an estimator may change significantly if the distribution changes, often dropping. This is one of the motivations of robust statistics: an estimator such as the sample mean is an efficient estimator of the population mean of a normal distribution, for example, but can be an inefficient estimator of a mixture distribution of two normal distributions with the same mean and different variances. For example, if a distribution is a combination of 98% ''N''(''μ'', ''σ'') and 2% ''N''(''μ'', 10''σ''), the presence of extreme values from the latter distribution (often "contaminating outliers") significantly reduces the efficiency of the sample mean as an estimator of ''μ''. By contrast, the trimmed mean is less efficient for a normal distribution, but is more robust to (i.e., less affected by) changes in the distribution, and thus may be more efficient for a mixture distribution. Similarly, the shape of a distribution, such as skewness or heavy tails, can significantly reduce the efficiency of estimators that assume a symmetric distribution or thin tails.
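The effect of contamination on the sample mean can be illustrated with the 98%/2% mixture described in the text. The sketch below makes several assumptions: the second argument of ''N''(''μ'', ''σ'') is read as a standard deviation, the trimmed mean discards 10% of the observations at each end, and all function names are hypothetical.

```python
import random
import statistics


def contaminated_sample(rng, n, mu=0.0, sigma=1.0, p_outlier=0.02, scale=10.0):
    """Draw n values: N(mu, sigma) with prob. 0.98, N(mu, 10*sigma) with prob. 0.02."""
    sample = []
    for _ in range(n):
        sd = sigma * scale if rng.random() < p_outlier else sigma
        sample.append(rng.gauss(mu, sd))
    return sample


def trimmed_mean(sample, proportion=0.1):
    """Mean after discarding `proportion` of the observations at each end."""
    k = int(len(sample) * proportion)
    ordered = sorted(sample)
    return statistics.fmean(ordered[k:len(ordered) - k])


def estimator_variances(n=50, trials=3000, seed=4):
    """Empirical variances of the sample mean and trimmed mean under contamination."""
    rng = random.Random(seed)
    means, trimmed = [], []
    for _ in range(trials):
        sample = contaminated_sample(rng, n)
        means.append(statistics.fmean(sample))
        trimmed.append(trimmed_mean(sample))
    return statistics.pvariance(means), statistics.pvariance(trimmed)


var_mean, var_trimmed = estimator_variances()
print(f"var(mean) = {var_mean:.4f}, var(trimmed mean) = {var_trimmed:.4f}")
```

Under this mixture the trimmed mean typically shows a clearly smaller variance than the sample mean, even though it would be slightly less efficient on uncontaminated normal data.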

Uses of inefficient estimators

While efficiency is a desirable quality of an estimator, it must be weighed against other considerations, and an estimator that is efficient for certain distributions may well be inefficient for other distributions. Most significantly, estimators that are efficient for clean data from a simple distribution, such as the normal distribution (which is symmetric, unimodal, and has thin tails), may not be robust to contamination by outliers, and may be inefficient for more complicated distributions. In robust statistics, more importance is placed on robustness and applicability to a wide variety of distributions, rather than efficiency on a single distribution.

M-estimators are a general class of solutions motivated by these concerns, yielding both robustness and high relative efficiency, though possibly lower efficiency than traditional estimators for some cases. These are potentially very computationally complicated, however. A more traditional alternative is the class of L-estimators, which are very simple statistics that are easy to compute and interpret, in many cases robust, and often sufficiently efficient for initial estimates. See applications of L-estimators for further discussion.

Efficiency in statistics

Efficiency in statistics is important because it allows one to compare the performance of various estimators. Although an unbiased estimator is usually favored over a biased one, a more efficient biased estimator can sometimes be more valuable than a less efficient unbiased estimator. For example, this can occur when the values of the biased estimator gather around a number closer to the true value. Thus, estimator performance can be predicted easily by comparing their mean squared errors or variances.

Hypothesis tests

For comparing significance tests, a meaningful measure of efficiency can be defined based on the sample size required for the test to achieve a given power.

Pitman efficiency and Bahadur efficiency (or Hodges–Lehmann efficiency) relate to the comparison of the performance of statistical hypothesis testing procedures. The Encyclopedia of Mathematics provides a brief exposition of these three criteria.

Experimental design

For experimental designs, efficiency relates to the ability of a design to achieve the objective of the study with minimal expenditure of resources such as time and money. In simple cases, the relative efficiency of designs can be expressed as the ratio of the sample sizes required to achieve a given objective.
