
A truncated mean or trimmed mean is a statistical measure of central tendency, much like the mean and median. It is the mean calculated after discarding given parts of a probability distribution or sample at the high and low ends, typically an equal amount from each. The amount to discard is usually given as a percentage of the total number of points, but may also be given as a fixed number of points. For most statistical applications, 5 to 25 percent is discarded from each end. For example, given a set of 8 points, trimming by 12.5% discards the smallest and largest values and computes the mean of the remaining 6 points. The 25% trimmed mean (when the lowest 25% and the highest 25% are discarded) is known as the interquartile mean.

The median can be regarded as a fully truncated mean and is the most robust. As with other trimmed estimators, the main advantage of the trimmed mean is robustness and higher efficiency for mixed distributions and heavy-tailed distributions (like the Cauchy distribution), at the cost of lower efficiency for some less heavy-tailed distributions (such as the normal distribution). For intermediate distributions the difference in efficiency between the mean and the median is small; for example, for the Student's t-distribution with 2 degrees of freedom the variances of the mean and median are nearly equal.
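The 8-point, 12.5% case described above can be sketched in a few lines of Python. This is only an illustration: the function name `trimmed_mean`, the choice to round the discard count down, and the sample values are assumptions made here, not part of any particular library.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping floor(n * proportion) points from each end.

    `proportion` is the fraction removed from EACH end (0.125 = 12.5%).
    Rounding the count down is an assumption of this sketch.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    k = int(len(data) * proportion)      # points discarded at each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


# Made-up sample of 8 points with one low and one high outlier.
sample = [3, 5, 4, 100, 6, 5, 4, -50]
plain = sum(sample) / len(sample)        # ordinary mean, pulled by outliers
trimmed = trimmed_mean(sample, 0.125)    # drops -50 and 100, averages 6 points
```

With these values the ordinary mean is 9.625, while the 12.5% trimmed mean is 4.5, which is closer to the bulk of the data.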


Terminology

In some regions of Central Europe it is also known as a Windsor mean, but this name should not be confused with the Winsorized mean: in the latter, the observations that the trimmed mean would discard are instead replaced by the largest/smallest of the remaining values. Discarding only the maximum and minimum is known as the modified mean, particularly in management statistics. This is also known as the Olympic average (for example in US agriculture, as in the Average Crop Revenue Election), due to its use in Olympic events, such as the ISU Judging System in figure skating, to make the score robust to a single outlier judge.
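The difference between trimming and Winsorizing can be made concrete with a small sketch. The helper name `trim_and_winsorize` and the scores are hypothetical, chosen only so the two results differ; discarding one point per end corresponds to the "maximum and minimum only" case described above.

```python
def trim_and_winsorize(values, k):
    """Return (trimmed mean, Winsorized mean) for k points per end."""
    data = sorted(values)
    n = len(data)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must leave at least one point")
    kept = data[k:n - k]
    # Trimming: the extreme points are simply discarded.
    trimmed = sum(kept) / len(kept)
    # Winsorizing: each discarded point is replaced by the nearest kept value.
    winsorized_data = [kept[0]] * k + kept + [kept[-1]] * k
    winsorized = sum(winsorized_data) / n
    return trimmed, winsorized


# Hypothetical panel of five scores with one outlier at each end.
scores = [50, 58, 59, 63, 99]
plain = sum(scores) / len(scores)
trimmed, winsorized = trim_and_winsorize(scores, 1)
```

Here the ordinary mean is 65.8, the trimmed mean (maximum and minimum dropped) is 60.0, and the Winsorized mean is 60.2.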


Interpolation

When the percentage of points to discard does not yield a whole number, the trimmed mean may be defined by interpolation, generally linear interpolation, between the nearest whole numbers. For example, to calculate the 15% trimmed mean of a sample containing 10 entries, strictly one would discard 1 point from each end (equivalent to the 10% trimmed mean). If interpolating, one would instead compute the 10% trimmed mean (discarding 1 point from each end) and the 20% trimmed mean (discarding 2 points from each end), and then interpolate between them, in this case by averaging the two values. Similarly, to interpolate the 12% trimmed mean, one would take the weighted average: weight the 10% trimmed mean by 0.8 and the 20% trimmed mean by 0.2.
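The interpolation rule above can be written out as follows. This is a minimal sketch, assuming linear interpolation between the two neighbouring whole-number trims; the function names and the 10-value sample are invented for illustration.

```python
import math


def _mean_after_dropping(sorted_data, k):
    """Mean of sorted_data with k points removed from each end."""
    kept = sorted_data[k:len(sorted_data) - k]
    if not kept:
        raise ValueError("trimming would remove every point")
    return sum(kept) / len(kept)


def interpolated_trimmed_mean(values, proportion):
    """Trimmed mean with linear interpolation for fractional counts."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    exact = len(data) * proportion       # e.g. 10 * 0.15 = 1.5 points
    lo = math.floor(exact)               # whole number of points below
    frac = exact - lo                    # weight given to the larger trim
    lower = _mean_after_dropping(data, lo)
    if frac == 0:
        return lower
    upper = _mean_after_dropping(data, lo + 1)
    return (1 - frac) * lower + frac * upper


# Made-up sample of 10 entries.
data = [1, 2, 3, 4, 5, 6, 7, 8, 20, 100]
tm10 = interpolated_trimmed_mean(data, 0.10)   # 1 point per end
tm20 = interpolated_trimmed_mean(data, 0.20)   # 2 points per end
tm15 = interpolated_trimmed_mean(data, 0.15)   # average of tm10 and tm20
tm12 = interpolated_trimmed_mean(data, 0.12)   # 0.8 * tm10 + 0.2 * tm20
```

For this sample the 10% and 20% trimmed means are 6.875 and 5.5, so the 15% value is their average, 6.1875, and the 12% value is approximately 6.6.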


Advantages

The truncated mean is a useful estimator because it is less sensitive to outliers than the mean but will still give a reasonable estimate of central tendency or mean for many statistical models. In this regard it is referred to as a robust estimator. For example, in its use in Olympic judging, truncating the maximum and minimum prevents a single judge from increasing or lowering the overall score by giving an exceptionally high or low score.

One situation in which it can be advantageous to use a truncated mean is when estimating the location parameter of a Cauchy distribution, a bell-shaped probability distribution with (much) fatter tails than a normal distribution. It can be shown that the truncated mean of the middle 24% of the sample order statistics (i.e., truncating the sample by 38% at each end) produces an estimate for the population location parameter that is more efficient than either the sample median or the full sample mean. However, due to the fat tails of the Cauchy distribution, the efficiency of the estimator decreases as more of the sample is used in the estimate. Note that for the Cauchy distribution, none of the truncated mean, the full sample mean, or the sample median is a maximum likelihood estimator, nor is any of them as asymptotically efficient as the maximum likelihood estimator; however, the maximum likelihood estimate is more difficult to compute, leaving the truncated mean as a useful alternative.
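The Cauchy comparison above can be explored with a small Monte Carlo experiment. The sketch below is an assumption-laden illustration, not a proof: the sample size, number of trials, seed, and function names are arbitrary choices, and it only estimates the mean squared error of each estimator around the true location 0.

```python
import math
import random
import statistics


def cauchy_sample(rng, n):
    """n draws from a standard Cauchy via the inverse CDF tan(pi * (U - 1/2))."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def middle_mean(values, k):
    """Mean after removing k points from each end of the sorted sample."""
    data = sorted(values)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


def simulate(trials=2000, n=50, seed=0):
    """Estimated mean squared error of three location estimators."""
    rng = random.Random(seed)
    k = round(0.38 * n)                  # 38% removed from each end
    totals = {"mean": 0.0, "median": 0.0, "trimmed_38": 0.0}
    for _ in range(trials):
        x = cauchy_sample(rng, n)
        totals["mean"] += (sum(x) / n) ** 2
        totals["median"] += statistics.median(x) ** 2
        totals["trimmed_38"] += middle_mean(x, k) ** 2
    return {name: total / trials for name, total in totals.items()}


mse = simulate()
```

One would expect the full sample mean to show a far larger error than the other two, since its sampling distribution is itself Cauchy. The gap between the median and the 38% trimmed mean is small, so many trials may be needed before it shows up reliably.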


Drawbacks

The truncated mean uses more information from the distribution or sample than the median, but unless the underlying distribution is symmetric, the truncated mean of a sample is unlikely to produce an unbiased estimator for either the mean or the median.


Statistical tests

A Student's t-test can be based on the truncated mean; this variant is called Yuen's t-test, and several implementations of it exist in R.
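For readers who want to see the shape of the computation, the following is a sketch of the two-sample Yuen statistic as it is commonly presented: the difference of trimmed means divided by a standard error built from Winsorized variances. The function names and the default 20% trim are assumptions of this sketch, and it returns only the statistic and the approximate degrees of freedom; a p-value would additionally need a t-distribution CDF, which the Python standard library does not provide.

```python
import math


def _trim_stats(values, trim):
    """Trimmed mean, squared standard-error term, and effective size."""
    data = sorted(values)
    n = len(data)
    g = int(trim * n)                    # points trimmed from each tail
    h = n - 2 * g                        # effective sample size
    if h < 2:
        raise ValueError("too few points left after trimming")
    kept = data[g:n - g]
    t_mean = sum(kept) / h
    # Winsorized sample: trimmed values replaced by the nearest kept value.
    wins = [kept[0]] * g + kept + [kept[-1]] * g
    w_mean = sum(wins) / n
    w_var = sum((x - w_mean) ** 2 for x in wins) / (n - 1)
    d = (n - 1) * w_var / (h * (h - 1))
    return t_mean, d, h


def yuen_statistic(a, b, trim=0.2):
    """Return (t, df) for Yuen's two-sample test on trimmed means.

    Raises ZeroDivisionError if both Winsorized variances are zero.
    """
    mean_a, d_a, h_a = _trim_stats(a, trim)
    mean_b, d_b, h_b = _trim_stats(b, trim)
    t = (mean_a - mean_b) / math.sqrt(d_a + d_b)
    df = (d_a + d_b) ** 2 / (d_a ** 2 / (h_a - 1) + d_b ** 2 / (h_b - 1))
    return t, df


# Made-up samples: the second is the first shifted up by 2.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
b = [x + 2 for x in a]
t_ab, df_ab = yuen_statistic(a, b)
t_ba, df_ba = yuen_statistic(b, a)
```

With a trim of 0 the Winsorized variance is the ordinary sample variance, so the statistic reduces to the unequal-variance (Welch) form of the t-test.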


Examples

The scoring method used in many sports that are evaluated by a panel of judges is a truncated mean: ''discard the lowest and the highest scores; calculate the mean value of the remaining scores''.

The Libor benchmark interest rate is calculated as a trimmed mean: given 18 responses, the top 4 and bottom 4 are discarded, and the remaining 10 are averaged (yielding a trim factor of 4/18 ≈ 22%).

Consider the data set consisting of:
: {92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, 101, 86, 85, 15, 89, 89, 28, −5, 41} (N = 20, mean = 101.5)
The 5th percentile (−6.75) lies between −40 and −5, while the 95th percentile (148.6) lies between 101 and 1053. A 5% trimmed mean therefore discards −40 and 1053 and results in the following:
: {92, 19, 101, 58, 91, 26, 78, 10, 13, 101, 86, 85, 15, 89, 89, 28, −5, 41} (N = 18, mean = 56.5)
This example can be compared with the one using the Winsorising procedure.
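The 5% trimming example above can be checked numerically. The sketch below assumes the 20-value data set is the one listed in the code; these values reproduce the stated N = 20, mean = 101.5 and the trimmed N = 18, mean = 56.5. The Libor-style submissions are invented purely to show the "drop 4 from each end of 18" rule, and the helper name is hypothetical.

```python
def mean_after_dropping(values, k):
    """Mean after removing the k smallest and k largest values."""
    data = sorted(values)
    kept = data[k:len(data) - k]
    if not kept:
        raise ValueError("trimming would remove every point")
    return sum(kept) / len(kept)


# Data set assumed for the 5% example (20 values, one point per end trimmed).
data = [92, 19, 101, 58, 1053, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 28, -5, 41]
full_mean = sum(data) / len(data)           # uses all 20 values
trimmed_5 = mean_after_dropping(data, 1)    # drops -40 and 1053

# Invented rate submissions (in basis points) from 18 hypothetical banks.
submissions = [90, 95, 100, 101, 102, 103, 104, 105, 106,
               107, 108, 109, 110, 111, 112, 113, 130, 150]
libor_style = mean_after_dropping(submissions, 4)   # averages the middle 10
```

The full mean evaluates to 101.5 and the 5% trimmed mean to 56.5, matching the figures quoted above; the invented submissions give 106.5.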


See also

* Trimean
* Interquartile mean
* Winsorized mean

