
A winsorized mean is a winsorized statistical measure of central tendency, much like the mean and median, and even more similar to the truncated mean. It involves the calculation of the mean after winsorizing, that is, replacing given parts of a probability distribution or sample at the high and low end with the most extreme remaining values (Dodge, Y. (2003), ''The Oxford Dictionary of Statistical Terms'', OUP, entry for "winsorized estimation"). This is typically done for an equal amount at both extremes; often 10 to 25 percent of the ends are replaced. The winsorized mean can equivalently be expressed as a weighted average of the truncated mean and the quantiles at which it is limited, which corresponds to replacing the parts with the corresponding quantiles.
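The replacement step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `winsorized_mean` and the count parameter `k` (number of values replaced at each end) are assumptions introduced here, and the sketch covers only the symmetric case for a finite sample.

```python
def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values.

    The k lowest values are set to the (k+1)-th smallest value and the
    k highest values to the (k+1)-th largest value. Note that k is a
    count per end, not a fraction (hypothetical parameter name).
    """
    xs = sorted(values)
    n = len(xs)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= 2*k < len(values)")
    low, high = xs[k], xs[n - k - 1]  # most extreme remaining values
    clipped = [min(max(x, low), high) for x in xs]
    return sum(clipped) / n


# Ten values, one replaced at each end: 1 becomes 2 and 100 becomes 9.
print(winsorized_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], 1))  # 5.5
```

Clipping every value to the interval between the two retained extremes has the same effect as replacing the k values at each end, which keeps the code short.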


Advantages

The winsorized mean is a useful estimator because, by retaining the outliers without taking them too literally, it is less sensitive to observations at the extremes than the straightforward mean, and will still generate a reasonable estimate of central tendency or mean for almost all statistical models. In this regard it is referred to as a robust estimator.
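As a rough illustration of this reduced sensitivity, the sketch below corrupts the largest value of a small made-up sample and compares the ordinary mean with a winsorized mean. The data, the helper name `winsorized_mean`, and the per-end count `k` are all assumptions chosen for the example.

```python
from statistics import mean


def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values (sketch)."""
    xs = sorted(values)
    n = len(xs)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= 2*k < len(values)")
    low, high = xs[k], xs[n - k - 1]
    return sum(min(max(x, low), high) for x in xs) / n


clean = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7]
# Hypothetical gross error: the largest value is recorded as 700 instead of 7.
contaminated = clean[:-1] + [700]

print("mean:           ", mean(clean), "->", mean(contaminated))
print("winsorized mean:", winsorized_mean(clean, 1), "->",
      winsorized_mean(contaminated, 1))
```

In this particular sample the ordinary mean jumps from 4.5 to 73.8, while the winsorized mean with one value replaced per end stays at 4.5, because the corrupted value is replaced by its nearest remaining neighbour in both cases.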


Drawbacks

The winsorized mean uses more information from the distribution or sample than the median. However, unless the underlying distribution is symmetric, the winsorized mean of a sample is unlikely to produce an unbiased estimator for either the mean or the median.
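The effect of asymmetry can be illustrated numerically. The sketch below is an assumption-laden stand-in rather than a proper simulation study: it uses evenly spaced quantiles of a unit-rate exponential distribution (a right-skewed distribution chosen here purely as an example) in place of a random sample, and the helper `winsorized_mean` with its per-end count `k` is hypothetical.

```python
import math
from statistics import mean, median


def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values (sketch)."""
    xs = sorted(values)
    n = len(xs)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= 2*k < len(values)")
    low, high = xs[k], xs[n - k - 1]
    return sum(min(max(x, low), high) for x in xs) / n


# Deterministic stand-in for a large sample: the quantile function of the
# unit-rate exponential distribution evaluated at evenly spaced midpoints.
n = 1000
skewed = [-math.log(1 - (i + 0.5) / n) for i in range(n)]

wm = winsorized_mean(skewed, n // 10)  # 10% replaced at each end
print("median:         ", median(skewed))
print("winsorized mean:", wm)
print("mean:           ", mean(skewed))
```

For this skewed example the winsorized value falls strictly between the median and the mean, so it matches neither target. For a symmetric distribution the three quantities would coincide.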


Example

* For a sample of 10 numbers (from ''x''(1), the smallest, to ''x''(10), the largest; order statistic notation) the 10% winsorized mean is
:: \frac{x_{(2)} + x_{(2)} + x_{(3)} + x_{(4)} + x_{(5)} + x_{(6)} + x_{(7)} + x_{(8)} + x_{(9)} + x_{(9)}}{10}.
: The key is in the repetition of ''x''(2) and ''x''(9): the extras substitute for the original values ''x''(1) and ''x''(10), which have been discarded and replaced.
: This is equivalent to a weighted average of 0.1 times the 5th percentile (''x''(2)), 0.8 times the 10% trimmed mean, and 0.1 times the 95th percentile (''x''(9)).
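The equivalence stated in this example can be checked on concrete numbers. The ten values below are made up for illustration, and the variable names are assumptions; the sketch computes the 10% winsorized mean directly and again as the weighted average of ''x''(2), the 10% trimmed mean, and ''x''(9).

```python
import math

# Hypothetical sample of 10 numbers, sorted so xs[0] is x(1) and xs[9] is x(10).
xs = sorted([3, 7, 8, 10, 12, 13, 15, 18, 21, 60])

# Direct form: x(1) is replaced by x(2) and x(10) by x(9).
winsorized = [xs[1]] + xs[1:9] + [xs[8]]
direct = sum(winsorized) / 10

# Weighted-average form: 0.1 * x(2) + 0.8 * (10% trimmed mean) + 0.1 * x(9).
trimmed_mean = sum(xs[1:9]) / 8
weighted = 0.1 * xs[1] + 0.8 * trimmed_mean + 0.1 * xs[8]

print(direct, weighted)
assert math.isclose(direct, weighted)
```

Here the trimmed mean is 13 and both forms give approximately 13.2; the comparison uses `math.isclose` because the two expressions can differ by floating-point rounding.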


References

* Wilcox, R.R.; Keselman, H.J. (2003). "Modern robust data analysis methods: Measures of central tendency". ''Psychological Methods''. 8 (3): 254–274. doi:10.1037/1082-989X.8.3.254. PMID 14596490.