In statistics, the Hodges–Lehmann estimator is a robust and nonparametric estimator of a population's location parameter. For populations that are symmetric about one median, such as the (Gaussian) normal distribution or the Student ''t''-distribution, the Hodges–Lehmann estimator is a consistent and median-unbiased estimate of the population median. For non-symmetric populations, the Hodges–Lehmann estimator estimates the "pseudo-median", which is closely related to the population median.
The Hodges–Lehmann estimator was proposed originally for estimating the location parameter of one-dimensional populations, but it has since been used for many more purposes. It has been used to estimate the differences between the members of two populations. It has also been generalized from univariate populations to multivariate populations, which produce samples of vectors.
It is based on the Wilcoxon signed-rank statistic. In statistical theory, it was an early example of a rank-based estimator, an important class of estimators both in nonparametric statistics and in robust statistics. The Hodges–Lehmann estimator was proposed in 1963 independently by Pranab Kumar Sen and by Joseph Hodges and Erich Lehmann, and so it is also called the "Hodges–Lehmann–Sen estimator".
Definition
In the simplest case, the Hodges–Lehmann statistic estimates the location parameter for a univariate population. For a dataset with ''n'' measurements, form all pairs of measurements with indices ''i'' ≤ ''j'', so that each measurement is also paired with itself; there are ''n''(''n'' + 1)/2 such pairs. For each pair, the mean is computed (these means are the Walsh averages); the median of these ''n''(''n'' + 1)/2 averages is defined to be the Hodges–Lehmann estimator of location. Some sources omit the self-pairs and instead take the median of the ''n''(''n'' − 1)/2 averages of distinct pairs.
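The one-sample computation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `hodges_lehmann` and the flag `include_self_pairs` are hypothetical, and the flag exists only because sources differ on whether each measurement is also paired with itself. The approach enumerates every pair, so it takes quadratic time and memory and suits small samples only.

```python
from itertools import combinations, combinations_with_replacement
from statistics import median


def hodges_lehmann(data, include_self_pairs=True):
    """Return the median of the pairwise averages of `data`.

    include_self_pairs=True  uses pairs with i <= j, giving n(n + 1)/2 averages.
    include_self_pairs=False uses pairs with i <  j, giving n(n - 1)/2 averages.
    Both the function name and the flag are illustrative assumptions.
    """
    values = list(data)
    if include_self_pairs:
        pairs = combinations_with_replacement(values, 2)
    else:
        pairs = combinations(values, 2)
    averages = [(a + b) / 2 for a, b in pairs]
    if not averages:
        raise ValueError("not enough observations to form a pair")
    return median(averages)
```

For the sample 1, 2, 3, 4, 100 the sketch returns 3.0 with self-pairs and 3.25 without them, whereas the sample mean is 22.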
The Hodges–Lehmann statistic also estimates the difference between two populations. For two sets of data with ''m'' and ''n'' observations, the Cartesian product of the two sets contains ''m'' × ''n'' pairs of points (one from each set); each such pair defines one difference of values. The Hodges–Lehmann statistic is the median of the ''m'' × ''n'' differences. [Everitt (2002) Entry for "Hodges-Lehmann estimator"]
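The two-sample version can be sketched the same way. The function name `hodges_lehmann_shift` is hypothetical, and the sign convention (second sample minus first) is an assumption made for this example, since the text does not fix the direction of the difference.

```python
from itertools import product
from statistics import median


def hodges_lehmann_shift(first, second):
    """Return the median of all m * n differences y - x.

    x ranges over `first` and y over `second`. Taking "second minus first"
    is an assumed convention; swapping the arguments flips the sign.
    """
    xs, ys = list(first), list(second)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    # One difference per element of the Cartesian product of the samples.
    differences = [y - x for x, y in product(xs, ys)]
    return median(differences)
```

With the samples 1, 2, 3 and 4, 6, 9, the nine differences are 1, 2, 3, 3, 4, 5, 6, 7, 8, so the sketch returns 4.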
Estimating the population median of a symmetric population
For a population that is symmetric, the Hodges–Lehmann statistic estimates the population's median. It is a robust statistic that has a breakdown point of 0.29, which means that the statistic remains bounded even if nearly 30 percent of the data have been contaminated. This robustness is an important advantage over the sample mean, which has a zero breakdown point: the mean changes in proportion to any single observation and so is liable to being misled by even one outlier. The sample median is even more robust, having a breakdown point of 0.50. [Myles Hollander. Douglas A. Wolfe. ''Nonparametric statistical methods''. 2nd ed. John Wiley.] The Hodges–Lehmann estimator is also much better than the sample mean when estimating the location of mixtures of normal distributions.
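The contrast with the sample mean can be illustrated with a small contamination experiment. The data below are made up for the example, the helper name `walsh_median` is hypothetical, and the sketch assumes the convention that pairs each observation with itself. Two of the ten observations (20 percent, below the breakdown point) are replaced by a gross outlier.

```python
from itertools import combinations_with_replacement
from statistics import mean, median


def walsh_median(values):
    """Median of the pairwise averages, self-pairs included (assumed convention)."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(values, 2))


# Illustrative data only: ten observations scattered around 10.
clean = [9.0, 9.5, 10.0, 10.0, 10.5, 11.0, 11.0, 11.5, 12.0, 12.5]
# Replace the last two observations with a gross outlier.
contaminated = clean[:8] + [1e6, 1e6]

for label, sample in (("clean", clean), ("contaminated", contaminated)):
    print(label, "mean:", mean(sample), "Hodges-Lehmann:", walsh_median(sample))
```

In the contaminated sample, 36 of the 55 pairwise averages involve only uncontaminated observations, so the median of the averages stays within the range of the clean data, while the mean is dragged far away from it.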
For symmetric distributions, the Hodges–Lehmann statistic often has greater efficiency than the sample median (for example, at the normal distribution), although this does not hold for every symmetric distribution. For the normal distribution, the Hodges–Lehmann statistic is nearly as efficient as the sample mean, with an asymptotic relative efficiency of 3/π ≈ 0.955. For the Cauchy distribution (Student ''t''-distribution with one degree of freedom), the Hodges–Lehmann statistic is infinitely more efficient than the sample mean, which is not a consistent estimator of the median.
For non-symmetric populations, the Hodges–Lehmann statistic estimates the population's "pseudo-median", a location parameter that is closely related to the median. The difference between the median and pseudo-median is relatively small, and so this distinction is neglected in elementary discussions. Like the spatial median, the pseudo-median is well defined for all distributions of random variables having dimension two or greater; for one-dimensional distributions, there exists some pseudo-median, which need not be unique, however. Like the median, the pseudo-median is defined even for heavy-tailed distributions that lack any (finite) mean.
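For a one-dimensional distribution, the pseudo-median can be made precise as follows. This formalization follows the usual textbook definition and is not spelled out in the text above; the symbols ''F'', ''X''<sub>1</sub>, ''X''<sub>2</sub> and θ are notation introduced here.

```latex
% X_1 and X_2 are independent draws from the same distribution F.
% A pseudo-median of F is any median of their average:
\theta_{\mathrm{pseudo}}(F) \in
  \operatorname{median}\!\left( \frac{X_1 + X_2}{2} \right),
\qquad X_1, X_2 \overset{\text{iid}}{\sim} F .
```

The one-sample statistic is the sample analogue of this quantity: it replaces the distribution of (''X''<sub>1</sub> + ''X''<sub>2</sub>)/2 by the observed pairwise averages. When ''F'' is symmetric, the average of two independent draws is symmetric about the same center, so the pseudo-median and the median coincide.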
The one-sample Hodges–Lehmann statistic need not estimate any population mean, which for many distributions does not exist. The two-sample Hodges–Lehmann estimator need not estimate the difference of two means or the difference of two (pseudo-)medians; rather, it estimates the median of the differences between pairs of random variables, one drawn from each of the two populations.
In general statistics
The Hodges–Lehmann ''univariate'' statistics have several generalizations in ''multivariate'' statistics:
*Multivariate ranks and signs
*Spatial sign tests and spatial medians
*Spatial signed-rank tests
*Comparisons of tests and estimates
*Several-sample location problems
See also
* Median-unbiased estimator
References
* Everitt, B.S. (2002) ''The Cambridge Dictionary of Statistics'', CUP.