Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions (common examples of parameters are the mean and variance). Nonparametric statistics is based on either being distribution-free or having a specified distribution but with the distribution's parameters unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are violated.
Definitions
The term "nonparametric statistics" has been imprecisely defined in the following two ways, among others:
Applications and purpose
Non-parametric methods are widely used for studying populations whose values have a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of levels of measurement, non-parametric methods are suited to ordinal data.
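Rank-based procedures use only the ordering of the observations, not their numerical size. As a minimal sketch of that idea (the function name `midranks` and the star ratings are illustrative assumptions, not part of the text above), the following Python snippet converts ordinal scores to ranks, giving tied values the average of the positions they occupy:

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the block [start, end] over all observations tied with this one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average 1-based position of the block
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


stars = [4, 2, 3, 2, 1]  # hypothetical one-to-four-star reviews
print(midranks(stars))   # [5.0, 2.5, 4.0, 2.5, 1.0]
```

Because only the order matters, any strictly increasing relabelling of the scores (for example, squaring these positive star counts) leaves the ranks unchanged.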
As non-parametric methods make fewer assumptions, their applicability is much wider than that of the corresponding parametric methods. In particular, they may be applied in situations where less is known about the application in question. Also, due to the reliance on fewer assumptions, non-parametric methods are more robust.
Another justification for the use of non-parametric methods is simplicity. In certain cases, even when the use of parametric methods is justified, non-parametric methods may be easier to use. Due both to this simplicity and to their greater robustness, non-parametric methods are seen by some statisticians as leaving less room for improper use and misunderstanding.
The wider applicability and increased robustness of non-parametric tests come at a cost: in cases where a parametric test would be appropriate, non-parametric tests have less statistical power. In other words, a larger sample size can be required to draw conclusions with the same degree of confidence.
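The trade-off can be illustrated numerically. The sketch below is only one possible setup, chosen for this example: it assumes normally distributed data with known unit variance, a one-sided test of a mean shift of 0.5 at the 5% level, the z-test as the parametric procedure and the sign test as the non-parametric one. All function names are hypothetical. Under those assumptions both powers can be computed exactly with the Python standard library:

```python
from math import comb
from statistics import NormalDist


def z_test_power(n, shift, alpha=0.05):
    """Power of the one-sided z-test (known unit variance) against a mean shift."""
    z = NormalDist()
    return 1.0 - z.cdf(z.inv_cdf(1.0 - alpha) - shift * n ** 0.5)


def binom_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def sign_test_power(n, shift, alpha=0.05):
    """Exact power of the one-sided sign test against the same normal shift."""
    # Smallest number of positive observations whose null tail probability is <= alpha.
    critical = next(k for k in range(n + 2) if binom_upper_tail(n, k, 0.5) <= alpha)
    p_positive = NormalDist().cdf(shift)  # P(observation > 0) under the shift
    return binom_upper_tail(n, critical, p_positive)


def smallest_n_reaching(power_fn, target, shift, start, limit=1000):
    """First sample size in [start, limit] whose power is at least `target`."""
    for n in range(start, limit + 1):
        if power_fn(n, shift) >= target:
            return n
    return None


target = z_test_power(20, 0.5)
print("z-test power at n=20:   ", round(target, 3))
print("sign test power at n=20:", round(sign_test_power(20, 0.5), 3))
print("n needed by sign test:  ", smallest_n_reaching(sign_test_power, target, 0.5, start=20))
```

In this setup the sign test has noticeably lower power than the z-test at the same sample size and needs more observations to match it. The size of the gap depends on the assumed distribution and on which tests are compared.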
Non-parametric models
''Non-parametric models'' differ from parametric models in that the model structure is not specified ''a priori'' but is instead determined from data. The term ''non-parametric'' is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance.
* A histogram is a simple nonparametric estimate of a probability distribution.
* Kernel density estimation is another method to estimate a probability distribution.
* Nonparametric regression and semiparametric regression methods have been developed based on kernels, splines, and wavelets.
* Data envelopment analysis provides efficiency coefficients similar to those obtained by multivariate analysis without any distributional assumption.
* K-nearest neighbors (KNN) classifiers classify an unseen instance based on the K points in the training set that are nearest to it.
* A support vector machine (with a Gaussian kernel) is a nonparametric large-margin classifier.
* The method of moments with polynomial probability distributions.
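The first two items in the list can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the sample values, the bin count and the bandwidth are all assumptions made for the example. Neither estimator fixes a distributional form in advance; the histogram's bin heights and the kernel estimate's one-bump-per-observation structure are both determined by the data.

```python
from math import exp, pi, sqrt


def histogram_density(data, bins):
    """Equal-width histogram scaled so that the bar areas sum to 1.

    Returns a list of (left_edge, right_edge, density) tuples.
    Assumes max(data) > min(data).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The maximum value falls on the last edge; clamp it into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(data)
    return [(lo + i * width, lo + (i + 1) * width, c / (n * width))
            for i, c in enumerate(counts)]


def gaussian_kernel_density(data, bandwidth):
    """Return a function x -> kernel density estimate using a Gaussian kernel."""
    norm = 1.0 / (len(data) * bandwidth * sqrt(2.0 * pi))

    def density(x):
        # One Gaussian bump per observation, averaged over the sample.
        return norm * sum(exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


sample = [1.2, 1.9, 2.1, 2.4, 3.0, 3.3, 4.8]  # made-up observations
for left, right, height in histogram_density(sample, bins=3):
    print(f"[{left:.2f}, {right:.2f}): {height:.3f}")

kde = gaussian_kernel_density(sample, bandwidth=0.5)
print(f"KDE at 2.0: {kde(2.0):.3f}")
```

The bin count and the bandwidth play the role of smoothing choices: they control how closely each estimate follows the sample, and in practice they are usually tuned to the data rather than fixed beforehand.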
Methods
Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the probability distributions of the variables being assessed. The most frequently used tests include
History
Early nonparametric statistics include the median (13th century or earlier; used in estimation by Edward Wright, 1599) and the sign test by John Arbuthnot (1710) in analyzing the human sex ratio at birth.
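The sign test mentioned above is simple enough to sketch directly. The snippet below is an illustrative implementation, not a historical reconstruction: the function name is hypothetical, the two-sided convention (doubling the larger tail and capping at 1) is one common choice, and the figure of 82 consecutive years with more male than female christenings is the commonly reported summary of Arbuthnot's London records, which is not stated in the text above.

```python
from math import comb


def sign_test_p_value(positives, negatives):
    """Exact two-sided sign test p-value under H0: P(positive) = 1/2.

    Ties are assumed to have been dropped before counting.
    """
    n = positives + negatives
    k = max(positives, negatives)
    # Probability of a count at least as extreme as k in one direction.
    upper = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper)


# Assumed summary of Arbuthnot's data: males outnumbered females in 82 of 82 years.
print(sign_test_p_value(82, 0))

# A smaller made-up example: 8 increases and 2 decreases among 10 pairs.
print(sign_test_p_value(8, 2))  # 0.109375
```

With every one of the 82 years pointing the same way, the p-value is 2 × (1/2)^82, which is vanishingly small. The test uses nothing about the data except the direction of each difference.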
See also
* CDF-based nonparametric confidence interval
* Parametric statistics
* Resampling (statistics)
* Semiparametric model