Nonparametric regression is a form of regression analysis in which the predictor does not take a predetermined form but is constructed entirely from information derived from the data. That is, no parametric equation is assumed for the relationship between the predictors and the dependent variable. A larger sample size is needed to build a nonparametric model with the same level of uncertainty as a parametric model, because the data must supply both the model structure and the parameter estimates.
Definition
Nonparametric regression assumes the following relationship, given the random variables X and Y:
: \operatorname{E}[Y \mid X = x] = m(x)
where m(x) is some deterministic function.
Linear regression is a restricted case of nonparametric regression where m(x) is assumed to be a linear function of the data.
Sometimes a slightly stronger assumption of additive noise is used:
: Y = m(X) + \varepsilon
where the random variable \varepsilon is the "noise term", with mean 0.
Without the assumption that m belongs to a specific parametric family of functions, it is impossible to obtain an unbiased estimate for m; however, most estimators are consistent under suitable conditions.
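As a concrete illustration of this setup, the sketch below simulates data from the additive-noise model and estimates m with a simple nearest-neighbor average; the true function m, the noise level, and the neighborhood size k are illustrative assumptions, not anything prescribed by the definition above.

```python
# Minimal sketch: simulate Y = m(X) + eps and estimate m by averaging the
# k nearest neighbors. All choices (m, noise level, k) are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def m(x):
    # an arbitrary "true" regression function, chosen only for illustration
    return np.sin(2 * np.pi * x)

n = 500
X = rng.uniform(0.0, 1.0, size=n)
Y = m(X) + rng.normal(0.0, 0.3, size=n)   # additive noise with mean 0

def knn_smoother(x0, X, Y, k=25):
    """Average the responses of the k sample points closest to x0."""
    idx = np.argsort(np.abs(X - x0))[:k]
    return Y[idx].mean()

x_grid = np.linspace(0.0, 1.0, 11)
m_hat = np.array([knn_smoother(x0, X, Y) for x0 in x_grid])
print(np.max(np.abs(m_hat - m(x_grid))))  # shrinks as n grows, reflecting consistency
```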
Common nonparametric regression algorithms
This is a non-exhaustive list of non-parametric models for regression.
* nearest neighbor smoothing (see also k-nearest neighbors algorithm)
* regression trees
* kernel regression
* local regression
* multivariate adaptive regression splines
* smoothing splines
* neural networks
Examples
Gaussian process regression or Kriging
In Gaussian process regression, also known as Kriging, a Gaussian prior is assumed for the regression curve. The errors are assumed to have a multivariate normal distribution and the regression curve is estimated by its posterior mode. The Gaussian prior may depend on unknown hyperparameters, which are usually estimated via empirical Bayes.
The hyperparameters typically specify a prior covariance kernel. In case the kernel should also be inferred nonparametrically from the data, the
critical filter can be used.
Smoothing splines have an interpretation as the posterior mode of a Gaussian process regression.
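A minimal sketch of the posterior-mean computation, assuming a squared-exponential (RBF) covariance kernel with fixed hyperparameters rather than ones estimated by empirical Bayes, could look like the following; the kernel, length scale, and noise level are illustrative choices.

```python
# Minimal sketch of Gaussian process regression with a fixed RBF kernel.
# For a Gaussian prior and Gaussian noise, the posterior mode equals the posterior mean.
import numpy as np

def rbf_kernel(A, B, length_scale=0.2, variance=1.0):
    # squared-exponential covariance between the 1-D point sets A and B
    d2 = (A[:, None] - B[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, 30)
y = np.sin(2 * np.pi * X) + rng.normal(0.0, 0.1, 30)  # noisy observations
noise_var = 0.1 ** 2

X_new = np.linspace(0.0, 1.0, 5)
K = rbf_kernel(X, X) + noise_var * np.eye(len(X))     # prior covariance plus noise
K_s = rbf_kernel(X_new, X)                            # cross-covariance
posterior_mean = K_s @ np.linalg.solve(K, y)          # posterior mean of the curve
print(posterior_mean)
```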
Kernel regression

Kernel regression estimates the continuous dependent variable from a limited set of data points by
convolving the data points' locations with a
kernel function—approximately speaking, the kernel function specifies how to "blur" the influence of the data points so that their values can be used to predict the value for nearby locations.
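A minimal sketch of this idea in the Nadaraya–Watson form, where each prediction is a kernel-weighted average of the observed responses, is given below; the Gaussian kernel and the bandwidth are illustrative assumptions.

```python
# Minimal sketch of kernel (Nadaraya-Watson) regression with a Gaussian kernel.
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2)

def nadaraya_watson(x0, X, Y, bandwidth=0.1):
    """Kernel-weighted average of Y around the query point x0."""
    w = gaussian_kernel((X - x0) / bandwidth)  # "blur" each data point's influence
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, 200)
Y = np.cos(2 * np.pi * X) + rng.normal(0.0, 0.2, 200)

x_grid = np.linspace(0.0, 1.0, 5)
print([round(nadaraya_watson(x0, X, Y), 3) for x0 in x_grid])
```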
Regression trees
Decision tree learning algorithms can be applied to learn to predict a dependent variable from data.
Although the original Classification And Regression Tree (CART) formulation applied only to predicting univariate data, the framework can be used to predict multivariate data, including time series.
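A minimal sketch using the CART-style regression tree implemented in scikit-learn follows; the data and depth limit are illustrative choices (scikit-learn's DecisionTreeRegressor also accepts a two-dimensional target array, giving a multi-output regression tree).

```python
# Minimal sketch: fitting a CART-style regression tree with scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(300, 1))     # predictors, shape (n_samples, n_features)
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, 0.2, 300)

tree = DecisionTreeRegressor(max_depth=4)    # depth limit as a simple form of regularization
tree.fit(X, y)

x_grid = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
print(tree.predict(x_grid))                  # piecewise-constant predictions
```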
See also
* Lasso (statistics)
* Local regression
* Non-parametric statistics
* Semiparametric regression
* Isotonic regression
* Multivariate adaptive regression splines
External links
* HyperNiche, software for nonparametric multiplicative regression
* Scale-adaptive nonparametric regression (with Matlab software)