Quantile regression is a type of regression analysis used in statistics and econometrics. Whereas the method of least squares estimates the conditional ''mean'' of the response variable across values of the predictor variables, quantile regression estimates the conditional ''median'' (or other ''quantiles'') of the response variable. There is also a method for predicting the conditional geometric mean of the response variable (Tofallis (2015). "A Better Measure of Relative Prediction Accuracy for Model Selection and Model Estimation", ''Journal of the Operational Research Society'', 66(8):1352-1362). Quantile regression is an extension of linear regression used when the conditions of linear regression are not met.


Advantages and applications

One advantage of quantile regression relative to ordinary least squares regression is that the quantile regression estimates are more robust against outliers in the response measurements. However, the main attraction of quantile regression goes beyond this and is advantageous when conditional quantile functions are of interest. Different measures of central tendency and statistical dispersion can be used to more comprehensively analyze the relationship between variables. In ecology, quantile regression has been proposed and used as a way to discover more useful predictive relationships between variables in cases where there is no relationship or only a weak relationship between the means of such variables. The need for and success of quantile regression in ecology has been attributed to the complexity of interactions between different factors, leading to data with unequal variation of one variable for different ranges of another variable. Another application of quantile regression is in the area of growth charts, where percentile curves are commonly used to screen for abnormal growth.


History

The idea of estimating a median regression slope, a major theorem about minimizing the sum of absolute deviations, and a geometrical algorithm for constructing median regression were proposed in 1760 by Ruđer Josip Bošković, a Jesuit Catholic priest from Dubrovnik. He was interested in the ellipticity of the Earth, building on Isaac Newton's suggestion that its rotation could cause it to bulge at the equator with a corresponding flattening at the poles. He finally produced the first geometric procedure for determining the equator of a rotating planet from three observations of a surface feature. More importantly for quantile regression, he was able to develop the first evidence of the least absolute criterion, preceding the least squares criterion introduced by Legendre in 1805 by fifty years. Other thinkers began building upon Bošković's idea, such as Pierre-Simon Laplace, who developed the so-called "méthode de situation." This led to Francis Edgeworth's plural median, a geometric approach to median regression, which is recognized as a precursor of the simplex method. The works of Bošković, Laplace, and Edgeworth were recognized as a prelude to Roger Koenker's contributions to quantile regression. Median regression computations for larger data sets are quite tedious compared to the least squares method, which is why median regression remained unpopular among statisticians until the widespread adoption of computers in the latter part of the 20th century.


Background: quantiles

Quantile regression expresses the conditional quantiles of a dependent variable as a linear function of the explanatory variables. Crucial to the practicality of quantile regression is that the quantiles can be expressed as the solution of a minimization problem, as we will show in this section before discussing conditional quantiles in the next section.


Quantile of a random variable

Let Y be a real-valued random variable with cumulative distribution function F_Y(y)=P(Y\leq y). The \tauth quantile of Y is given by
:q_Y(\tau)=F_Y^{-1}(\tau)=\inf\left\{y : F_Y(y)\geq\tau\right\},
where \tau\in(0,1).

Define the loss function as \rho_\tau(m)=m(\tau-\mathbb{I}_{(m<0)}), where \mathbb{I} is an indicator function. A specific quantile can be found by minimizing the expected loss of Y-u with respect to u:
:q_Y(\tau)=\underset{u}{\operatorname{arg\,min}}\, E(\rho_\tau(Y-u))=\underset{u}{\operatorname{arg\,min}}\biggl\{(\tau-1)\int_{-\infty}^{u}(y-u)\,dF_Y(y)+\tau\int_{u}^{\infty}(y-u)\,dF_Y(y)\biggr\}.
This can be shown by computing the derivative of the expected loss with respect to u via an application of the Leibniz integral rule, setting it to 0, and letting q_\tau be the solution of
:0=(1-\tau)\int_{-\infty}^{q_\tau}dF_Y(y)-\tau\int_{q_\tau}^{\infty}dF_Y(y).
This equation reduces to
:0=F_Y(q_\tau)-\tau,
and then to
:F_Y(q_\tau)=\tau.
If the solution q_\tau is not unique, then we have to take the smallest such solution to obtain the \tauth quantile of the random variable ''Y''.


Example

Let Y be a discrete random variable that takes values y_i = i with i = 1,2,\dots,9 with equal probabilities. The task is to find the median of Y, and hence the value \tau=0.5 is chosen. Then the expected loss of Y-u is
:L(u)=E(\rho_\tau(Y-u))=\frac{0.5-1}{9}\sum_{y_i<u}(y_i-u)+\frac{0.5}{9}\sum_{y_i\geq u}(y_i-u)=\frac{0.5}{9}\Bigl(-\sum_{y_i<u}(y_i-u)+\sum_{y_i\geq u}(y_i-u)\Bigr).
Since 0.5/9 is a constant, it can be taken out of the expected loss function (this is only true if \tau=0.5). Then, at ''u''=3,
:L(3)\propto\sum_{i=1}^{2}-(i-3)+\sum_{i=3}^{9}(i-3)=(2+1)+(0+1+2+\dots+6)=24.
Suppose that ''u'' is increased by 1 unit. Then the expected loss changes by (3)-(6)=-3 on changing ''u'' to 4: each of the three points below the new value contributes +1 to the sum of absolute deviations, and each of the six points at or above it contributes -1. If ''u''=5, the expected loss is
:L(5)\propto\sum_{i=1}^{4}i+\sum_{i=0}^{4}i=20,
and any change in ''u'' will increase the expected loss. Thus ''u''=5 is the median. The table below shows the sum of absolute deviations \sum_i|y_i-u| (proportional to the expected loss) for different values of ''u''.

 ''u''               1    2    3    4    5    6    7    8    9
 \sum_i|y_i-u|      36   29   24   21   20   21   24   29   36
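A minimal numerical check of this example (a sketch in Python, assuming only NumPy is available) evaluates the expected pinball loss at each candidate value and confirms that ''u''=5 minimizes it:

```python
import numpy as np

def pinball_loss(y, u, tau):
    """Mean tilted absolute value (pinball) loss rho_tau(y - u)."""
    r = y - u
    return np.mean(np.where(r >= 0, tau * r, (tau - 1) * r))

y = np.arange(1, 10)          # Y takes values 1, ..., 9 with equal probability
for u in range(1, 10):
    # 18 * E(rho_0.5(Y - u)) equals the sum of absolute deviations sum_i |y_i - u|
    print(u, 18 * pinball_loss(y, u, tau=0.5))
# The loss is smallest at u = 5, the median, matching the table above.
```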


Intuition

Consider \tau=0.5 and let ''q'' be an initial guess for q_\tau. The expected loss evaluated at ''q'' is
:L(q)=-0.5\int_{-\infty}^{q}(y-q)\,dF_Y(y)+0.5\int_{q}^{\infty}(y-q)\,dF_Y(y).
In order to minimize the expected loss, we move the value of ''q'' a little bit to see whether the expected loss will rise or fall. Suppose we increase ''q'' by 1 unit. Then the change of expected loss is proportional to
:\int_{-\infty}^{q}1\,dF_Y(y)-\int_{q}^{\infty}1\,dF_Y(y).
The first term of the equation is F_Y(q) and the second term of the equation is 1-F_Y(q). Therefore, the change of the expected loss function is negative if and only if F_Y(q)<0.5, that is, if and only if ''q'' is smaller than the median. Similarly, if we reduce ''q'' by 1 unit, the change of the expected loss function is negative if and only if ''q'' is larger than the median. In order to minimize the expected loss function, we would increase (decrease) ''q'' if ''q'' is smaller (larger) than the median, until ''q'' reaches the median. The idea behind the minimization is to count the number of points (weighted with the density) that are larger or smaller than ''q'' and then move ''q'' to a point where ''q'' is larger than 100\tau% of the points.


Sample quantile

The \tau sample quantile can be obtained by using an
importance sampling Importance sampling is a Monte Carlo method for evaluating properties of a particular distribution, while only having samples generated from a different distribution than the distribution of interest. Its introduction in statistics is generally at ...
estimate and solving the following minimization problem :\hat_=\underset\sum_^\rho_(y_-q) , :=\underset \left \tau-1)\sum_(q-y_)+\tau\sum_(y_-q) \right/math>, where the function \rho_ is the tilted absolute value function. The intuition is the same as for the population quantile.


Conditional quantile and quantile regression

The \tauth conditional quantile of Y given X is the \tauth quantile of the conditional probability distribution of Y given X,
:Q_{Y|X}(\tau)=\inf\left\{y : F_{Y|X}(y)\geq\tau\right\}.
We use a capital Q to denote the conditional quantile to indicate that it is a random variable. In quantile regression for the \tauth quantile we make the assumption that the \tauth conditional quantile is given as a linear function of the explanatory variables:
:Q_{Y|X}(\tau)=X\beta_\tau.
Given the distribution function of Y, \beta_\tau can be obtained by solving
:\beta_\tau=\underset{\beta\in\mathbb{R}^{k}}{\operatorname{arg\,min}}\, E(\rho_\tau(Y-X\beta)).
Solving the sample analog gives the estimator of \beta:
:\hat{\beta}_\tau=\underset{\beta\in\mathbb{R}^{k}}{\operatorname{arg\,min}}\sum_{i=1}^{n}\rho_\tau(Y_i-X_i\beta).
Note that when \tau=0.5, the loss function \rho_\tau is proportional to the absolute value function, and thus median regression is the same as linear regression by least absolute deviations.
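The sample estimator \hat{\beta}_\tau is available in standard software; the following sketch (in Python, using the QuantReg class from statsmodels, assumed installed) fits several conditional quantiles to simulated heteroscedastic data, where the quantile slopes differ even though the conditional mean is linear:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 10, size=n)
# Heteroscedastic data: the spread of y grows with x, so the quantile slopes differ.
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 + 0.1 * x, size=n)

X = sm.add_constant(x)                     # design matrix with an intercept column
for tau in (0.1, 0.5, 0.9):
    fit = sm.QuantReg(y, X).fit(q=tau)     # minimizes the sum of pinball losses
    print(tau, fit.params)                 # estimated intercept and slope at this quantile
```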


Computation of estimates for regression parameters

The mathematical forms arising from quantile regression are distinct from those arising in the method of least squares. The method of least squares leads to a consideration of problems in an inner product space, involving projection onto subspaces, and thus the problem of minimizing the squared errors can be reduced to a problem in numerical linear algebra. Quantile regression does not have this structure, and instead the minimization problem can be reformulated as a linear programming problem
:\underset{\beta,u^{+},u^{-}\,\in\,\mathbb{R}^{k}\times\mathbb{R}_{+}^{2n}}{\min}\left\{\tau 1_{n}^{\top}u^{+}+(1-\tau)1_{n}^{\top}u^{-}\,\Big|\,X\beta+u^{+}-u^{-}=Y\right\},
where
:u_{j}^{+}=\max(u_{j},0),\qquad u_{j}^{-}=-\min(u_{j},0).
Simplex methods or interior point methods can be applied to solve the linear programming problem.
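As an illustration of this reformulation (a sketch only, not the algorithm used by any particular package), the linear program can be handed to a generic solver such as scipy.optimize.linprog by stacking \beta, u^{+} and u^{-} into a single variable vector:

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression_lp(X, y, tau):
    """Solve min tau*1'u+ + (1-tau)*1'u-  subject to  X beta + u+ - u- = y, u+, u- >= 0."""
    n, k = X.shape
    c = np.concatenate([np.zeros(k), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])          # X beta + u+ - u- = y
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)   # beta free, slacks nonnegative
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:k]                                      # quantile regression coefficients

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=200)
print(quantile_regression_lp(X, y, tau=0.5))              # roughly recovers (1.0, 2.0)
```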


Asymptotic properties

For \tau\in(0,1), under some regularity conditions, \hat{\beta}_\tau is asymptotically normal:
:\sqrt{n}(\hat{\beta}_\tau-\beta_\tau)\overset{d}{\rightarrow}N(0,\tau(1-\tau)D^{-1}\Omega_{x}D^{-1}),
where
:D=E(f_{Y}(X\beta)XX^{\prime}) and \Omega_{x}=E(X^{\prime}X).
Direct estimation of the asymptotic variance-covariance matrix is not always satisfactory. Inference for quantile regression parameters can be made with the regression rank-score tests or with bootstrap methods.
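Because the asymptotic variance involves the conditional density f_{Y}(X\beta), which must itself be estimated, bootstrap methods are a common practical alternative; a rough sketch (in Python, reusing statsmodels' QuantReg, assumed installed) of a pairs bootstrap for standard errors:

```python
import numpy as np
import statsmodels.api as sm

def bootstrap_se(X, y, tau, n_boot=500, seed=0):
    """Pairs-bootstrap standard errors for quantile regression coefficients."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)    # resample (X_i, y_i) pairs with replacement
        draws.append(sm.QuantReg(y[idx], X[idx]).fit(q=tau).params)
    return np.std(np.asarray(draws), axis=0)
```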


Equivariance

See invariant estimator for background on invariance, or see equivariance.


Scale equivariance

For any a>0 and \tau\in[0,1],
:\hat{\beta}(\tau;aY,X)=a\hat{\beta}(\tau;Y,X),
:\hat{\beta}(\tau;-aY,X)=-a\hat{\beta}(1-\tau;Y,X).


Shift equivariance

For any \gamma\in\mathbb{R}^{k} and \tau\in[0,1],
:\hat{\beta}(\tau;Y+X\gamma,X)=\hat{\beta}(\tau;Y,X)+\gamma.


Equivariance to reparameterization of design

Let A be any p\times p nonsingular matrix and \tau\in[0,1]. Then
:\hat{\beta}(\tau;Y,XA)=A^{-1}\hat{\beta}(\tau;Y,X).


Invariance to monotone transformations

If h is a nondecreasing function on \mathbb{R}, the following invariance property applies:
:h(Q_{Y|X}(\tau))\equiv Q_{h(Y)|X}(\tau).
Example (1): If W=\exp(Y) and Q_{Y|X}(\tau)=X\beta_\tau, then Q_{W|X}(\tau)=\exp(X\beta_\tau). The mean regression does not have the same property since \operatorname{E}(\ln(Y))\neq \ln(\operatorname{E}(Y)).
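A quick numerical illustration of this property (a sketch, assuming only NumPy) shows that quantiles commute with the monotone transformation \exp while the mean does not:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(size=100_000)
w = np.exp(y)                         # monotone transformation of Y

# Quantile equivariance: the 0.9 quantile of exp(Y) equals exp(0.9 quantile of Y).
print(np.quantile(w, 0.9), np.exp(np.quantile(y, 0.9)))   # essentially equal
# The mean is not equivariant: E(exp(Y)) is about exp(0.5) here, not exp(E(Y)) = 1.
print(w.mean(), np.exp(y.mean()))                          # clearly different
```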


Inference


Interpretation of the slope parameters

The linear model Q_{Y|X}(\tau)=X\beta_\tau mis-specifies the true systematic relation Q_{Y|X}(\tau)=f(X,\tau) when f(\cdot,\tau) is nonlinear. However, Q_{Y|X}(\tau)=X\beta_\tau minimizes a weighted distance to f(X,\tau) among linear models. Furthermore, the slope parameters \beta_\tau of the linear model can be interpreted as weighted averages of the derivatives \nabla f(X,\tau), so that \beta_\tau can be used for causal inference. Specifically, the hypothesis H_0: \nabla f(x,\tau)=0 for all x implies the hypothesis H_0: \beta_\tau=0, which can be tested using the estimator \hat{\beta}_\tau and its limit distribution.


Goodness of fit

The goodness of fit for quantile regression for the \tau quantile can be defined as
:R^{1}(\tau)=1-\frac{\hat{V}_\tau}{\tilde{V}_\tau},
where \hat{V}_\tau is the minimized expected loss function under the full model, while \tilde{V}_\tau is the expected loss function under the intercept-only model.
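A sketch of how this measure can be computed (in Python with statsmodels, assumed installed; the helper names pinball and r1 are illustrative, not library APIs); note that the intercept-only quantile regression fit is just the sample \tau-quantile of the response:

```python
import numpy as np
import statsmodels.api as sm

def pinball(y, yhat, tau):
    """Sum of tilted absolute value losses rho_tau(y - yhat)."""
    r = y - yhat
    return np.sum(np.where(r >= 0, tau * r, (tau - 1) * r))

def r1(y, X, tau):
    """Goodness of fit R^1(tau) = 1 - V_hat / V_tilde."""
    full = sm.QuantReg(y, X).fit(q=tau)
    v_hat = pinball(y, full.fittedvalues, tau)       # loss under the full model
    v_tilde = pinball(y, np.quantile(y, tau), tau)   # loss under the intercept-only model
    return 1 - v_hat / v_tilde
```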


Variants


Bayesian methods for quantile regression

Because quantile regression does not normally assume a parametric likelihood for the conditional distribution of Y given X, Bayesian methods work with a working likelihood. A convenient choice is the asymmetric Laplace likelihood, because the mode of the resulting posterior under a flat prior is the usual quantile regression estimate. The posterior inference, however, must be interpreted with care. Yang, Wang and He provided a posterior variance adjustment for valid inference. In addition, Yang and He showed that one can have asymptotically valid posterior inference if the working likelihood is chosen to be the empirical likelihood.
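To make the connection concrete, the following sketch (assuming the standard parameterization of the asymmetric Laplace density) writes down the working log-likelihood; maximizing it over \beta is the same as minimizing the sum of pinball losses, so its mode reproduces the usual quantile regression estimate:

```python
import numpy as np

def asym_laplace_loglik(beta, X, y, tau, sigma=1.0):
    """Working log-likelihood under asymmetric Laplace errors:
    log f(y | X, beta) = log(tau * (1 - tau) / sigma) - rho_tau((y - X beta) / sigma).
    """
    r = (y - X @ beta) / sigma
    rho = np.where(r >= 0, tau * r, (tau - 1) * r)   # pinball loss of the scaled residuals
    return np.sum(np.log(tau * (1 - tau) / sigma) - rho)
```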


Machine learning methods for quantile regression

Beyond simple linear regression, there are several machine learning methods that can be extended to quantile regression. A switch from the squared error to the tilted absolute value loss function (a.k.a. the ''pinball loss'') allows gradient descent-based learning algorithms to learn a specified quantile instead of the mean. This means that neural network and deep learning algorithms can be applied to quantile regression, which is then referred to as nonparametric quantile regression. Tree-based learning algorithms are also available for quantile regression (see, e.g., Quantile Regression Forests, as a simple generalization of Random Forests).
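For instance, scikit-learn's gradient boosting regressor accepts the pinball loss directly (a sketch, assuming scikit-learn is installed; the alpha parameter selects the quantile), so fitting one model per quantile yields conditional quantile curves:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(1000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=1000)

# One boosted model per quantile; loss="quantile" is the tilted absolute value loss.
models = {tau: GradientBoostingRegressor(loss="quantile", alpha=tau).fit(X, y)
          for tau in (0.05, 0.5, 0.95)}
preds = {tau: m.predict(X) for tau, m in models.items()}   # estimated conditional quantiles
```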


Censored quantile regression

If the response variable is subject to censoring, the conditional mean is not identifiable without additional distributional assumptions, but the conditional quantile is often identifiable. For recent work on censored quantile regression, see Portnoy and Wang and Wang.

Example (2): Let Y^{*}=\max(0,Y) and Q_{Y|X}(\tau)=X\beta_\tau. Then Q_{Y^{*}|X}(\tau)=\max(0,X\beta_\tau). This is the censored quantile regression model: estimated values can be obtained without making any distributional assumptions, but at the cost of computational difficulty, some of which can be avoided by using a simple three-step censored quantile regression procedure as an approximation.

For random censoring on the response variables, the censored quantile regression of Portnoy (2003) provides consistent estimates of all identifiable quantile functions based on reweighting each censored point appropriately. Censored quantile regression has close links to survival analysis.


Heteroscedastic errors

The quantile regression loss needs to be adapted in the presence of heteroscedastic errors in order to be efficient.


Implementations

Numerous statistical software packages include implementations of quantile regression:
* Matlab function quantreg
* gretl has the quantreg command
* R offers several packages that implement quantile regression, most notably quantreg by Roger Koenker, but also gbm, quantregForest, qrnn and qgam
* Python, via Scikit-garden and statsmodels
* SAS through proc quantreg (ver. 9.2) and proc quantselect (ver. 9.3)
* Stata, via the qreg command
* Vowpal Wabbit, via --loss_function quantile
* Mathematica package QuantileRegression.m hosted at the MathematicaForPrediction project at GitHub
* Wolfram Language function QuantileRegression hosted at the Wolfram Function Repository


See also

* Least-absolute-deviations regression
* Conformal prediction

