In statistics, ordinal regression, also called ordinal classification, is a type of regression analysis used for predicting an ordinal variable, i.e. a variable whose value exists on an arbitrary scale where only the relative ordering between different values is significant. It can be considered an intermediate problem between regression and classification. Examples of ordinal regression are ordered logit and ordered probit. Ordinal regression turns up often in the social sciences, for example in the modeling of human levels of preference (on a scale from, say, 1–5 for "very poor" through "excellent"), as well as in information retrieval. In machine learning, ordinal regression may also be called ranking learning.


Linear models for ordinal regression

Ordinal regression can be performed using a generalized linear model (GLM) that fits both a coefficient vector and a set of ''thresholds'' to a dataset. Suppose one has a set of observations, represented by length-$p$ vectors $\mathbf{x}_1$ through $\mathbf{x}_n$, with associated responses $y_1$ through $y_n$, where each $y_i$ is an ordinal variable on a scale $1, \dots, K$. For simplicity, and without loss of generality, we assume $\mathbf{y}$ is a non-decreasing vector, that is, $y_i \le y_{i+1}$. To this data, one fits a length-$p$ coefficient vector $\mathbf{w}$ and a set of thresholds $\theta_1, \dots, \theta_{K-1}$ with the property that $\theta_1 < \theta_2 < \dots < \theta_{K-1}$. This set of thresholds divides the real number line into $K$ disjoint segments, corresponding to the $K$ response levels.

The model can now be formulated as
:\Pr(y \le i \mid \mathbf{x}) = \sigma(\theta_i - \mathbf{w} \cdot \mathbf{x})
or, in words, the cumulative probability of the response $y$ being at most $i$ is given by a function $\sigma$ (the inverse link function) applied to a linear function of $\mathbf{x}$. Several choices exist for $\sigma$; the logistic function
:\sigma(\theta_i - \mathbf{w} \cdot \mathbf{x}) = \frac{1}{1 + e^{-(\theta_i - \mathbf{w} \cdot \mathbf{x})}}
gives the ordered logit model, while using the CDF of the standard normal distribution gives the ordered probit model. A third option is to use an exponential function
:\sigma(\theta_i - \mathbf{w} \cdot \mathbf{x}) = 1 - \exp(-\exp(\theta_i - \mathbf{w} \cdot \mathbf{x}))
which gives the proportional hazards model.
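
The cumulative-link formulation above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (`logistic`, `normal_cdf`, `cloglog_inverse`, `cumulative_probs`, `class_probs`) and the numeric values are assumptions made for this example, and only the standard library is used. Per-level probabilities are obtained by differencing consecutive cumulative probabilities.

```python
import math

def logistic(z):
    """Inverse link of the ordered logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal CDF, the inverse link of the ordered probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cloglog_inverse(z):
    """1 - exp(-exp(z)), the inverse link of the proportional hazards model."""
    return 1.0 - math.exp(-math.exp(z))

def cumulative_probs(w, x, thresholds, inv_link=logistic):
    """Return [Pr(y <= 1 | x), ..., Pr(y <= K-1 | x)] for sorted thresholds."""
    score = sum(wj * xj for wj, xj in zip(w, x))  # w . x
    return [inv_link(theta - score) for theta in thresholds]

def class_probs(w, x, thresholds, inv_link=logistic):
    """Return [Pr(y = 1 | x), ..., Pr(y = K | x)] by differencing."""
    cum = [0.0] + cumulative_probs(w, x, thresholds, inv_link) + [1.0]
    return [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Illustrative values only (hypothetical, not taken from any dataset).
w = [0.8, -0.5]
x = [1.0, 2.0]
thresholds = [-1.0, 0.0, 1.5]  # K - 1 = 3 thresholds, so K = 4 levels

for name, link in [("logit", logistic), ("probit", normal_cdf),
                   ("prop. hazards", cloglog_inverse)]:
    print(name, [round(p, 4) for p in class_probs(w, x, thresholds, link)])
```

Because the thresholds are sorted and each inverse link is increasing, the cumulative probabilities are non-decreasing, so every per-level probability is non-negative and they sum to one.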


Latent variable model

The probit version of the above model can be justified by assuming the existence of a real-valued latent variable (unobserved quantity) $y^*$, determined by
:y^* = \mathbf{w} \cdot \mathbf{x} + \varepsilon
where $\varepsilon$ is normally distributed with zero mean and unit variance, conditioned on $\mathbf{x}$. The response variable $y$ results from an "incomplete measurement" of $y^*$, where one only determines the interval into which $y^*$ falls:
:y = \begin{cases} 1 & \text{if} ~~ y^* \le \theta_1, \\ 2 & \text{if} ~~ \theta_1 < y^* \le \theta_2, \\ 3 & \text{if} ~~ \theta_2 < y^* \le \theta_3, \\ \vdots \\ K & \text{if} ~~ \theta_{K-1} < y^*. \end{cases}
Defining $\theta_0 = -\infty$ and $\theta_K = \infty$, the above can be summarized as $y = k$ if and only if $\theta_{k-1} < y^* \le \theta_k$.

From these assumptions, one can derive the conditional distribution of $y$ as
:\begin{align} P(y = k \mid \mathbf{x}) & = P(\theta_{k-1} < y^* \le \theta_k \mid \mathbf{x}) \\ & = P(\theta_{k-1} < \mathbf{w} \cdot \mathbf{x} + \varepsilon \le \theta_k) \\ & = \Phi(\theta_k - \mathbf{w} \cdot \mathbf{x}) - \Phi(\theta_{k-1} - \mathbf{w} \cdot \mathbf{x}) \end{align}
where $\Phi$ is the cumulative distribution function of the standard normal distribution, and takes on the role of the inverse link function $\sigma$. The log-likelihood of the model for a single training example $\mathbf{x}_i$, $y_i$ can now be stated as
:\log\mathcal{L}(\mathbf{w}, \boldsymbol{\theta} \mid \mathbf{x}_i, y_i) = \sum_{k=1}^K [y_i = k] \log\left[\Phi(\theta_k - \mathbf{w} \cdot \mathbf{x}_i) - \Phi(\theta_{k-1} - \mathbf{w} \cdot \mathbf{x}_i)\right]
(using the Iverson bracket $[y_i = k]$). The log-likelihood of the ordered logit model is analogous, using the logistic function instead of $\Phi$.
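
As a rough sketch of the single-example log-likelihood of the ordered probit model, the Python below pads the thresholds with minus and plus infinity at either end and evaluates the one term of the sum that the Iverson bracket selects. The function names and the numeric values are hypothetical, chosen only for illustration. The code performs no numerical safeguarding, so a probability that underflows to zero would make `math.log` raise an error.

```python
import math

def normal_cdf(z):
    """Standard normal CDF (Phi); math.erf handles +/- infinity."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_log_likelihood(w, thetas, x_i, y_i):
    """Log-likelihood of one example (x_i, y_i) with y_i in 1..K.

    thetas holds the K-1 sorted interior thresholds; the list is padded
    with -inf and +inf so that padded[k] is the k-th threshold, k = 0..K.
    """
    padded = [-math.inf] + list(thetas) + [math.inf]
    score = sum(wj * xj for wj, xj in zip(w, x_i))  # w . x_i
    # The Iverson bracket keeps only the k = y_i term of the sum.
    p = normal_cdf(padded[y_i] - score) - normal_cdf(padded[y_i - 1] - score)
    return math.log(p)

# Illustrative values only (hypothetical).
w = [0.8, -0.5]
thetas = [-1.0, 0.0, 1.5]  # K = 4
x_i = [1.0, 2.0]
for k in range(1, 5):
    print(k, probit_log_likelihood(w, thetas, x_i, k))
```

The log-likelihood of a whole dataset would be the sum of these terms over all training examples, assuming the examples are independent.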


Alternative models

In machine learning, alternatives to the latent-variable models of ordinal regression have been proposed. An early result was PRank, a variant of the perceptron algorithm that found multiple parallel hyperplanes separating the various ranks; its output is a weight vector $\mathbf{w}$ and a sorted vector of $K-1$ thresholds $\boldsymbol{\theta}$, as in the ordered logit/probit models. The prediction rule for this model is to output the smallest rank $k$ such that $\mathbf{w} \cdot \mathbf{x} < \theta_k$. Other methods rely on the principle of large-margin learning that also underlies support vector machines.

Another approach is given by Rennie and Srebro, who, realizing that "even just evaluating the likelihood of a predictor is not straight-forward" in the ordered logit and ordered probit models, propose fitting ordinal regression models by adapting common loss functions from classification (such as the hinge loss and log loss) to the ordinal case.
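
The threshold-based prediction rule described for PRank can be sketched as follows. This covers prediction only, not the training updates, and the function name and example values are hypothetical. One assumption is added here that the text above does not state: a final threshold of plus infinity is appended so that the highest rank is returned when the score exceeds every finite threshold.

```python
import math

def prank_predict(w, x, thetas):
    """Return the smallest rank k (1-based) whose threshold exceeds w . x.

    thetas holds the K-1 sorted finite thresholds; +inf is appended as
    an assumed K-th threshold so that a rank is always found.
    """
    score = sum(wj * xj for wj, xj in zip(w, x))
    for k, theta in enumerate(list(thetas) + [math.inf], start=1):
        if score < theta:
            return k

# Illustrative values only (hypothetical).
w = [1.0]
thetas = [-1.0, 0.0, 2.0]  # K = 4 ranks
for value in (-5.0, -0.5, 1.0, 10.0):
    print(value, prank_predict(w, [value], thetas))
```

Because the comparison is strict, a score that falls exactly on a threshold is assigned to the next higher rank.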


Software

ORCA (Ordinal Regression and Classification Algorithms) is an Octave/MATLAB framework that includes a wide set of ordinal regression methods. R packages that provide ordinal regression methods include MASS and ordinal.


See also

* Logistic regression
* Continuous ranked probability score


Further reading

* Hardin, James; Hilbe, Joseph (2007). Generalized Linear Models and Extensions (2nd ed.). College Station: Stata Press. ISBN 978-1-59718-014-6.