In statistics and in machine learning, a linear predictor function is a linear function (a linear combination) of a set of coefficients and explanatory variables (independent variables), whose value is used to predict the outcome of a dependent variable. This sort of function usually arises in linear regression, where the coefficients are called regression coefficients. However, such functions also occur in various types of linear classifiers (e.g. logistic regression, perceptrons (Rosenblatt 1957), support vector machines, and linear discriminant analysis), as well as in various other models, such as principal component analysis (Jolliffe 2002) and factor analysis. In many of these models, the coefficients are referred to as "weights".


Definition

The basic form of a linear predictor function f(i) for data point ''i'' (consisting of ''p'' explanatory variables), for ''i'' = 1, ..., ''n'', is
: f(i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},
where x_{ik}, for ''k'' = 1, ..., ''p'', is the value of the ''k''-th explanatory variable for data point ''i'', and \beta_0, \ldots, \beta_p are the ''coefficients'' (regression coefficients, weights, etc.) indicating the relative effect of a particular ''explanatory variable'' on the ''outcome''.
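As a concrete illustration, here is a minimal sketch (in Python, with made-up coefficient and variable values) of evaluating this function for a single data point:

```python
def linear_predictor(beta0, betas, x):
    """f(i) = beta_0 + beta_1 * x_i1 + ... + beta_p * x_ip for one data point."""
    return beta0 + sum(b * xk for b, xk in zip(betas, x))

# Made-up example with p = 3 explanatory variables:
intercept = 4.0                 # beta_0
coeffs = [0.5, -1.2, 2.0]       # beta_1, beta_2, beta_3
x_i = [1.0, 3.0, 0.25]          # x_i1, x_i2, x_i3
print(linear_predictor(intercept, coeffs, x_i))  # 4.0 + 0.5 - 3.6 + 0.5 = 1.4
```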


Notations

It is common to write the predictor function in a more compact form as follows:
* The coefficients ''β''0, ''β''1, ..., ''β''''p'' are grouped into a single vector ''β'' of size ''p'' + 1.
* For each data point ''i'', an additional explanatory pseudo-variable ''x''''i''0 is added, with a fixed value of 1, corresponding to the intercept coefficient ''β''0.
* The resulting explanatory variables ''x''''i''0 (= 1), ''x''''i''1, ..., ''x''''ip'' are then grouped into a single vector ''x''''i'' of size ''p'' + 1.


Vector Notation

This makes it possible to write the linear predictor function as follows:
: f(i) = \boldsymbol\beta \cdot \mathbf{x}_i
using the notation for a dot product between two vectors.
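Assuming NumPy, a sketch of the same computation in this compact form, with the pseudo-variable ''x''''i''0 = 1 prepended as described in the previous section:

```python
import numpy as np

beta = np.array([4.0, 0.5, -1.2, 2.0])  # (beta_0, ..., beta_p), size p + 1
x_i = np.array([1.0, 3.0, 0.25])        # the p explanatory variables for point i
x_i = np.concatenate(([1.0], x_i))      # prepend the pseudo-variable x_i0 = 1

print(np.dot(beta, x_i))                # f(i) = beta . x_i = 1.4, as before
```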


Matrix Notation

An equivalent form using matrix notation is as follows:
: f(i) = \boldsymbol\beta^{\mathrm T} \mathbf{x}_i = \mathbf{x}^{\mathrm T}_i \boldsymbol\beta
where \boldsymbol\beta and \mathbf{x}_i are assumed to be ''(p+1)''-by-1 column vectors, \boldsymbol\beta^{\mathrm T} is the matrix transpose of \boldsymbol\beta (so \boldsymbol\beta^{\mathrm T} is a 1-by-''(p+1)'' row vector), and \boldsymbol\beta^{\mathrm T} \mathbf{x}_i indicates matrix multiplication between the 1-by-''(p+1)'' row vector and the ''(p+1)''-by-1 column vector, producing a 1-by-1 matrix that is taken to be a scalar.
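A sketch of this matrix form, treating both vectors as explicit ''(p+1)''-by-1 column vectors (2-D NumPy arrays) so that the 1-by-1 result is visible:

```python
import numpy as np

beta = np.array([[4.0], [0.5], [-1.2], [2.0]])  # (p+1)-by-1 column vector
x_i = np.array([[1.0], [1.0], [3.0], [0.25]])   # column vector, x_i0 = 1 first

out = beta.T @ x_i     # (1-by-(p+1) row vector) times ((p+1)-by-1 column vector)
print(out.shape)       # (1, 1): a 1-by-1 matrix ...
print(out.item())      # ... whose single entry is the scalar f(i) = 1.4
```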


Linear regression

An example of the usage of a linear predictor function is in linear regression, where each data point is associated with a continuous outcome ''y''''i'', and the relationship is written
: y_i = f(i) + \varepsilon_i = \boldsymbol\beta^{\mathrm T}\mathbf{x}_i + \varepsilon_i,
where \varepsilon_i is a ''disturbance term'' or ''error variable'', an ''unobserved'' random variable that adds noise to the linear relationship between the dependent variable and the predictor function.


Stacking

In some models (standard linear regression, in particular), the equations for each of the data points ''i'' = 1, ..., ''n'' are stacked together and written in vector form as
: \mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon,
where
: \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} \mathbf{x}^{\mathrm T}_1 \\ \mathbf{x}^{\mathrm T}_2 \\ \vdots \\ \mathbf{x}^{\mathrm T}_n \end{pmatrix} = \begin{pmatrix} x_{10} & \cdots & x_{1p} \\ x_{20} & \cdots & x_{2p} \\ \vdots & \ddots & \vdots \\ x_{n0} & \cdots & x_{np} \end{pmatrix}, \quad \boldsymbol\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \boldsymbol\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.
The matrix ''X'' is known as the design matrix and encodes all known information about the independent variables. The variables \varepsilon_i are random variables, which in standard linear regression are distributed according to a normal distribution with mean zero; they express the influence of any unknown factors on the outcome. This makes it possible to find optimal coefficients through the method of least squares using simple matrix operations. In particular, the optimal coefficients \hat{\boldsymbol\beta} as estimated by least squares can be written as follows:
: \hat{\boldsymbol\beta} = (\mathbf{X}^{\mathrm T} \mathbf{X})^{-1}\mathbf{X}^{\mathrm T}\mathbf{y}.
The matrix (\mathbf{X}^{\mathrm T} \mathbf{X})^{-1}\mathbf{X}^{\mathrm T} is known as the Moore–Penrose pseudoinverse of ''X''. The use of the matrix inverse in this formula requires that ''X'' is of full rank, i.e. there is no perfect multicollinearity among the different explanatory variables (i.e. no explanatory variable can be perfectly predicted from the others). When ''X'' is not of full rank, the singular value decomposition can be used to compute the pseudoinverse.
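A sketch of the stacked form and the least-squares estimate, assuming NumPy; the data and coefficient values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 2

# Design matrix X: a column of 1s (the intercept pseudo-variable) plus p regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([4.0, 0.5, -1.2])

# Stacked model: y = X beta + epsilon, with mean-zero normal disturbances.
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Least-squares estimate via the normal equations: (X^T X)^{-1} X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent route via the pseudoinverse, which NumPy computes from the SVD
# and which also handles the rank-deficient (multicollinear) case.
beta_pinv = np.linalg.pinv(X) @ y
print(beta_hat, beta_pinv)  # both close to beta_true
```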


The explanatory variables

Although the outcomes (dependent variables) to be predicted are assumed to be random variables, the explanatory variables themselves are usually not assumed to be random. Instead, they are assumed to be fixed values, and any random variables (e.g. the outcomes) are assumed to be conditional on them. As a result, the data analyst is free to transform the explanatory variables in arbitrary ways, including creating multiple copies of a given explanatory variable, each transformed using a different function. Another common technique is to create new explanatory variables in the form of interaction variables by taking products of two (or sometimes more) existing explanatory variables. When a fixed set of nonlinear functions is used to transform the value(s) of a data point, these functions are known as ''basis functions''.
An example is polynomial regression, which uses a linear predictor function to fit an arbitrary-degree polynomial relationship (up to a given order) between two sets of data points (i.e. a single real-valued explanatory variable and a related real-valued dependent variable), by adding multiple explanatory variables corresponding to various powers of the existing explanatory variable. Mathematically, the form looks like this:
: y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_p x_i^p.
In this case, for each data point ''i'', a set of explanatory variables is created as follows:
: (x_{i1} = x_i,\quad x_{i2} = x_i^2,\quad \ldots,\quad x_{ip} = x_i^p)
and then standard linear regression is run. The basis functions in this example would be
: \boldsymbol\phi(x) = (\phi_1(x), \phi_2(x), \ldots, \phi_p(x)) = (x, x^2, \ldots, x^p).
This example shows that a linear predictor function can actually be much more powerful than it first appears: it only really needs to be linear in the ''coefficients''. All sorts of non-linear functions of the explanatory variables can be fit by the model, as the sketch below illustrates. There is no particular need for the inputs to basis functions to be univariate or single-dimensional (or their outputs, for that matter, although in such a case, a ''K''-dimensional output value is likely to be treated as ''K'' separate scalar-output basis functions).
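A sketch of this polynomial-basis construction, assuming NumPy; the data are made up, and the fit is an ordinary linear least-squares fit over the basis columns:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=50)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.2, size=50)  # made-up data

p = 3
# Basis expansion: column k holds phi_k(x) = x^k (k = 0 gives the intercept).
X = np.column_stack([x**k for k in range(p + 1)])

# Ordinary linear least squares on the transformed explanatory variables.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # approximately (1.0, -2.0, 0.0, 0.5)
```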
An example of this is ''radial basis functions'' (RBFs), which compute some transformed version of the distance to some fixed point:
: \phi(\mathbf{x};\mathbf{c}) = \phi(\lVert\mathbf{x} - \mathbf{c}\rVert) = \phi\left(\sqrt{(x_1 - c_1)^2 + \cdots + (x_K - c_K)^2}\right)
An example is the Gaussian RBF, which has the same functional form as the normal distribution:
: \phi(\mathbf{x};\mathbf{c}) = e^{-b\lVert\mathbf{x} - \mathbf{c}\rVert^2}
which drops off rapidly as the distance from ''c'' increases. A possible usage of RBFs is to create one for every observed data point. This means that the result of an RBF applied to a new data point will be close to 0 unless the new point is near the point around which the RBF was centered. That is, the application of the radial basis functions will pick out the nearest point, and its regression coefficient will dominate. The result will be a form of nearest neighbor interpolation, where predictions are made by simply using the prediction of the nearest observed data point, possibly interpolating between multiple nearby data points when they are all at similar distances. This type of nearest neighbor method for prediction is often considered diametrically opposed to the type of prediction used in standard linear regression; but in fact, the transformations that can be applied to the explanatory variables in a linear predictor function are so powerful that even the nearest neighbor method can be implemented as a type of linear regression, as the sketch below shows. It is even possible to fit some functions that appear non-linear in the coefficients by transforming the coefficients into new coefficients that do appear linear. For example, a function of the form a + b^2 x_{i1} + \sqrt{c}\, x_{i2} for coefficients ''a'', ''b'', ''c'' could be transformed into the appropriate linear function by applying the substitutions b' = b^2, c' = \sqrt{c}, leading to a + b' x_{i1} + c' x_{i2}, which is linear. Linear regression and similar techniques could be applied and will often still find the optimal coefficients, but their error estimates and such will be wrong.
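A sketch of the construction just described, assuming NumPy; the centres, outcomes, and the bandwidth parameter ''b'' are all illustrative choices:

```python
import numpy as np

def gaussian_rbf(x, c, b=50.0):
    """phi(x; c) = exp(-b * ||x - c||^2); b is an illustrative bandwidth."""
    return np.exp(-b * (x - c) ** 2)

rng = np.random.default_rng(2)
x_obs = np.sort(rng.uniform(0, 1, size=8))  # one RBF centred on each observed point
y_obs = np.sin(2 * np.pi * x_obs)           # made-up outcomes

# Design matrix: column k evaluates the RBF centred at x_obs[k] at every point.
Phi = np.column_stack([gaussian_rbf(x_obs, c) for c in x_obs])
w, *_ = np.linalg.lstsq(Phi, y_obs, rcond=None)

# At a new point, the RBF of the nearest centre dominates, so the prediction
# behaves like nearest-neighbour interpolation, as described above.
x_new = 0.5
print(np.array([gaussian_rbf(x_new, c) for c in x_obs]) @ w)
```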
The explanatory variables may be of any type: real-valued, binary, categorical, etc. The main distinction is between continuous variables (e.g. income, age, blood pressure, etc.) and discrete variables (e.g. sex, race, political party, etc.). Discrete variables referring to more than two possible choices are typically coded using dummy variables (or indicator variables), i.e. separate explanatory variables taking the value 0 or 1 are created for each possible value of the discrete variable, with a 1 meaning "variable does have the given value" and a 0 meaning "variable does not have the given value". For example, a four-way discrete variable of blood type with the possible values "A, B, AB, O" would be converted to separate two-way dummy variables, "is-A, is-B, is-AB, is-O", where only one of them has the value 1 and all the rest have the value 0. This allows for separate regression coefficients to be matched for each possible value of the discrete variable. Note that, for ''K'' categories, not all ''K'' dummy variables are independent of each other. For example, in the above blood type example, only three of the four dummy variables are independent, in the sense that once the values of three of the variables are known, the fourth is automatically determined. Thus, it is really only necessary to encode three of the four possibilities as dummy variables; in fact, if all four possibilities are encoded, the overall model becomes non-identifiable. This causes problems for a number of methods, such as the simple closed-form solution used in linear regression. The solution is either to avoid such cases by eliminating one of the dummy variables (see the sketch below), or to introduce a regularization constraint (which necessitates a more powerful, typically iterative, method for finding the optimal coefficients).
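A sketch of this dummy coding, assuming NumPy and dropping one category (here "O") to keep the model identifiable:

```python
import numpy as np

blood_types = np.array(["A", "B", "AB", "O", "A", "O"])  # made-up sample

# Encode K - 1 = 3 dummies; "O" is the omitted reference category.
categories = ["A", "B", "AB"]
dummies = np.column_stack([(blood_types == c).astype(float) for c in categories])
print(dummies)
# A row of all zeros encodes the reference category "O".  Adding a fourth
# "is-O" dummy would make the four columns sum to the intercept column of 1s,
# rendering the model non-identifiable.
```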


See also

* Linear model
* Linear regression


References

* Jolliffe, I. T. (2002). ''Principal Component Analysis''. Springer Series in Statistics, 2nd ed. Springer: New York. XXIX, 487 p., 28 illus.
* Rosenblatt, Frank (1957). ''The Perceptron: A Perceiving and Recognizing Automaton''. Report 85-460-1, Cornell Aeronautical Laboratory.