In statistics, the projection matrix $(\mathbf{P})$, sometimes also called the influence matrix or hat matrix $(\mathbf{H})$, maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). It describes the influence each response value has on each fitted value. The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.
Definition

If the vector of response values is denoted by $\mathbf{y}$ and the vector of fitted values by $\hat{\mathbf{y}}$,
:$\hat{\mathbf{y}} = \mathbf{P} \mathbf{y}.$
As $\hat{\mathbf{y}}$ is usually pronounced "y-hat", the projection matrix $\mathbf{P}$ is also named ''hat matrix'' as it "puts a hat on $\mathbf{y}$".
The element in the ''i''th row and ''j''th column of $\mathbf{P}$ is equal to the covariance between the ''j''th response value and the ''i''th fitted value, divided by the variance of the former:
:$p_{ij} = \frac{\operatorname{Cov}\left[ \hat{y}_i, y_j \right]}{\operatorname{Var}\left[ y_j \right]}$
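One way to see where this ratio comes from is sketched below. It relies on two assumptions that are not stated in the text above: the response values are mutually uncorrelated, and $\mathbf{P}$ is treated as fixed (non-random). Writing the ''i''th fitted value as $\hat{y}_i = \sum_k p_{ik} y_k$ gives
:$\operatorname{Cov}\left[ \hat{y}_i, y_j \right] = \sum_k p_{ik} \operatorname{Cov}\left[ y_k, y_j \right] = p_{ij} \operatorname{Var}\left[ y_j \right],$
because every term with $k \neq j$ vanishes. Dividing both sides by $\operatorname{Var}\left[ y_j \right]$ recovers $p_{ij}$.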
Application for residuals

The formula for the vector of residuals $\mathbf{r}$ can also be expressed compactly using the projection matrix:
:$\mathbf{r} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{P} \mathbf{y} = \left( \mathbf{I} - \mathbf{P} \right) \mathbf{y},$
where $\mathbf{I}$ is the identity matrix. The matrix $\mathbf{M} \equiv \mathbf{I} - \mathbf{P}$ is sometimes referred to as the residual maker matrix or the annihilator matrix.
The covariance matrix of the residuals $\mathbf{r}$, by error propagation, equals
:$\mathbf{\Sigma}_\mathbf{r} = \left( \mathbf{I} - \mathbf{P} \right)^\textsf{T} \mathbf{\Sigma} \left( \mathbf{I}-\mathbf{P} \right)$,
where $\mathbf{\Sigma}$ is the covariance matrix of the error vector (and by extension, the response vector as well). For the case of linear models with independent and identically distributed errors in which $\mathbf{\Sigma} = \sigma^{2} \mathbf{I}$, this reduces to:
:$\mathbf{\Sigma}_\mathbf{r} = \left( \mathbf{I} - \mathbf{P} \right) \sigma^{2}$.
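The reduction can be spelled out in one line. The sketch assumes the ordinary least squares case, in which $\mathbf{P}$ is symmetric ($\mathbf{P}^\textsf{T} = \mathbf{P}$) and idempotent ($\mathbf{P}^2 = \mathbf{P}$); it does not carry over to hat matrices that lack these properties.
:$\left( \mathbf{I} - \mathbf{P} \right)^\textsf{T} \sigma^{2} \mathbf{I} \left( \mathbf{I} - \mathbf{P} \right) = \sigma^{2} \left( \mathbf{I} - \mathbf{P} \right)^{2} = \sigma^{2} \left( \mathbf{I} - 2\mathbf{P} + \mathbf{P}^{2} \right) = \sigma^{2} \left( \mathbf{I} - \mathbf{P} \right)$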
Intuition

From the figure, it is clear that the closest point to the vector $\mathbf{b}$ in the column space of $\mathbf{A}$ is $\mathbf{Ax}$, and that it is the point at which the line from $\mathbf{b}$ meets the column space of $\mathbf{A}$ orthogonally. A vector that is orthogonal to the column space of a matrix is in the nullspace of the matrix transpose, so
:$\mathbf{A}^\textsf{T}(\mathbf{b}-\mathbf{Ax}) = 0$
From there, one rearranges, so
:$\begin{align} && \mathbf{A}^\textsf{T}\mathbf{b} &- \mathbf{A}^\textsf{T}\mathbf{Ax} = 0 \\ \Rightarrow && \mathbf{A}^\textsf{T}\mathbf{b} &= \mathbf{A}^\textsf{T}\mathbf{Ax} \\ \Rightarrow && \mathbf{x} &= \left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}\mathbf{b} \end{align}$
Therefore, since $\mathbf{Ax}$ lies in the column space of $\mathbf{A}$, the projection matrix, which maps $\mathbf{b}$ onto $\mathbf{Ax}$, is $\mathbf{A}\left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}$.
Linear model

Suppose that we wish to estimate a linear model using linear least squares. The model can be written as
:$\mathbf{y} = \mathbf{X} \boldsymbol\beta + \boldsymbol\varepsilon,$
where $\mathbf{X}$ is a matrix of explanatory variables (the design matrix), ''β'' is a vector of unknown parameters to be estimated, and ''ε'' is the error vector.
Many types of models and techniques are subject to this formulation. A few examples are linear least squares, smoothing splines, regression splines, local regression, kernel regression, and linear filtering.
Ordinary least squares

When the weights for each observation are identical and the errors are uncorrelated, the estimated parameters are
:$\hat{\boldsymbol\beta} = \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{y},$
so the fitted values are
:$\hat{\mathbf{y}} = \mathbf{X} \hat{\boldsymbol\beta} = \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{y}.$
Therefore, the projection matrix (and hat matrix) is given by
:$\mathbf{P} \equiv \mathbf{X} \left(\mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}.$
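The ordinary least squares formulas above can be checked numerically. The following is a minimal NumPy sketch, not a recommended production routine: the data values and variable names are made up for illustration, and forming the full ''n'' × ''n'' matrix is only sensible for small problems.

```python
import numpy as np

# Hypothetical toy data: 5 observations, an intercept plus one regressor.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 10.5])
X = np.column_stack([np.ones_like(x), x])   # design matrix, shape (5, 2)

# beta_hat = (X^T X)^{-1} X^T y, solved without forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# P = X (X^T X)^{-1} X^T, an n-by-n matrix.
P = X @ np.linalg.solve(X.T @ X, X.T)

y_hat = P @ y            # fitted values, the same vector as X @ beta_hat
leverages = np.diag(P)   # diagonal elements of P

print(np.allclose(y_hat, X @ beta_hat))
print(leverages)
```

In this toy data set the last observation lies far from the other regressor values, so it receives by far the largest leverage.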
Weighted and generalized least squares

The above may be generalized to the cases where the weights are not identical and/or the errors are correlated. Suppose that the covariance matrix of the errors is $\mathbf{\Sigma}$. Then since
: $\hat{\boldsymbol\beta}_{\text{GLS}} = \left( \mathbf{X}^\textsf{T} \mathbf{\Sigma}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{\Sigma}^{-1}\mathbf{y}$,
the hat matrix is thus
: $\mathbf{H} = \mathbf{X}\left( \mathbf{X}^\textsf{T} \mathbf{\Sigma}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{\Sigma}^{-1}$
and again it may be seen that $\mathbf{H}^2 = \mathbf{H}\cdot \mathbf{H} = \mathbf{H}$, though now it is in general no longer symmetric.
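A small numerical illustration of the generalized case follows. It is a sketch under stated assumptions: the error covariance (AR(1)-style correlation with unequal variances) and the design matrix are hypothetical choices made only so that $\mathbf{\Sigma}$ is not a multiple of the identity.

```python
import numpy as np

n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])

# Hypothetical error covariance: correlation rho^|i-j| with unequal variances.
rho = 0.5
idx = np.arange(n)
corr = rho ** np.abs(idx[:, None] - idx[None, :])
sd = np.linspace(1.0, 2.0, n)
Sigma = corr * np.outer(sd, sd)

# Sigma^{-1} X via a linear solve instead of an explicit inverse.
Sigma_inv_X = np.linalg.solve(Sigma, X)

# H = X (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1}; Sigma is symmetric,
# so (Sigma^{-1} X)^T equals X^T Sigma^{-1}.
H = X @ np.linalg.solve(X.T @ Sigma_inv_X, Sigma_inv_X.T)

print("idempotent:", np.allclose(H @ H, H))
print("symmetric: ", np.allclose(H, H.T))
```

For this particular choice of $\mathbf{\Sigma}$ the idempotence check passes while the symmetry check fails, in line with the statement above.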
Properties

The projection matrix has a number of useful algebraic properties. In the language of linear algebra, the projection matrix is the orthogonal projection onto the column space of the design matrix $\mathbf{X}$. (Note that $\left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}$ is the pseudoinverse of $\mathbf{X}$.) Some facts of the projection matrix in this setting are summarized as follows:
* $\mathbf{r} = (\mathbf{I} - \mathbf{P})\mathbf{y},$ and $\mathbf{r} = \mathbf{y} - \mathbf{P} \mathbf{y} \perp \mathbf{X}.$
* $\mathbf{P}$ is symmetric, and so is $\mathbf{M} \equiv \mathbf{I} - \mathbf{P}$.
* $\mathbf{P}$ is idempotent: $\mathbf{P}^2 = \mathbf{P}$, and so is $\mathbf{M}$.
* If $\mathbf{X}$ is an $n \times r$ matrix with $\operatorname{rank}(\mathbf{X}) = r$, then $\operatorname{rank}(\mathbf{P}) = r$.
* The eigenvalues of $\mathbf{P}$ consist of ''r'' ones and ''n'' − ''r'' zeros, while the eigenvalues of $\mathbf{M}$ consist of ''n'' − ''r'' ones and ''r'' zeros.
* $\mathbf{X}$ is invariant under $\mathbf{P}$ : $\mathbf{P X} = \mathbf{X},$ hence $\left( \mathbf{I} - \mathbf{P} \right) \mathbf{X} = \mathbf{0}$.
* $\left( \mathbf{I} - \mathbf{P} \right) \mathbf{P} = \mathbf{P} \left( \mathbf{I} - \mathbf{P} \right) = \mathbf{0}.$
* $\mathbf{P}$ is unique for a given subspace: it depends only on the column space of $\mathbf{X}$, not on the particular matrix used to span it.
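The properties listed above can be verified numerically for any full-column-rank design. The sketch below uses a randomly generated $\mathbf{X}$ and $\mathbf{y}$, which are hypothetical stand-ins, and collects each check in a dictionary (the name `checks` is an illustrative choice).

```python
import numpy as np

rng = np.random.default_rng(42)
n, r = 8, 3
X = rng.normal(size=(n, r))   # hypothetical design with full column rank
y = rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P
resid = M @ y

# Eigenvalues of a symmetric matrix, returned in ascending order.
eig_P = np.linalg.eigvalsh(P)
eig_M = np.linalg.eigvalsh(M)

checks = {
    "residuals orthogonal to X": np.allclose(X.T @ resid, 0.0),
    "P symmetric": np.allclose(P, P.T),
    "M symmetric": np.allclose(M, M.T),
    "P idempotent": np.allclose(P @ P, P),
    "M idempotent": np.allclose(M @ M, M),
    "rank(P) == r": np.linalg.matrix_rank(P, tol=1e-10) == r,
    "eig(P): n-r zeros, r ones": np.allclose(eig_P, [0.0] * (n - r) + [1.0] * r),
    "eig(M): r zeros, n-r ones": np.allclose(eig_M, [0.0] * r + [1.0] * (n - r)),
    "P X == X": np.allclose(P @ X, X),
    "M P == 0": np.allclose(M @ P, 0.0),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

The explicit tolerance passed to `matrix_rank` is a precaution so that singular values that are zero only up to rounding error are not counted.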
The projection matrix corresponding to a linear model is symmetric and idempotent, that is, $\mathbf{P}^2 = \mathbf{P}$. However, this is not always the case; in locally weighted scatterplot smoothing (LOESS), for example, the hat matrix is in general neither symmetric nor idempotent.
For linear models, the trace of the projection matrix is equal to the rank of $\mathbf{X}$, which is the number of independent parameters of the linear model. For other models such as LOESS that are still linear in the observations $\mathbf{y}$, the projection matrix can be used to define the effective degrees of freedom of the model.
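The trace identity follows from the cyclic property of the trace. As a sketch, assume $\mathbf{X}$ has ''r'' columns and full column rank, so that $\mathbf{X}^\textsf{T} \mathbf{X}$ is invertible:
:$\operatorname{tr}(\mathbf{P}) = \operatorname{tr}\left( \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \right) = \operatorname{tr}\left( \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{X} \right) = \operatorname{tr}(\mathbf{I}_r) = r.$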
Practical applications of the projection matrix in regression analysis include leverage and Cook's distance, which are concerned with identifying influential observations, i.e. observations which have a large effect on the results of a regression.
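As an illustration of both diagnostics, the sketch below computes leverages and Cook's distances for a small made-up data set. It uses the commonly quoted leverage form of Cook's distance, $D_i = \frac{e_i^2}{p s^2} \frac{h_{ii}}{(1 - h_{ii})^2}$, where $e_i$ is the ''i''th residual, $h_{ii}$ the ''i''th leverage, ''p'' the number of fitted parameters and $s^2$ the residual variance estimate; these symbols and the data are assumptions introduced here, not taken from the text above.

```python
import numpy as np

# Hypothetical data with one observation far from the rest in x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 15.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8, 9.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

P = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(P)              # leverages h_ii
e = y - P @ y               # residuals
s2 = e @ e / (n - p)        # residual variance estimate

# Cook's distance in its leverage form.
cooks_d = (e**2 / (p * s2)) * h / (1.0 - h) ** 2

for i in range(n):
    print(f"obs {i}: leverage={h[i]:.3f}  Cook's D={cooks_d[i]:.3f}")
```

Working from the diagonal of $\mathbf{P}$ avoids refitting the regression once per deleted observation.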
Blockwise formula

Suppose the design matrix $X$ can be decomposed by columns as $X = \begin{bmatrix} A & B \end{bmatrix}$. Define the hat or projection operator as $P\{X\} = X \left(X^\textsf{T} X \right)^{-1} X^\textsf{T}$. Similarly, define the residual operator as $M\{X\} = I - P\{X\}$. Then the projection matrix can be decomposed as follows:
:$P\{X\} = P\{A\} + P\{M\{A\} B\},$
where, e.g., $P\{A\} = A \left(A^\textsf{T} A \right)^{-1} A^\textsf{T}$ and $M\{A\} = I - P\{A\}$. There are a number of applications of such a decomposition. In the classical application $A$ is a column of all ones, which allows one to analyze the effects of adding an intercept term to a regression. Another use is in the fixed effects model, where $A$ is a large sparse matrix of the dummy variables for the fixed effect terms. One can use this partition to compute the hat matrix of $X$ without explicitly forming the matrix $X$, which might be too large to fit into computer memory.
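The decomposition can be checked numerically for the classical intercept case. In the sketch below the helper `proj` and the random block `B` are hypothetical, introduced only for illustration, and full column rank is assumed throughout.

```python
import numpy as np

def proj(Z):
    """Orthogonal projection onto the column space of Z (full column rank assumed)."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

rng = np.random.default_rng(1)
n = 10
A = np.ones((n, 1))              # intercept column
B = rng.normal(size=(n, 2))      # hypothetical remaining regressors
X = np.hstack([A, B])

M_A = np.eye(n) - proj(A)        # residual operator for A
P_blockwise = proj(A) + proj(M_A @ B)

print(np.allclose(P_blockwise, proj(X)))
```

With $A$ a column of ones, applying $M\{A\}$ to $B$ subtracts each column's mean, so the second term is the projection onto the centred regressors.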
See also

* Projection (linear algebra)
* Studentized residuals
* Effective degrees of freedom
* Mean and predicted response