In statistics, the projection matrix $(\mathbf{P})$, sometimes also called the influence matrix or hat matrix $(\mathbf{H})$, maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). It describes the influence each response value has on each fitted value. The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.

Definition

If the vector of response values is denoted by $\mathbf{y}$ and the vector of fitted values by $\hat{\mathbf{y}}$, then

$$\hat{\mathbf{y}} = \mathbf{P} \mathbf{y}.$$

As $\hat{\mathbf{y}}$ is usually pronounced "y-hat", the projection matrix $\mathbf{P}$ is also named ''hat matrix'' as it "puts a hat on $\mathbf{y}$".

Application for residuals

The formula for the vector of residuals $\mathbf{r}$ can also be expressed compactly using the projection matrix:

$$\mathbf{r} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{P} \mathbf{y} = \left( \mathbf{I} - \mathbf{P} \right) \mathbf{y},$$

where $\mathbf{I}$ is the identity matrix. The matrix $\mathbf{M} := \mathbf{I} - \mathbf{P}$ is sometimes referred to as the residual maker matrix or the annihilator matrix.

The covariance matrix of the residuals $\mathbf{r}$, by error propagation, equals

$$\mathbf{\Sigma}_\mathbf{r} = \left( \mathbf{I} - \mathbf{P} \right)^\textsf{T} \mathbf{\Sigma} \left( \mathbf{I} - \mathbf{P} \right),$$

where $\mathbf{\Sigma}$ is the covariance matrix of the error vector (and by extension, the response vector as well). For the case of linear models with independent and identically distributed errors in which $\mathbf{\Sigma} = \sigma^{2} \mathbf{I}$, this reduces to:

$$\mathbf{\Sigma}_\mathbf{r} = \left( \mathbf{I} - \mathbf{P} \right) \sigma^{2}.$$
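The residual identities above can be checked numerically. The following is a minimal sketch, assuming the ordinary least squares form $\mathbf{P} = \mathbf{X}(\mathbf{X}^\textsf{T}\mathbf{X})^{-1}\mathbf{X}^\textsf{T}$; the toy data, the variance `sigma2`, and all variable names are illustrative assumptions rather than part of the article.

```python
import numpy as np

# Hypothetical toy data: 5 observations, an intercept plus one regressor.
X = np.column_stack([np.ones(5), np.array([0.0, 1.0, 2.0, 3.0, 4.0])])
y = np.array([1.0, 2.9, 5.2, 7.1, 8.8])

# Projection (hat) matrix and residual maker, assuming ordinary least squares.
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(len(y)) - P

# Residuals obtained by applying M directly to y.
r = M @ y

# Cross-check against residuals from an explicit least-squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
r_direct = y - X @ beta_hat

# Residual covariance for i.i.d. errors with an assumed variance sigma2.
sigma2 = 0.25
Sigma = sigma2 * np.eye(len(y))
cov_r_general = M.T @ Sigma @ M  # (I - P)^T Sigma (I - P)
cov_r_iid = sigma2 * M           # reduced form (I - P) sigma^2

print(np.allclose(r, r_direct))
print(np.allclose(cov_r_general, cov_r_iid))
```

The reduction to $(\mathbf{I} - \mathbf{P})\sigma^{2}$ works because $\mathbf{M}$ is symmetric and idempotent in the ordinary least squares case, so $\mathbf{M}^\textsf{T}\mathbf{M} = \mathbf{M}$.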

Intuition

[Figure: projection of a vector onto the column space of a matrix]

From the figure, it is clear that the closest point from the vector $\mathbf{b}$ onto the column space of $\mathbf{A}$ is $\mathbf{A}\mathbf{x}$, and is one where we can draw a line orthogonal to the column space of $\mathbf{A}$. A vector that is orthogonal to the column space of a matrix is in the nullspace of the matrix transpose, so

$$\mathbf{A}^\textsf{T}(\mathbf{b} - \mathbf{A}\mathbf{x}) = 0.$$

From there, one rearranges (assuming the columns of $\mathbf{A}$ are linearly independent, so that $\mathbf{A}^\textsf{T}\mathbf{A}$ is invertible):

$$\begin{aligned}
              && \mathbf{A}^\textsf{T}\mathbf{b} &- \mathbf{A}^\textsf{T}\mathbf{A}\mathbf{x} = 0 \\
  \Rightarrow && \mathbf{A}^\textsf{T}\mathbf{b} &= \mathbf{A}^\textsf{T}\mathbf{A}\mathbf{x} \\
  \Rightarrow && \mathbf{x} &= \left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}\mathbf{b}
\end{aligned}$$

Therefore, since $\mathbf{A}\mathbf{x}$ is in the column space of $\mathbf{A}$, the projection matrix, which maps $\mathbf{b}$ onto $\mathbf{A}\mathbf{x}$, is $\mathbf{A}\left(\mathbf{A}^\textsf{T}\mathbf{A}\right)^{-1}\mathbf{A}^\textsf{T}$.
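The geometric argument can be traced on a small example. The sketch below is illustrative only: the matrix `A` and vector `b` are hypothetical values chosen so that the arithmetic is easy to follow by hand.

```python
import numpy as np

# Hypothetical example: a 3x2 matrix A with independent columns and a vector b.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Solve the normal equations A^T A x = A^T b for the coefficient vector x.
x = np.linalg.solve(A.T @ A, A.T @ b)

# The projection of b onto the column space of A is A x.
projection = A @ x

# The residual b - A x should be orthogonal to every column of A.
orthogonality = A.T @ (b - projection)

# Same projection via the explicit matrix A (A^T A)^{-1} A^T.
P = A @ np.linalg.solve(A.T @ A, A.T)

print("x =", x)
print("A x =", projection)
print("A^T (b - A x) =", orthogonality)
```

Here the normal equations give $\mathbf{x} = (5, -3)$ and $\mathbf{A}\mathbf{x} = (5, 2, -1)$, and the residual $(1, -2, 1)$ is orthogonal to both columns of $\mathbf{A}$.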

Linear model

Suppose that we wish to estimate a linear model using linear least squares. The model can be written as

$$\mathbf{y} = \mathbf{X} \boldsymbol\beta + \boldsymbol\varepsilon,$$

where $\mathbf{X}$ is a matrix of explanatory variables (the design matrix), $\boldsymbol\beta$ is a vector of unknown parameters to be estimated, and $\boldsymbol\varepsilon$ is the error vector.

Many types of models and techniques are subject to this formulation. A few examples are linear least squares, smoothing splines, regression splines, local regression, kernel regression, and linear filtering.

Ordinary least squares

When the weights for each observation are identical and the errors are uncorrelated, the estimated parameters are

$$\hat{\boldsymbol\beta} = \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{y},$$

so the fitted values are

$$\hat{\mathbf{y}} = \mathbf{X} \hat{\boldsymbol\beta} = \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{y}.$$

Therefore, the projection matrix (and hat matrix) is given by

$$\mathbf{P} := \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}.$$
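In numerical work the inverse in this formula is usually not formed explicitly. The sketch below, on a hypothetical random design, compares the textbook formula with an alternative that is not described in the text above: the thin QR factorization $\mathbf{X} = \mathbf{Q}\mathbf{R}$, for which $\mathbf{P} = \mathbf{Q}\mathbf{Q}^\textsf{T}$ when $\mathbf{X}$ has full column rank. The data and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: 8 observations, an intercept plus two random regressors.
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

# Textbook formula P = X (X^T X)^{-1} X^T, using solve instead of an explicit inverse.
P_formula = X @ np.linalg.solve(X.T @ X, X.T)

# Alternative via the thin QR factorization X = Q R, which gives P = Q Q^T.
Q, _ = np.linalg.qr(X)  # default mode "reduced": Q has shape (n, 3)
P_qr = Q @ Q.T

# Fitted values from the hat matrix versus a direct least-squares solve.
y_hat = P_formula @ y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(P_formula, P_qr))
print(np.allclose(y_hat, X @ beta_hat))
```

The QR route avoids forming $\mathbf{X}^\textsf{T}\mathbf{X}$, which tends to matter when the columns of $\mathbf{X}$ are nearly collinear.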

Weighted and generalized least squares

The above may be generalized to the cases where the weights are not identical and/or the errors are correlated. Suppose that the covariance matrix of the errors is $\mathbf{\Sigma}$. Then since

$$\hat{\boldsymbol\beta}_\text{GLS} = \left( \mathbf{X}^\textsf{T} \mathbf{\Sigma}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{\Sigma}^{-1} \mathbf{y},$$

the hat matrix is thus

$$\mathbf{H} = \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{\Sigma}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T} \mathbf{\Sigma}^{-1}$$

and again it may be seen that $\mathbf{H}^2 = \mathbf{H} \cdot \mathbf{H} = \mathbf{H}$, though now it is in general no longer symmetric.
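A small numerical check of these two claims, as a sketch under stated assumptions: the diagonal covariance `Sigma` below (the weighted least squares case) is a hypothetical choice made only to produce unequal weights.

```python
import numpy as np

# Hypothetical design: 6 observations, an intercept plus a linear trend.
n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])

# Assumed error covariance with unequal variances (illustrative values only).
Sigma = np.diag([1.0, 2.0, 0.5, 4.0, 1.5, 3.0])
Sigma_inv = np.linalg.inv(Sigma)

# H = X (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1}
XtSi = X.T @ Sigma_inv
H = X @ np.linalg.solve(XtSi @ X, XtSi)

is_idempotent = np.allclose(H @ H, H)
is_symmetric = np.allclose(H, H.T)

print("idempotent:", is_idempotent)
print("symmetric:", is_symmetric)
```

For this choice of weights the matrix is idempotent but not symmetric, and it still leaves the columns of $\mathbf{X}$ unchanged, $\mathbf{H}\mathbf{X} = \mathbf{X}$.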

Properties

The projection matrix has a number of useful algebraic properties. In the language of linear algebra, the projection matrix is the orthogonal projection onto the column space of the design matrix $\mathbf{X}$. (Note that $\left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}$ is the pseudoinverse of $\mathbf{X}$.) Some facts of the projection matrix in this setting are summarized as follows:

* The residual vector $(\mathbf{I} - \mathbf{P})\mathbf{y} = \mathbf{y} - \mathbf{P}\mathbf{y}$ is orthogonal to the column space of $\mathbf{X}$.
* $\mathbf{P}$ is symmetric, and so is $\mathbf{M} := \mathbf{I} - \mathbf{P}$.
* $\mathbf{P}$ is idempotent: $\mathbf{P}^2 = \mathbf{P}$, and so is $\mathbf{M}$.
* If $\mathbf{X}$ is an $n \times r$ matrix with $\operatorname{rank}(\mathbf{X}) = r$, then $\operatorname{rank}(\mathbf{P}) = r$.
* The eigenvalues of $\mathbf{P}$ consist of $r$ ones and $n - r$ zeros, while the eigenvalues of $\mathbf{M}$ consist of $n - r$ ones and $r$ zeros.
* $\mathbf{X}$ is invariant under $\mathbf{P}$: $\mathbf{P}\mathbf{X} = \mathbf{X}$, hence $\left( \mathbf{I} - \mathbf{P} \right) \mathbf{X} = \mathbf{0}$.
* $\left( \mathbf{I} - \mathbf{P} \right) \mathbf{P} = \mathbf{P} \left( \mathbf{I} - \mathbf{P} \right) = \mathbf{0}$.
* $\mathbf{P}$ is unique for a given subspace: it depends only on the column space of $\mathbf{X}$, not on the particular matrix chosen to span it.

The projection matrix corresponding to a linear model is symmetric and idempotent, that is, $\mathbf{P}^2 = \mathbf{P}$. However, this is not always the case; in locally weighted scatterplot smoothing (LOESS), for example, the hat matrix is in general neither symmetric nor idempotent.

For linear models, the trace of the projection matrix is equal to the rank of $\mathbf{X}$, which is the number of independent parameters of the linear model. For other models such as LOESS that are still linear in the observations $\mathbf{y}$, the projection matrix can be used to define the effective degrees of freedom of the model.

Practical applications of the projection matrix in regression analysis include leverage and Cook's distance, which are concerned with identifying influential observations, i.e. observations which have a large effect on the results of a regression.
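The listed properties can be verified numerically for the ordinary least squares case. The sketch below uses a hypothetical random design with $n = 10$ and $r = 3$; the seed, the dimensions, and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical full-column-rank design: n = 10 observations, r = 3 columns.
n, r = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, r - 1))])

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P

checks = {
    "P symmetric": np.allclose(P, P.T),
    "P idempotent": np.allclose(P @ P, P),
    "M idempotent": np.allclose(M @ M, M),
    "P X = X": np.allclose(P @ X, X),
    "M X = 0": np.allclose(M @ X, 0.0),
    "M P = 0": np.allclose(M @ P, 0.0),
    "trace(P) = rank(X)": np.isclose(np.trace(P), np.linalg.matrix_rank(X)),
}

# Eigenvalues of the symmetric matrix P; expect r ones and n - r zeros.
eigvals = np.linalg.eigvalsh(P)
n_ones = int(np.sum(np.isclose(eigvals, 1.0)))
n_zeros = int(np.sum(np.isclose(eigvals, 0.0, atol=1e-8)))

# Leverages are the diagonal entries of P; they sum to trace(P).
leverages = np.diag(P)

for name, ok in checks.items():
    print(f"{name}: {ok}")
print("eigenvalues equal to one:", n_ones, "equal to zero:", n_zeros)
```

Because the leverages sum to the trace, their average is $r/n$; observations whose leverage is far above that average are the ones that leverage-based diagnostics flag for closer inspection.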

Blockwise formula

Suppose the design matrix $\mathbf{X}$ can be decomposed by columns as $\mathbf{X} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \end{bmatrix}$. Define the hat or projection operator as $\mathbf{P}\{\mathbf{X}\} := \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}$. Similarly, define the residual operator as $\mathbf{M}\{\mathbf{X}\} := \mathbf{I} - \mathbf{P}\{\mathbf{X}\}$. Then the projection matrix can be decomposed as follows:

$$\mathbf{P}\{\mathbf{X}\} = \mathbf{P}\{\mathbf{A}\} + \mathbf{P}\big\{\mathbf{M}\{\mathbf{A}\} \mathbf{B}\big\},$$

where, e.g., $\mathbf{P}\{\mathbf{A}\} = \mathbf{A} \left( \mathbf{A}^\textsf{T} \mathbf{A} \right)^{-1} \mathbf{A}^\textsf{T}$ and $\mathbf{M}\{\mathbf{A}\} = \mathbf{I} - \mathbf{P}\{\mathbf{A}\}$.

There are a number of applications of such a decomposition. In the classical application $\mathbf{A}$ is a column of all ones, which allows one to analyze the effects of adding an intercept term to a regression. Another use is in the fixed effects model, where $\mathbf{A}$ is a large sparse matrix of the dummy variables for the fixed effect terms. One can use this partition to compute the hat matrix of $\mathbf{X}$ without explicitly forming the matrix $\mathbf{X}$, which might be too large to fit into computer memory.
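The decomposition can be checked on a small case. In the sketch below, `proj` is a hypothetical helper, and $\mathbf{A}$ is taken to be the intercept column as in the classical application; the full matrix $\mathbf{X}$ is formed here only to verify the identity, which a memory-constrained computation would avoid.

```python
import numpy as np

def proj(Z):
    """Orthogonal projection onto the column space of Z (full column rank assumed)."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

rng = np.random.default_rng(7)

# Hypothetical split: A is the intercept column, B holds two other regressors.
n = 12
A = np.ones((n, 1))
B = rng.normal(size=(n, 2))
X = np.hstack([A, B])

P_A = proj(A)
M_A = np.eye(n) - P_A

# Blockwise formula: P{X} = P{A} + P{M{A} B}
P_blockwise = P_A + proj(M_A @ B)
P_direct = proj(X)

# With A a column of ones, M{A} B simply centers each column of B.
B_centered = B - B.mean(axis=0)

print(np.allclose(P_blockwise, P_direct))
print(np.allclose(M_A @ B, B_centered))
```

With an intercept as $\mathbf{A}$, the second term is the projection onto the mean-centered regressors, which is why adding an intercept is equivalent to centering the remaining columns.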

History

The hat matrix was introduced by John Wilder Tukey in 1972. An article by D. C. Hoaglin and R. E. Welsch (1978) gives the properties of the matrix and also many examples of its application.
