Multiple correlation

In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is the correlation between the variable's values and the best predictions that can be computed linearly from the predictive variables.

The coefficient of multiple correlation takes values between 0 and 1. Higher values indicate higher predictability of the dependent variable from the independent variables, with a value of 1 indicating that the predictions are exactly correct and a value of 0 indicating that no linear combination of the independent variables is a better predictor than is the fixed mean of the dependent variable.

The coefficient of multiple correlation is the square root of the coefficient of determination, but under the particular assumptions that an intercept is included and that the best possible linear predictors are used, whereas the coefficient of determination is defined for more general cases, including those of nonlinear prediction and those in which the predicted values have not been derived from a model-fitting procedure.


Definition

The coefficient of multiple correlation, denoted ''R'', is a scalar that is defined as the Pearson correlation coefficient between the predicted and the actual values of the dependent variable in a linear regression model that includes an intercept.
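
As a rough numerical illustration (a sketch with simulated data; the variable names, sample size, and coefficients below are arbitrary choices, not from the article), ''R'' can be obtained by fitting an ordinary least-squares model with an intercept and correlating the fitted values with the observed ones:

```python
import numpy as np

# Minimal sketch: R as the Pearson correlation between fitted and actual values
# of an intercept-including least-squares fit (simulated data, assumed setup).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))                      # two predictor variables
y = 2.0 + x @ np.array([1.5, -0.7]) + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), x])             # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares coefficients
y_hat = X @ beta                                 # predicted values

R = np.corrcoef(y_hat, y)[0, 1]                  # coefficient of multiple correlation
print(R)
```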


Computation

The square of the coefficient of multiple correlation can be computed using the vector \mathbf{c} = (r_{x_1 y}, r_{x_2 y}, \dots, r_{x_N y})^\top of correlations r_{x_n y} between the predictor variables x_n (independent variables) and the target variable y (dependent variable), and the correlation matrix R_{xx} of correlations between predictor variables. It is given by

::R^2 = \mathbf{c}^\top R_{xx}^{-1}\, \mathbf{c},

where \mathbf{c}^\top is the transpose of \mathbf{c}, and R_{xx}^{-1} is the inverse of the matrix

::R_{xx} = \left(\begin{matrix} r_{x_1 x_1} & r_{x_1 x_2} & \dots & r_{x_1 x_N} \\ r_{x_2 x_1} & \ddots & & \vdots \\ \vdots & & \ddots & \\ r_{x_N x_1} & \dots & & r_{x_N x_N} \end{matrix}\right).

If all the predictor variables are uncorrelated, the matrix R_{xx} is the identity matrix and R^2 simply equals \mathbf{c}^\top \mathbf{c}, the sum of the squared correlations with the dependent variable. If the predictor variables are correlated among themselves, the inverse of the correlation matrix R_{xx} accounts for this.

The squared coefficient of multiple correlation can also be computed as the fraction of variance of the dependent variable that is explained by the independent variables, which in turn is 1 minus the unexplained fraction. The unexplained fraction can be computed as the sum of squares of residuals (that is, the sum of the squares of the prediction errors) divided by the sum of squares of deviations of the values of the dependent variable from its expected value.
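
The following sketch (again with simulated data; the names ''c'' and ''Rxx'' and the chosen coefficients are illustrative assumptions) computes R^2 both from the correlation vector and matrix as above and as 1 minus the unexplained fraction of variance; the two routes agree for a least-squares fit with an intercept:

```python
import numpy as np

# Minimal sketch: R^2 via c^T Rxx^{-1} c versus 1 - RSS/TSS (simulated data).
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=(n, 3))
y = 1.0 + x @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.5, size=n)

corr = np.corrcoef(np.column_stack([x, y]), rowvar=False)
c = corr[:-1, -1]                        # correlations r_{x_n y} with the target
Rxx = corr[:-1, :-1]                     # correlation matrix of the predictors
r2_matrix = c @ np.linalg.solve(Rxx, c)  # R^2 = c^T Rxx^{-1} c

X = np.column_stack([np.ones(n), x])     # intercept-including design matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
tss = (y - y.mean()) @ (y - y.mean())
r2_variance = 1 - (resid @ resid) / tss  # 1 minus the unexplained fraction

print(r2_matrix, r2_variance)            # the two values agree up to rounding
```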


Properties

With more than two variables being related to each other, the value of the coefficient of multiple correlation depends on the choice of dependent variable: a regression of y on x and z will in general have a different R than will a regression of z on x and y. For example, suppose that in a particular sample the variable z is uncorrelated with both x and y, while x and y are linearly related to each other. Then a regression of z on y and x will yield an R of zero, while a regression of y on x and z will yield a strictly positive R. This follows since the correlation of y with its best predictor based on x and z is in all cases at least as large as the correlation of y with its best predictor based on x alone, and in this case, with z providing no explanatory power, it will be exactly as large.
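
A short sketch of this asymmetry (simulated data; the sample size and coefficients are arbitrary choices, not from the article): regressing z on x and y gives an R near zero, while regressing y on x and z gives a clearly positive R.

```python
import numpy as np

# Minimal sketch: R depends on which variable is treated as dependent.
rng = np.random.default_rng(2)
n = 10_000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)   # y linearly related to x
z = rng.normal(size=n)                        # z independent of both x and y

def multiple_R(dep, *preds):
    """Coefficient of multiple correlation of dep on the given predictors."""
    X = np.column_stack([np.ones(len(dep))] + list(preds))
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    return np.corrcoef(X @ beta, dep)[0, 1]

print(multiple_R(z, y, x))   # near 0: x and y carry no information about z
print(multiple_R(y, x, z))   # clearly positive: x predicts y
```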


References

* "Multiple correlation coefficient".


Further reading

* Allison, Paul D. (1998). ''Multiple Regression: A Primer''. London: Sage Publications.
* Cohen, Jacob, et al. (2002). ''Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences''.
* Crown, William H. (1998). ''Statistical Models for the Social and Behavioral Sciences: Multiple Regression and Limited-Dependent Variable Models''.
* Edwards, Allen Louis (1985). ''Multiple Regression and the Analysis of Variance and Covariance''.
* Keith, Timothy (2006). ''Multiple Regression and Beyond''. Boston: Pearson Education.
* Kerlinger, Fred N.; Pedhazur, Elazar J. (1973). ''Multiple Regression in Behavioral Research''. New York: Holt Rinehart Winston.
* Stanton, Jeffrey M. (2001). "Galton, Pearson, and the Peas: A Brief History of Linear Regression for Statistics Instructors". ''Journal of Statistics Education'', 9 (3).