Analysis of covariance (ANCOVA) is a general linear model that blends ANOVA and regression. ANCOVA evaluates whether the means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV), often called a treatment, while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV) or nuisance variables. Mathematically, ANCOVA decomposes the variance in the DV into variance explained by the CV(s), variance explained by the categorical IV, and residual variance. Intuitively, ANCOVA can be thought of as 'adjusting' the DV by the group means of the CV(s). The ANCOVA model assumes a linear relationship between the response (DV) and covariate (CV):

y_{ij} = \mu + \tau_i + B(x_{ij} - \overline{x}) + \epsilon_{ij}.

In this equation, the DV, $y_{ij}$, is the $j$th observation under the $i$th categorical group; the CV, $x_{ij}$, is the $j$th observation of the covariate under the $i$th group. Variables in the model that are derived from the observed data are $\mu$ (the grand mean) and $\overline{x}$ (the global mean for covariate $x$). The parameters to be fitted are $\tau_i$ (the effect of the $i$th level of the IV) and $B$ (the slope of the line); $\epsilon_{ij}$ is the associated unobserved error term for the $j$th observation in the $i$th group. Under this specification, the categorical treatment effects sum to zero, $\sum_{i=1}^{a} \tau_i = 0$, where $a$ is the number of levels of the IV.
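To make the notation concrete, the following Python sketch simulates data from the model above and recovers $\mu$, $\tau_i$ and $B$ by ordinary least squares. It assumes NumPy, a balanced design, and effect (sum-to-zero) coding of the groups, so that the constraint $\sum_i \tau_i = 0$ holds by construction. The function name `fit_ancova_effects` and all simulated values are hypothetical illustrations, not part of any particular package.

```python
import numpy as np

def fit_ancova_effects(y, x, group, n_groups):
    """Least-squares fit of y_ij = mu + tau_i + B * (x_ij - xbar) + eps_ij,
    using effect (sum-to-zero) coding so that sum_i tau_i = 0."""
    y, x, group = np.asarray(y, float), np.asarray(x, float), np.asarray(group)
    n = len(y)
    # Column k is +1 for group k, -1 for the last group, 0 otherwise,
    # so only tau_1..tau_{a-1} are free and tau_a = -(tau_1 + ... + tau_{a-1}).
    effects = np.zeros((n, n_groups - 1))
    for k in range(n_groups - 1):
        effects[group == k, k] = 1.0
        effects[group == n_groups - 1, k] = -1.0
    X = np.column_stack([np.ones(n), effects, x - x.mean()])  # centred covariate
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    mu = coef[0]
    tau = np.append(coef[1:n_groups], -coef[1:n_groups].sum())
    slope = coef[-1]
    return mu, tau, slope

# Simulated balanced data (hypothetical values): a = 3 groups, 200 observations each.
rng = np.random.default_rng(42)
a, n_per = 3, 200
true_tau = np.array([-1.0, 0.0, 1.0])   # treatment effects summing to zero
true_mu, true_B = 10.0, 2.0
group = np.repeat(np.arange(a), n_per)
x = rng.normal(5.0, 1.0, a * n_per)
y = true_mu + true_tau[group] + true_B * (x - x.mean()) + rng.normal(0.0, 0.5, a * n_per)

mu, tau, B = fit_ancova_effects(y, x, group, a)
print(round(mu, 2), np.round(tau, 2), round(B, 2))
```

With a balanced design and a centred covariate, the fitted intercept equals the sample grand mean of $y$, which matches the description of $\mu$ as a quantity derived from the data.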

The standard assumptions of the linear regression model are also taken to hold, as discussed below. [Montgomery, D. C. (2012). Design and Analysis of Experiments (8th ed.). John Wiley & Sons.]

Uses

Increase power

ANCOVA can be used to increase statistical power (the probability that a significant difference is found between groups when one exists) by reducing the within-group error variance. To understand this, consider the test used to evaluate differences between groups, the F-test. The F-test is computed by dividing the explained variance between groups (e.g., medical recovery differences) by the unexplained variance within the groups. Thus,

F = \frac{\text{explained (between-group) variance}}{\text{unexplained (within-group) variance}}.

If this value is larger than a critical value, we conclude that there is a significant difference between groups. Unexplained variance includes error variance (e.g., individual differences) as well as the influence of other factors, so the influence of the CVs is grouped in the denominator. When we control for the effect of the CVs on the DV, we remove that variance from the denominator, making F larger and thereby increasing our power to find a significant effect if one exists.

[Figure: ANCOVA, partitioning variance]
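The power argument can be checked numerically. The sketch below, assuming NumPy and simulated data with a strong covariate (all values hypothetical), computes the treatment F statistic twice by comparing nested least-squares models: once as a one-way ANOVA and once adjusted for the covariate. In both cases the numerator is the drop in residual sum of squares per treatment degree of freedom; the ANCOVA denominator excludes the covariate's share of the unexplained variance.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def treatment_f(y, group, n_groups, x=None):
    """F statistic for the treatment factor, optionally adjusted for covariate x.
    Compares a reduced model (no treatment terms) with a full model (treatment added)."""
    n = len(y)
    # Reference-cell dummies for groups 1..a-1 (group 0 is the reference).
    dummies = np.column_stack([(group == k).astype(float) for k in range(1, n_groups)])
    base = np.ones((n, 1)) if x is None else np.column_stack([np.ones(n), x])
    full = np.column_stack([base, dummies])
    rss_reduced, rss_full = rss(base, y), rss(full, y)
    df_treat = n_groups - 1
    df_error = n - full.shape[1]  # N - a for ANOVA, N - a - 1 with one covariate
    return ((rss_reduced - rss_full) / df_treat) / (rss_full / df_error)

# Hypothetical data: modest treatment effects, DV strongly driven by the covariate.
rng = np.random.default_rng(0)
a, n_per = 3, 30
group = np.repeat(np.arange(a), n_per)
x = rng.normal(0.0, 1.0, a * n_per)            # covariate, e.g. a baseline score
tau = np.array([-0.5, 0.0, 0.5])
y = tau[group] + 2.0 * x + rng.normal(0.0, 0.5, a * n_per)

f_anova = treatment_f(y, group, a)
f_ancova = treatment_f(y, group, a, x=x)
print(f"ANOVA F = {f_anova:.2f}, ANCOVA F = {f_ancova:.2f}")
```

The ANCOVA version spends one extra error degree of freedom on the covariate slope, which is the cost discussed under power considerations below.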

Adjusting preexisting differences

Another use of ANCOVA is to adjust for preexisting differences in nonequivalent (intact) groups. This controversial application aims at correcting for initial group differences (prior to group assignment) that exist on the DV among several intact groups. In this situation, participants cannot be made equal through random assignment, so CVs are used to adjust scores and make participants more similar than they would be without the CV. However, even with the use of covariates, there are no statistical techniques that can equate unequal groups. Furthermore, the CV may be so intimately related to the IV that removing the variance on the DV associated with the CV would remove considerable variance on the DV, rendering the results meaningless.

Assumptions

There are several key assumptions that underlie the use of ANCOVA and affect interpretation of the results. The standard linear regression assumptions hold; further, we assume that the slope of the covariate is equal across all treatment groups (homogeneity of regression slopes).

Assumption 1: linearity of regression

The regression relationship between the dependent variable and the concomitant variables (covariates) must be linear.

Assumption 2: homogeneity of error variances

The error is a random variable with conditional zero mean and equal variances for different treatment classes and observations.

Assumption 3: independence of error terms

The errors are uncorrelated. That is, the error covariance matrix is diagonal.

Assumption 4: normality of error terms

The residuals (error terms) should be normally distributed: $\epsilon_{ij} \sim N(0, \sigma^2)$.

Assumption 5: homogeneity of regression slopes

The slopes of the different regression lines should be equivalent, i.e., the regression lines should be parallel among groups. This fifth assumption, homogeneity of regression slopes across treatments, is particularly important in evaluating the appropriateness of the ANCOVA model. Also note that only the error terms need to be normally distributed; in fact, neither the independent variable nor the concomitant variables will be normally distributed in most cases.

Conducting an ANCOVA

Test multicollinearity

If a CV is highly related to another CV (at a correlation of 0.5 or more), then it will not adjust the DV over and above the other CV. One or the other should be removed since they are statistically redundant.
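A quick screen for redundant covariates, using the 0.5 correlation rule of thumb stated above, might look like the sketch below. It assumes NumPy, covariates stored as the columns of a 2-D array, and hypothetical covariate names; the helper `redundant_covariate_pairs` is illustrative, and the threshold is a parameter so it can be changed.

```python
import numpy as np

def redundant_covariate_pairs(covariates, names, threshold=0.5):
    """Return (name_a, name_b, r) for covariate pairs with |Pearson r| >= threshold."""
    r = np.corrcoef(covariates, rowvar=False)  # columns are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

# Hypothetical covariates: "baseline" is largely a function of "age"; "income" is unrelated.
rng = np.random.default_rng(1)
age = rng.normal(0.0, 1.0, 200)
baseline = 0.9 * age + rng.normal(0.0, 0.3, 200)
income = rng.normal(0.0, 1.0, 200)
cvs = np.column_stack([age, baseline, income])
flagged = redundant_covariate_pairs(cvs, ["age", "baseline", "income"])
print(flagged)
```

For each flagged pair, one of the two covariates would be dropped before running the ANCOVA.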

Test the homogeneity of variance assumption

This assumption is tested by Levene's test of equality of error variances. It matters most after adjustments have been made, but if it holds before adjustment it is likely to hold afterwards.
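Levene's statistic compares the absolute deviations from each group's own mean across groups. The NumPy sketch below computes the original mean-centred $W$ statistic and its degrees of freedom; for ANCOVA it would be applied to the model residuals split by treatment group. Turning $W$ into a p-value needs the F distribution: if SciPy is available, `scipy.stats.levene(*samples, center='mean')` returns the same statistic with a p-value (its default, `center='median'`, is the Brown-Forsythe variant). The helper name `levene_w` is hypothetical.

```python
import numpy as np

def levene_w(*groups):
    """Levene's W statistic (mean-centred version) and its degrees of freedom.
    Under the null hypothesis of equal variances, W ~ F(k - 1, N - k)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group mean.
    z = [np.abs(g - g.mean()) for g in groups]
    z_bar_i = np.array([zi.mean() for zi in z])
    z_bar = np.concatenate(z).mean()
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum(((zi - zb) ** 2).sum() for zi, zb in zip(z, z_bar_i))
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)

# Second group has a much larger spread than the first.
w, df = levene_w([1, 2, 3], [0, 10, 20])
print(round(w, 3), df)  # 3.208 (1, 4)
```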

Test the homogeneity of regression slopes assumption

To see if the CV significantly interacts with the IV, run an ANCOVA model including the IV, the CV, and the CV×IV interaction term. If the CV×IV interaction is significant, ANCOVA should not be performed. Instead, Green & Salkind [Green, S. B., & Salkind, N. J. (2011). Using SPSS for Windows and Macintosh: Analyzing and Understanding Data (6th ed.). Upper Saddle River, NJ: Prentice Hall.] suggest assessing group differences on the DV at particular levels of the CV. Also consider using a moderated regression analysis, treating the CV and its interaction as another IV. Alternatively, one could use mediation analyses to determine if the CV accounts for the IV's effect on the DV.
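The interaction check can be written as a comparison of two nested least-squares models: a common-slope model (the ANCOVA) and a model that adds a CV×IV term for each non-reference group. The sketch assumes NumPy, dummy coding with group 0 as the reference, and simulated data with hypothetical values; it returns the F statistic and its degrees of freedom, and `scipy.stats.f.sf(f, df_num, df_den)` could turn that into a p-value if SciPy is available. The helpers `slopes_homogeneity_f` and `simulate` are illustrative names.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def slopes_homogeneity_f(y, x, group, n_groups):
    """F test of the CV x IV interaction: one common slope vs. a separate slope per group."""
    y, x, group = np.asarray(y, float), np.asarray(x, float), np.asarray(group)
    n = len(y)
    dummies = np.column_stack([(group == k).astype(float) for k in range(1, n_groups)])
    common = np.column_stack([np.ones(n), x, dummies])           # parallel lines (ANCOVA)
    separate = np.column_stack([common, dummies * x[:, None]])   # adds CV x IV terms
    rss_c, rss_s = rss(common, y), rss(separate, y)
    df_num = n_groups - 1
    df_den = n - separate.shape[1]
    f = ((rss_c - rss_s) / df_num) / (rss_s / df_den)
    return f, (df_num, df_den)

def simulate(slopes, seed, n_per=40):
    """Hypothetical data: group-specific slopes and intercept offsets, normal noise."""
    rng = np.random.default_rng(seed)
    group = np.repeat(np.arange(len(slopes)), n_per)
    x = rng.normal(0.0, 1.0, group.size)
    y = np.asarray(slopes)[group] * x + group + rng.normal(0.0, 0.5, group.size)
    return y, x, group

f_par, _ = slopes_homogeneity_f(*simulate([1.5, 1.5, 1.5], seed=2), 3)
f_non, df = slopes_homogeneity_f(*simulate([0.5, 1.5, 3.0], seed=3), 3)
print(f"parallel slopes: F = {f_par:.2f}; different slopes: F = {f_non:.2f}, df = {df}")
```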

Run ANCOVA analysis

If the CV×IV interaction is not significant, rerun the ANCOVA without the CV×IV interaction term. In this analysis, you need to use the adjusted means and the adjusted mean square error ($MS_\text{error}$). The adjusted means (also referred to as least squares means, LS means, estimated marginal means, or EMM) are the group means after controlling for the influence of the CV on the DV.

[Figure: Main effects]
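For a single covariate, the adjusted means can be computed directly: the common slope is the pooled within-group regression slope $b_w$, and each group mean is shifted to the grand covariate mean, $\bar{y}_i - b_w(\bar{x}_i - \bar{x})$. The NumPy sketch below does this and also returns the adjusted $MS_\text{error}$ on $N - a - 1$ degrees of freedom. The simulated intact groups (hypothetical values) differ on the covariate but have no treatment effect, so the raw means differ while the adjusted means should not; as cautioned above, this holds only when the linear, common-slope model is correct. The helper name `adjusted_means` is illustrative.

```python
import numpy as np

def adjusted_means(y, x, group, n_groups):
    """Covariate-adjusted (LS) means from a common-slope ANCOVA with one covariate.
    Returns (adjusted means, pooled within-group slope, adjusted MS_error)."""
    y, x, group = np.asarray(y, float), np.asarray(x, float), np.asarray(group)
    sxy = sxx = 0.0
    parts = []
    for k in range(n_groups):
        xk, yk = x[group == k], y[group == k]
        dx, dy = xk - xk.mean(), yk - yk.mean()
        sxy += float(dx @ dy)
        sxx += float(dx @ dx)
        parts.append((dx, dy))
    b_w = sxy / sxx  # pooled within-group (common) slope
    adj = np.array([y[group == k].mean() - b_w * (x[group == k].mean() - x.mean())
                    for k in range(n_groups)])
    # Residuals around each group's parallel regression line; df = N - a - 1.
    resid = np.concatenate([dy - b_w * dx for dx, dy in parts])
    ms_error = float(resid @ resid) / (len(y) - n_groups - 1)
    return adj, b_w, ms_error

# Hypothetical intact groups: group 1 starts higher on the CV; no treatment effect.
rng = np.random.default_rng(4)
n_per = 100
group = np.repeat([0, 1], n_per)
x = rng.normal(0.0, 1.0, 2 * n_per) + group     # group 1 shifted by +1 on the CV
y = 2.0 * x + rng.normal(0.0, 0.5, 2 * n_per)   # DV depends only on the CV
raw_diff = y[group == 1].mean() - y[group == 0].mean()
adj, b_w, mse = adjusted_means(y, x, group, 2)
print(f"raw difference {raw_diff:.2f}, adjusted difference {adj[1] - adj[0]:.2f}")
```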

Follow-up analyses

If there was a significant main effect, it means that there is a significant difference between the levels of one IV, ignoring all other factors. [Howell, D. C. (2009). Statistical Methods for Psychology (7th ed.). Belmont: Cengage Wadsworth.] To find exactly which levels differ significantly from one another, one can use the same follow-up tests as for the ANOVA. If there are two or more IVs, there may be a significant interaction, meaning that the effect of one IV on the DV changes depending on the level of another factor. One can investigate the simple main effects using the same methods as in a factorial ANOVA.
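For one covariate, a common way to compare two adjusted means $i$ and $j$ in follow-up tests is a t statistic that uses the adjusted $MS_\text{error}$ plus an extra variance term for how far apart the two groups are on the covariate. Here $n_i$ and $\bar{x}_i$ are the group size and covariate mean, $b_w$ is the pooled within-group slope, and $SS_{x,\text{within}}$ is the pooled within-group sum of squares of the covariate. The statistic has $N - a - 1$ degrees of freedom, and with $m$ pairwise comparisons a Bonferroni correction would test each at $\alpha/m$. This is a sketch of one standard approach, not the only valid follow-up procedure.

```latex
\bar{y}_i^{\,\text{adj}} = \bar{y}_i - b_w(\bar{x}_i - \bar{x}),
\qquad
t_{ij} = \frac{\bar{y}_i^{\,\text{adj}} - \bar{y}_j^{\,\text{adj}}}
              {\sqrt{MS_\text{error}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}
                + \dfrac{(\bar{x}_i - \bar{x}_j)^2}{SS_{x,\text{within}}}\right)}}
```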

Power considerations

While the inclusion of a covariate into an ANOVA generally increases statistical power by accounting for some of the variance in the dependent variable, and thus increasing the ratio of variance explained by the independent variables, adding a covariate into ANOVA also reduces the error degrees of freedom by one for each covariate. Accordingly, adding a covariate that accounts for very little variance in the dependent variable might actually reduce power.
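A small worked example with hypothetical numbers shows why a weak covariate can hurt. With $a$ groups and $N$ observations, one-way ANOVA has $N - a$ error degrees of freedom and ANCOVA with one covariate has $N - a - 1$. If the covariate removes a fraction $r$ of the error sum of squares $SS_E$, the error mean square falls only when:

```latex
\frac{(1 - r)\,SS_E}{N - a - 1} < \frac{SS_E}{N - a}
\iff 1 - r < \frac{N - a - 1}{N - a}
\iff r > \frac{1}{N - a}
```

For instance, with $N = 12$ and $a = 3$, the covariate must remove more than $1/9 \approx 11\%$ of $SS_E$ before $MS_\text{error}$ goes down at all; a covariate that removes only 1% gives $0.99\,SS_E/8 \approx 0.124\,SS_E$ versus $SS_E/9 \approx 0.111\,SS_E$. The critical value also rises, from $F_{0.05}(2, 9) \approx 4.26$ to $F_{0.05}(2, 8) \approx 4.46$. This illustration assumes the treatment sum of squares is unchanged by the adjustment.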


External links

Examples of all ANOVA and ANCOVA models with up to three treatment factors, including randomized block, split plot, repeated measures, and Latin squares, and their analysis in R (University of Southampton)

What is analysis of covariance used for?

Use of covariates in randomized controlled trials by G.J.P. Van Breukelen and K.R.A. Van Dijk (2007)