In statistics, a random effects model, also called a variance components model, is a statistical model where the model parameters are random variables. It is a kind of hierarchical linear model, which assumes that the data being analysed are drawn from a hierarchy of different populations whose differences relate to that hierarchy. A random effects model is a special case of a mixed model. Contrast this with the biostatistics definitions, in which "fixed" and "random" effects refer respectively to the population-average and subject-specific effects (the latter generally being assumed to be unknown, latent variables).

Qualitative description

Random effects models assist in controlling for unobserved heterogeneity when the heterogeneity is constant over time and not correlated with the independent variables. This constant can be removed from longitudinal data through differencing, since taking a first difference removes any time-invariant component of the model. Two common assumptions can be made about the individual-specific effect: the random effects assumption and the fixed effects assumption. The random effects assumption is that the individual unobserved heterogeneity is uncorrelated with the independent variables. The fixed effects assumption is that the individual-specific effect is correlated with the independent variables. If the random effects assumption holds, the random effects estimator is more efficient than the fixed effects estimator.
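The claim that a first difference removes time-invariant terms can be checked numerically. The following is a minimal sketch, assuming a simulated balanced panel y_it = α_i + β x_it + ε_it with a single regressor. The function name, panel size, and parameter values are hypothetical and chosen only for illustration:

```python
import numpy as np

def first_difference(panel: np.ndarray) -> np.ndarray:
    """Difference each unit's series over time: out[i, t] = panel[i, t+1] - panel[i, t]."""
    return np.diff(panel, axis=1)

rng = np.random.default_rng(0)
n_units, n_periods = 5, 4
beta = 2.0                                                # hypothetical true slope
alpha = rng.normal(0.0, 3.0, size=(n_units, 1))           # time-invariant individual effect alpha_i
x = rng.normal(size=(n_units, n_periods))                 # time-varying regressor x_it
eps = rng.normal(0.0, 0.5, size=(n_units, n_periods))     # idiosyncratic error eps_it
y = alpha + beta * x + eps                                # y_it = alpha_i + beta * x_it + eps_it

dy = first_difference(y)
dx = first_difference(x)

# alpha_i cancels: dy equals beta * dx + (differenced eps) up to rounding
print(np.allclose(dy, beta * dx + first_difference(eps)))  # True

# OLS through the origin on the differenced data; alpha_i is never estimated
beta_hat = np.sum(dx * dy) / np.sum(dx * dx)
print(beta_hat)
```

Because α_i never enters `dy`, the slope is estimated from the differenced data without modelling the individual effect at all.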

Simple example

Suppose ''m'' large elementary schools are chosen randomly from among thousands in a large country. Suppose also that ''n'' pupils of the same age are chosen randomly at each selected school. Their scores on a standard aptitude test are ascertained. Let ''Y''_''ij'' be the score of the ''j''th pupil at the ''i''th school. A simple way to model this variable is

Y_{ij} = \mu + U_i + W_{ij} + \epsilon_{ij},\,

where ''μ'' is the average test score for the entire population. In this model ''U_i'' is the school-specific random effect: it measures the difference between the average score at school ''i'' and the average score in the entire country. The term ''W_ij'' is the individual-specific random effect, i.e., it is the deviation of the ''j''th pupil's score from the average for the ''i''th school. The model can be augmented by including additional explanatory variables, which would capture differences in scores among different groups. For example:

Y_{ij} = \mu + \beta_1 \mathrm{Sex}_{ij} + \beta_2 \mathrm{ParentsEduc}_{ij} + U_i + W_{ij} + \epsilon_{ij},\,

where Sex_''ij'' is the dummy variable for boys/girls and ParentsEduc_''ij'' records, say, the average education level of a child's parents. This is a mixed model, not a purely random effects model, as it introduces fixed-effects terms for Sex and Parents' Education.
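The data-generating process described above can be sketched as a small simulation. This is a minimal sketch, assuming normally distributed ''U_i'' and ''W_ij'' (the text does not specify a distribution) and folding ''ε_ij'' into ''W_ij'', since each pupil contributes a single score. The function name, parameter values, the 0/1 coding of Sex, and the years-of-schooling scale for ParentsEduc are all hypothetical:

```python
import numpy as np

def simulate_scores(mu, beta1, beta2, sex, parents_educ, tau, sigma, rng):
    """Draw Y[i, j] = mu + beta1*Sex[i, j] + beta2*ParentsEduc[i, j] + U_i + W_ij.

    sex and parents_educ are m x n arrays indexed by (school i, pupil j).
    U_i ~ N(0, tau^2) is shared by every sampled pupil at school i.
    W_ij ~ N(0, sigma^2) is specific to pupil j (epsilon_ij is folded into it).
    Setting beta1 = beta2 = 0 gives the purely random effects model.
    """
    m, n = sex.shape
    u = rng.normal(0.0, tau, size=(m, 1))    # one draw per school, broadcast across its row
    w = rng.normal(0.0, sigma, size=(m, n))  # one draw per pupil
    return mu + beta1 * sex + beta2 * parents_educ + u + w

rng = np.random.default_rng(1)
m, n = 4, 3
sex = rng.integers(0, 2, size=(m, n)).astype(float)   # hypothetical 0/1 coding
parents_educ = rng.uniform(8.0, 18.0, size=(m, n))     # hypothetical years of schooling
scores = simulate_scores(100.0, 2.0, 1.5, sex, parents_educ, tau=10.0, sigma=5.0, rng=rng)
print(scores.shape)  # (4, 3)
```

Drawing ''U_i'' once per row and broadcasting it is what makes pupils at the same school correlated; the fixed-effects terms simply shift each pupil's mean.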

Variance components

The variance of ''Y''_''ij'' is the sum of the variances τ² and σ² of ''U''_''i'' and ''W''_''ij'' respectively. Let

\overline{Y}_{i\bullet} = \frac{1}{n}\sum_{j=1}^n Y_{ij}

be the average, not of all scores at the ''i''th school, but of those at the ''i''th school that are included in the random sample. Let

\overline{Y}_{\bullet\bullet} = \frac{1}{mn}\sum_{i=1}^m\sum_{j=1}^n Y_{ij}

be the grand average. Let

SSW = \sum_{i=1}^m\sum_{j=1}^n (Y_{ij} - \overline{Y}_{i\bullet})^2 \,

SSB = n\sum_{i=1}^m (\overline{Y}_{i\bullet} - \overline{Y}_{\bullet\bullet})^2 \,

be respectively the sum of squares due to differences ''within'' groups and the sum of squares due to differences ''between'' groups. Then it can be shown that

\frac{1}{m(n-1)}E(SSW) = \sigma^2

and

\frac{1}{(m-1)n}E(SSB) = \frac{\sigma^2}{n} + \tau^2.
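A sketch of why these two identities hold, assuming ''U''_''i'' and ''W''_''ij'' are mutually independent with mean zero and variances τ² and σ², and treating ''ε''_''ij'' as part of ''W''_''ij'' (so σ² is the total within-school variance):

```latex
\begin{aligned}
Y_{ij} - \overline{Y}_{i\bullet} &= W_{ij} - \overline{W}_{i\bullet}
  &&\Rightarrow\quad E\Big[\sum_{j=1}^n (Y_{ij} - \overline{Y}_{i\bullet})^2\Big] = (n-1)\,\sigma^2
  &&\Rightarrow\quad E(SSW) = m(n-1)\,\sigma^2, \\
\overline{Y}_{i\bullet} &= \mu + U_i + \overline{W}_{i\bullet},
  \quad \operatorname{Var}(\overline{Y}_{i\bullet}) = \tau^2 + \frac{\sigma^2}{n}
  &&\Rightarrow\quad E\Big[\sum_{i=1}^m (\overline{Y}_{i\bullet} - \overline{Y}_{\bullet\bullet})^2\Big] = (m-1)\Big(\tau^2 + \frac{\sigma^2}{n}\Big)
  &&\Rightarrow\quad E(SSB) = (m-1)\,n\Big(\tau^2 + \frac{\sigma^2}{n}\Big).
\end{aligned}
```

The second line uses the fact that the school means are ''m'' independent draws with common variance τ² + σ²/''n'', and the sum of squared deviations of such draws about their own average has expectation (''m'' - 1) times that variance. Dividing each expectation by its degrees-of-freedom factor gives the two identities.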

These "expected mean squares" can be used as the basis for estimation of the "variance components" ''σ''² and ''τ''². The ratio ''τ''²/(''τ''² + ''σ''²), the share of the total variance due to differences between schools, is called the intraclass correlation coefficient; it equals the correlation between the scores of two pupils at the same school.
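Solving the two expected-mean-square identities for σ² and τ², and replacing expectations with the observed sums of squares, gives the classical ANOVA (method-of-moments) estimators σ̂² = SSW/(m(n-1)) and τ̂² = SSB/((m-1)n) - σ̂²/n. The following is a minimal sketch for a balanced design, assuming the scores are stored as an m x n NumPy array. The function name is hypothetical. Truncating a negative τ̂² at zero is a common convention, not something the text prescribes, and it makes the estimator slightly biased:

```python
import numpy as np

def anova_variance_components(y: np.ndarray) -> tuple[float, float, float]:
    """Method-of-moments estimates from a balanced m x n table y[i, j].

    Returns (sigma2_hat, tau2_hat, icc_hat).
    """
    m, n = y.shape
    school_means = y.mean(axis=1)    # \bar{Y}_{i.}
    grand_mean = y.mean()            # \bar{Y}_{..}
    ssw = np.sum((y - school_means[:, None]) ** 2)
    ssb = n * np.sum((school_means - grand_mean) ** 2)

    sigma2_hat = ssw / (m * (n - 1))                 # from E(SSW) / (m(n-1)) = sigma^2
    tau2_hat = ssb / ((m - 1) * n) - sigma2_hat / n  # from E(SSB) / ((m-1)n) = sigma^2/n + tau^2
    tau2_hat = max(tau2_hat, 0.0)                    # truncate negative moment estimates at 0

    total = tau2_hat + sigma2_hat
    icc_hat = tau2_hat / total if total > 0 else 0.0
    return float(sigma2_hat), float(tau2_hat), float(icc_hat)

# Two schools, two pupils each: SSW = 4, SSB = 16
print(anova_variance_components(np.array([[1.0, 3.0], [5.0, 7.0]])))
# sigma2_hat = 2.0, tau2_hat = 7.0, icc_hat = 7/9
```

In the small example, the school means 2 and 6 differ far more than pupils within a school do, so most of the variance is attributed to the school effect.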

Applications

Random effects models used in practice include the Bühlmann model of insurance contracts and the Fay-Herriot model used for small area estimation.
