Panel (data) analysis is a statistical method, widely used in social science, epidemiology, and econometrics, to analyze two-dimensional (typically cross-sectional and longitudinal) panel data. The data are usually collected over time and over the same individuals, and then a regression is run over these two dimensions.

Multidimensional analysis is an econometric method in which data are collected over more than two dimensions (typically, time, individuals, and some third dimension). A common panel data regression model looks like

$$y_{it} = a + b x_{it} + \varepsilon_{it},$$

where $y$ is the dependent variable, $x$ is the independent variable, $a$ and $b$ are coefficients, and $i$ and $t$ are indices for individuals and time. The error $\varepsilon_{it}$ is very important in this analysis. Assumptions about the error term determine whether we speak of fixed effects or random effects. In a fixed effects model, $\varepsilon_{it}$ is assumed to vary non-stochastically over $i$ or $t$, making the fixed effects model analogous to a dummy variable model in one dimension. In a random effects model, $\varepsilon_{it}$ is assumed to vary stochastically over $i$ or $t$, requiring special treatment of the error variance matrix.

Panel data analysis has three more-or-less independent approaches:
* independently pooled panels;
* random effects models;
* fixed effects models or first differenced models.

The selection between these methods depends upon the objective of the analysis, and the problems concerning the exogeneity of the explanatory variables.
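To see why fixed effects and first differenced models are grouped together, it can help to split the error into a time-constant unit component and an idiosyncratic part. The decomposition $\varepsilon_{it} = c_i + u_{it}$ and the bar and $\Delta$ notation below are assumptions introduced here for illustration:

```latex
\begin{align*}
y_{it} &= a + b x_{it} + c_i + u_{it}, \\
y_{it} - \bar{y}_i &= b\,(x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i),
  \qquad \bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}, \\
\Delta y_{it} &= y_{it} - y_{i,t-1} = b\,\Delta x_{it} + \Delta u_{it}.
\end{align*}
```

Both the within transformation (second line) and first differencing (third line) remove $a$ and $c_i$, so $b$ can be estimated without restricting how $c_i$ relates to $x_{it}$.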

Independently pooled panels

''Key assumption:''
There are no unique attributes of individuals within the measurement set, and no universal effects across time.
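Under this assumption the panel structure can be ignored and the model estimated by ordinary least squares on the stacked observations. The following is a minimal sketch in plain Python on simulated data; the function names `simulate_pooled_panel` and `pooled_ols` and all parameter values are illustrative assumptions, not part of any library.

```python
import random

def simulate_pooled_panel(n_units=200, n_periods=5, a=1.0, b=2.0, seed=0):
    """Simulate y_it = a + b*x_it + e_it with no unit or time effects.

    Returns a list of (unit, period, x, y) rows; all values are illustrative.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        for t in range(n_periods):
            x = rng.gauss(0.0, 1.0)
            e = rng.gauss(0.0, 1.0)
            rows.append((i, t, x, a + b * x + e))
    return rows

def pooled_ols(rows):
    """OLS of y on a constant and x over all stacked (i, t) observations."""
    xs = [row[2] for row in rows]
    ys = [row[3] for row in rows]
    n = len(rows)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

a_hat, b_hat = pooled_ols(simulate_pooled_panel())
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

Because the simulated data contain no unit or time effects, the estimates should land close to the values of `a` and `b` used in the simulation.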

Fixed effect models

''Key assumption:''
There are unique attributes of individuals that do not vary over time. That is, the unique attributes for a given individual $i$ are time-invariant (they do not depend on $t$). These attributes may or may not be correlated with the individual regressors $x_{it}$. To test whether fixed effects, rather than random effects, is needed, the Durbin–Wu–Hausman test can be used.
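One common way to estimate a fixed effects model is the within transformation: subtract each individual's time average from its observations and run OLS on the demeaned data. The sketch below assumes a balanced panel with a single regressor and simulated data in which the time-invariant attribute is correlated with the regressor; the names `simulate_fe_panel`, `pooled_slope` and `within_slope` are hypothetical.

```python
import random

def simulate_fe_panel(n_units=300, n_periods=5, b=2.0, seed=1):
    """Simulate y_it = c_i + b*x_it + u_it where x_it is correlated with c_i."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_units):
        c = rng.gauss(0.0, 1.0)  # time-invariant attribute of the unit
        x_i = [c + rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        y_i = [c + b * x + rng.gauss(0.0, 1.0) for x in x_i]
        xs.append(x_i)
        ys.append(y_i)
    return xs, ys

def slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

def pooled_slope(xs, ys):
    """Pooled OLS slope, ignoring the unit effects."""
    flat_x = [x for x_i in xs for x in x_i]
    flat_y = [y for y_i in ys for y in y_i]
    return slope(flat_x, flat_y)

def within_slope(xs, ys):
    """Fixed effects (within) slope: demean x and y by unit, then run OLS."""
    dx, dy = [], []
    for x_i, y_i in zip(xs, ys):
        x_bar, y_bar = sum(x_i) / len(x_i), sum(y_i) / len(y_i)
        dx.extend(x - x_bar for x in x_i)
        dy.extend(y - y_bar for y in y_i)
    return slope(dx, dy)

xs, ys = simulate_fe_panel()
print(f"pooled: {pooled_slope(xs, ys):.3f}, within: {within_slope(xs, ys):.3f}")
```

In this simulated setting pooled OLS absorbs part of the unit attribute into the slope, while the within estimator removes it before estimation.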

Random effect models

''Key assumption:''
There are unique, time-constant attributes of individuals that are not correlated with the individual regressors. Pooled OLS can be used to derive unbiased and consistent estimates of parameters even when time-constant attributes are present, but random effects will be more efficient. Random effects is a feasible generalised least squares technique which is asymptotically more efficient than pooled OLS when time-constant attributes are present. Random effects adjusts for the serial correlation which is induced by unobserved time-constant attributes.
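For a balanced panel, the feasible GLS step can be carried out by quasi-demeaning: subtract a fraction `theta` of each individual's time average and run OLS on the transformed data. The sketch below is one possible recipe, assuming a single regressor and variance components estimated from within and between residuals; the function names and simulated values are illustrative assumptions.

```python
import math
import random

def simulate_re_panel(n_units=300, n_periods=5, a=1.0, b=2.0, seed=2):
    """Simulate y_it = a + b*x_it + c_i + u_it with c_i independent of x_it."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_units):
        c = rng.gauss(0.0, 1.0)
        x_i = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        y_i = [a + b * x + c + rng.gauss(0.0, 1.0) for x in x_i]
        xs.append(x_i)
        ys.append(y_i)
    return xs, ys

def ols(x, y):
    """Simple OLS of y on a constant and x; returns (intercept, slope, SSR)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, ssr

def random_effects(xs, ys):
    """Feasible GLS for a balanced panel via quasi-demeaning."""
    n, t = len(xs), len(xs[0])
    x_bar = [sum(x_i) / t for x_i in xs]
    y_bar = [sum(y_i) / t for y_i in ys]
    # Within regression: its residual variance estimates var(u_it).
    dx = [x - x_bar[i] for i in range(n) for x in xs[i]]
    dy = [y - y_bar[i] for i in range(n) for y in ys[i]]
    _, _, ssr_w = ols(dx, dy)
    var_u = ssr_w / (n * (t - 1) - 1)
    # Between regression on unit means: residual variance ~ var(c) + var(u)/t.
    _, _, ssr_b = ols(x_bar, y_bar)
    var_c = max(0.0, ssr_b / (n - 2) - var_u / t)
    theta = 1.0 - math.sqrt(var_u / (var_u + t * var_c))
    # Quasi-demean and run OLS; the transformed intercept equals a*(1 - theta).
    qx = [x - theta * x_bar[i] for i in range(n) for x in xs[i]]
    qy = [y - theta * y_bar[i] for i in range(n) for y in ys[i]]
    intercept, slope, _ = ols(qx, qy)
    return intercept / (1.0 - theta), slope, theta

a_re, b_re, theta = random_effects(*simulate_re_panel())
print(f"a = {a_re:.3f}, b = {b_re:.3f}, theta = {theta:.3f}")
```

A `theta` of 0 reproduces pooled OLS and a `theta` approaching 1 approaches the within estimator, which is one way to read random effects as a compromise between the two.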

Models with instrumental variables

In the standard random effects (RE) and fixed effects (FE) models, independent variables are assumed to be uncorrelated with error terms. Provided valid instruments are available, RE and FE methods extend to the case where some of the explanatory variables are allowed to be endogenous. As in the exogenous setting, the RE model with instrumental variables (REIV) requires more stringent assumptions than the FE model with instrumental variables (FEIV), but it tends to be more efficient under appropriate conditions (Wooldridge, J.M., Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass.). To fix ideas, consider the following model:

$$y_{it} = x_{it}\beta + c_i + u_{it}$$

where $c_i$ is an unobserved unit-specific time-invariant effect (call it the unobserved effect) and $x_{it}$ can be correlated with $u_{is}$ for $s$ possibly different from $t$. Suppose there exists a set of valid instruments $z_i = (z_{i1}, \ldots, z_{iT})$.

In the REIV setting, key assumptions include that $z_i$ is uncorrelated with $c_i$ as well as $u_{it}$ for $t = 1, \ldots, T$. In fact, for the REIV estimator to be efficient, conditions stronger than uncorrelatedness between instruments and the unobserved effect are necessary.

On the other hand, the FEIV estimator only requires that instruments be exogenous with error terms after conditioning on the unobserved effect, i.e. $E[u_{it} \mid z_i, c_i] = 0$. The FEIV condition allows for arbitrary correlation between instruments and the unobserved effect. However, this generality does not come for free: time-invariant explanatory and instrumental variables are not allowed. As in the usual FE method, the estimator uses time-demeaned variables to remove the unobserved effect. Therefore, the FEIV estimator would be of limited use if variables of interest include time-invariant ones.

The above discussion parallels the exogenous case of RE and FE models. In the exogenous case, RE assumes uncorrelatedness between explanatory variables and the unobserved effect, and FE allows for arbitrary correlation between the two. Similar to the standard case, REIV tends to be more efficient than FEIV provided that appropriate assumptions hold.
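The FEIV idea of time-demeaning followed by instrumental variables can be sketched for the simplest case of one endogenous regressor and one time-varying instrument, where two-stage least squares reduces to a ratio of cross products. The data-generating process, coefficient values, and function names below are assumptions made for illustration only.

```python
import random

def simulate_iv_panel(n_units=300, n_periods=5, beta=2.0, seed=3):
    """Simulate y_it = beta*x_it + c_i + u_it with endogenous x_it.

    x_it depends on u_it (endogeneity) and on c_i; the instrument z_it is
    correlated with c_i but independent of u_it. All values are illustrative.
    """
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for i in range(n_units):
        c = rng.gauss(0.0, 1.0)
        x_i, y_i, z_i = [], [], []
        for t in range(n_periods):
            z = c + rng.gauss(0.0, 1.0)
            u = rng.gauss(0.0, 1.0)
            x = z + c + 0.8 * u + rng.gauss(0.0, 1.0)
            x_i.append(x)
            y_i.append(beta * x + c + u)
            z_i.append(z)
        xs.append(x_i)
        ys.append(y_i)
        zs.append(z_i)
    return xs, ys, zs

def demean_by_unit(panel):
    """Subtract each unit's time average (the within transformation)."""
    out = []
    for series in panel:
        mean = sum(series) / len(series)
        out.extend(v - mean for v in series)
    return out

def within_ols(xs, ys):
    """Ordinary FE estimator: OLS on time-demeaned data (no instrument)."""
    dx, dy = demean_by_unit(xs), demean_by_unit(ys)
    return sum(x * y for x, y in zip(dx, dy)) / sum(x * x for x in dx)

def feiv(xs, ys, zs):
    """Just-identified FEIV: IV on time-demeaned y, x and z."""
    dx, dy, dz = demean_by_unit(xs), demean_by_unit(ys), demean_by_unit(zs)
    return sum(z * y for z, y in zip(dz, dy)) / sum(z * x for z, x in zip(dz, dx))

xs, ys, zs = simulate_iv_panel()
print(f"FE: {within_ols(xs, ys):.3f}, FEIV: {feiv(xs, ys, zs):.3f}")
```

Here the instrument is deliberately correlated with the unobserved effect, which the FEIV condition permits; demeaning removes that correlation before the IV step. A time-invariant instrument would be wiped out by `demean_by_unit`, which illustrates the limitation noted above.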

Dynamic panel models

In contrast to the standard panel data model, a dynamic panel model also includes lagged values of the dependent variable as regressors. For example, including one lag of the dependent variable generates:

$$y_{it} = a + b x_{it} + \rho y_{i,t-1} + \varepsilon_{it}$$

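The assumptions of the fixed effect and random effect models are violated in this setting, because the lagged dependent variable is by construction correlated with any time-constant individual component of the error. Instead, practitioners use a technique like the Arellano–Bond estimator.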
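A full Arellano–Bond implementation is beyond a short example, so the sketch below uses the simpler Anderson–Hsiao approach instead: first-difference the model to remove the individual effect, then instrument the lagged difference with the level two periods back. It assumes a model without $x_{it}$ (that is, $b = 0$), serially uncorrelated errors, and simulated data; the function names are hypothetical.

```python
import random

def simulate_dynamic_panel(n_units=2000, n_periods=7, rho=0.5, seed=4,
                           burn_in=20):
    """Simulate y_it = c_i + rho*y_{i,t-1} + e_it (no x_it, for brevity)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        c = rng.gauss(0.0, 0.5)       # time-constant individual effect
        y = c / (1.0 - rho)           # start at the unit's long-run mean
        series = []
        for t in range(burn_in + n_periods):
            y = c + rho * y + rng.gauss(0.0, 1.0)
            if t >= burn_in:          # keep only the last n_periods values
                series.append(y)
        panel.append(series)
    return panel

def within_ar1(panel):
    """Naive FE estimate of rho: demean y_t and y_{t-1} by unit, then OLS."""
    num = den = 0.0
    for y in panel:
        cur, lag = y[1:], y[:-1]
        cur_bar, lag_bar = sum(cur) / len(cur), sum(lag) / len(lag)
        for c_val, l_val in zip(cur, lag):
            num += (l_val - lag_bar) * (c_val - cur_bar)
            den += (l_val - lag_bar) ** 2
    return num / den

def anderson_hsiao(panel):
    """IV on first differences, using y_{i,t-2} as instrument for dy_{i,t-1}."""
    num = den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            dy_t = y[t] - y[t - 1]
            dy_lag = y[t - 1] - y[t - 2]
            z = y[t - 2]
            num += z * dy_t
            den += z * dy_lag
    return num / den

panel = simulate_dynamic_panel()
print(f"within: {within_ar1(panel):.3f}, "
      f"Anderson-Hsiao: {anderson_hsiao(panel):.3f}")
```

With only a few periods per individual, the within estimate of `rho` is pulled well below the simulated value, while the instrumented estimate is not. The Arellano–Bond estimator builds on the same differencing idea but uses more lags as instruments within a generalized method of moments framework.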

References