In econometrics, the seemingly unrelated regressions (SUR) or seemingly unrelated regression equations (SURE) model, proposed by Arnold Zellner in 1962, is a generalization of a linear regression model that consists of several regression equations, each having its own dependent variable and potentially different sets of exogenous explanatory variables. Each equation is a valid linear regression on its own and can be estimated separately, which is why the system is called ''seemingly unrelated'', although some authors suggest that the term ''seemingly related'' would be more appropriate, since the error terms are assumed to be correlated across the equations.
The model can be estimated equation by equation using standard ordinary least squares (OLS). Such estimates are consistent, but generally not as efficient as those of the SUR method, which amounts to feasible generalized least squares with a specific form of the variance-covariance matrix. There are two important cases in which SUR is equivalent to OLS: when the error terms are uncorrelated between the equations (so that the equations are truly unrelated), and when each equation contains exactly the same set of regressors on the right-hand side.
The SUR model can be viewed either as a simplification of the general linear model in which certain coefficients in the coefficient matrix are restricted to be equal to zero, or as a generalization of the general linear model in which the regressors on the right-hand side are allowed to differ between equations. The SUR model can be further generalized into the simultaneous equations model, where the right-hand side regressors are allowed to be endogenous variables as well.
The model
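Suppose there are ''m'' regression equations
: <math>y_{ir} = x_{ir}^\mathsf{T}\beta_i + \varepsilon_{ir}, \quad i = 1, \ldots, m.</math>
Here ''i'' represents the equation number, <math>r = 1, \ldots, R</math> is the individual observation, and <math>x_{ir}^\mathsf{T}</math> is the transpose of the column vector <math>x_{ir}</math>. The number of observations ''R'' is assumed to be large, so that in the analysis we take <math>R \to \infty</math>, whereas the number of equations ''m'' remains fixed.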
Each equation ''i'' has a single response variable ''y''<sub>''ir''</sub> and a ''k''<sub>''i''</sub>-dimensional vector of regressors ''x''<sub>''ir''</sub>. If we stack the observations corresponding to the ''i''-th equation into ''R''-dimensional vectors and matrices, then the model can be written in vector form as
: <math>y_i = X_i\beta_i + \varepsilon_i, \quad i = 1, \ldots, m,</math>
where ''y''<sub>''i''</sub> and ''ε''<sub>''i''</sub> are ''R''×1 vectors, ''X''<sub>''i''</sub> is an ''R''×''k''<sub>''i''</sub> matrix, and ''β''<sub>''i''</sub> is a ''k''<sub>''i''</sub>×1 vector.
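Finally, if we stack these ''m'' vector equations on top of each other, the system will take the form
: <math>\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_m \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix} = X\beta + \varepsilon.</math>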
The assumption of the model is that the error terms ''ε''<sub>''ir''</sub> are independent across observations, but may have cross-equation correlations within observations. Thus, we assume that <math>\operatorname{E}[\varepsilon_{ir}\varepsilon_{js} \mid X] = 0</math> whenever <math>r \neq s</math>, whereas <math>\operatorname{E}[\varepsilon_{ir}\varepsilon_{jr} \mid X] = \sigma_{ij}</math>. Denoting by <math>\Sigma = [\sigma_{ij}]</math> the ''m×m'' skedasticity matrix of each observation, the covariance matrix of the stacked error terms ''ε'' will be equal to
: <math>\Omega \equiv \operatorname{E}[\varepsilon\varepsilon^\mathsf{T} \mid X] = \Sigma \otimes I_R,</math>
where ''I''<sub>''R''</sub> is the ''R''-dimensional identity matrix and ⊗ denotes the matrix Kronecker product.
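The stacking and the Kronecker structure of the error covariance can be made concrete with a small NumPy sketch. The function names, the random inputs and the particular skedasticity matrix below are illustrative assumptions, not part of the model definition.

```python
import numpy as np

def stack_equations(y_list, X_list):
    """Stack m equations into one system y = X beta + eps.

    y_list[i] has shape (R,), X_list[i] has shape (R, k_i).
    Returns y of shape (m*R,) and a block-diagonal X of shape (m*R, sum k_i).
    """
    R = len(y_list[0])
    ks = [X_i.shape[1] for X_i in X_list]
    X = np.zeros((len(X_list) * R, sum(ks)))
    col = 0
    for i, X_i in enumerate(X_list):
        # Equation i occupies rows i*R..(i+1)*R and its own k_i columns.
        X[i * R:(i + 1) * R, col:col + ks[i]] = X_i
        col += ks[i]
    return np.concatenate(y_list), X

def stacked_error_covariance(Sigma, R):
    """Covariance of the stacked errors: Sigma (m x m) Kronecker I_R."""
    return np.kron(Sigma, np.eye(R))

# Illustrative inputs (assumed): m = 2 equations, R = 4 observations.
rng = np.random.default_rng(0)
R = 4
X_list = [rng.normal(size=(R, 2)), rng.normal(size=(R, 3))]
y_list = [rng.normal(size=R), rng.normal(size=R)]
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])

y, X = stack_equations(y_list, X_list)
Omega = stacked_error_covariance(Sigma, R)

print(X.shape)      # (8, 5)
print(Omega.shape)  # (8, 8)
print(Omega[0, R])  # 0.5: same observation, different equations
print(Omega[0, 1])  # 0.0: different observations
```

With the equations stacked one after another, entry (''i''·''R'' + ''r'', ''j''·''R'' + ''s'') of the covariance matrix is ''σ''<sub>''ij''</sub> when ''r'' = ''s'' and zero otherwise, which is what the last two printed values illustrate.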
Estimation
The SUR model is usually estimated using the feasible generalized least squares (FGLS) method. This is a two-step method. In the first step we run ordinary least squares regression on the stacked system, which is the same as estimating each equation separately. The residuals <math>\hat\varepsilon_i = y_i - X_i\hat\beta_i^{\text{OLS}}</math> from this regression are used to estimate the elements of the matrix <math>\Sigma</math>:
: <math>\hat\sigma_{ij} = \frac{1}{R}\, \hat\varepsilon_i^\mathsf{T} \hat\varepsilon_j.</math>
In the second step we run generalized least squares regression on the same system using the variance matrix <math>\hat\Omega = \hat\Sigma \otimes I_R</math>:
: <math>\hat\beta = \Big( X^\mathsf{T} (\hat\Sigma^{-1} \otimes I_R) X \Big)^{-1} X^\mathsf{T} (\hat\Sigma^{-1} \otimes I_R)\, y.</math>
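The two steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the function name `sur_fgls`, the simulated data, the seed and the true coefficients are all made up for the example, and the explicit Kronecker product is only practical for small samples.

```python
import numpy as np

def sur_fgls(y_list, X_list):
    """Two-step FGLS for a SUR system (illustrative sketch)."""
    m, R = len(y_list), len(y_list[0])

    # Step 1: equation-by-equation OLS, keeping the residuals.
    resid = np.empty((R, m))
    for i, (y_i, X_i) in enumerate(zip(y_list, X_list)):
        b_ols, *_ = np.linalg.lstsq(X_i, y_i, rcond=None)
        resid[:, i] = y_i - X_i @ b_ols
    Sigma_hat = resid.T @ resid / R  # sigma_hat_ij = e_i' e_j / R

    # Stack the system: y is (m*R,), X is block diagonal.
    ks = [X_i.shape[1] for X_i in X_list]
    X = np.zeros((m * R, sum(ks)))
    col = 0
    for i, X_i in enumerate(X_list):
        X[i * R:(i + 1) * R, col:col + ks[i]] = X_i
        col += ks[i]
    y = np.concatenate(y_list)

    # Step 2: GLS with weight matrix Sigma_hat^{-1} Kronecker I_R.
    W = np.kron(np.linalg.inv(Sigma_hat), np.eye(R))
    XtW = X.T @ W
    beta_hat = np.linalg.solve(XtW @ X, XtW @ y)
    return beta_hat, Sigma_hat

# Simulated two-equation system (all values below are assumptions).
rng = np.random.default_rng(42)
R = 400
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
eps = rng.multivariate_normal(np.zeros(2), Sigma, size=R)
X1 = np.column_stack([np.ones(R), rng.normal(size=R)])
X2 = np.column_stack([np.ones(R), rng.normal(size=R), rng.normal(size=R)])
beta1 = np.array([1.0, 2.0])
beta2 = np.array([-1.0, 0.5, 3.0])
y1 = X1 @ beta1 + eps[:, 0]
y2 = X2 @ beta2 + eps[:, 1]

beta_hat, Sigma_hat = sur_fgls([y1, y2], [X1, X2])
print(beta_hat)   # stacked estimates: equation 1 first, then equation 2
print(Sigma_hat)  # estimated cross-equation error covariance
```

The returned coefficient vector is ordered like the stacked system, so its first two entries belong to the first equation and the remaining three to the second.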
This estimator is unbiased in small samples assuming the error terms ''ε''<sub>''ir''</sub> have a symmetric distribution; in large samples it is consistent and asymptotically normal with limiting distribution
: <math>\sqrt{R}\,(\hat\beta - \beta) \ \xrightarrow{d}\ \mathcal{N}\Big( 0,\ \Big( \tfrac{1}{R}\, X^\mathsf{T} (\Sigma^{-1} \otimes I_R) X \Big)^{-1} \Big).</math>
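In practice the asymptotic result is typically used to attach standard errors to the FGLS estimates, by taking the inverse of ''X''<sup>T</sup>(Σ̂<sup>−1</sup> ⊗ ''I''<sub>''R''</sub>)''X'' as the estimated covariance matrix of the coefficients. The sketch below does this without forming the large Kronecker product, using the fact that block (''i'', ''j'') of that matrix equals the (''i'', ''j'') element of Σ̂<sup>−1</sup> times ''X''<sub>''i''</sub><sup>T</sup>''X''<sub>''j''</sub>. The function name and the simulated data are assumptions made for illustration.

```python
import numpy as np

def sur_fgls_with_se(y_list, X_list):
    """FGLS estimates with asymptotic standard errors (illustrative sketch)."""
    m, R = len(y_list), len(y_list[0])

    # First-step OLS residuals, one column per equation.
    E = np.column_stack([
        y_i - X_i @ np.linalg.lstsq(X_i, y_i, rcond=None)[0]
        for y_i, X_i in zip(y_list, X_list)
    ])
    S_inv = np.linalg.inv(E.T @ E / R)

    # X'(S^{-1} kron I_R)X and X'(S^{-1} kron I_R)y, assembled block by block.
    A = np.block([[S_inv[i, j] * (X_list[i].T @ X_list[j]) for j in range(m)]
                  for i in range(m)])
    b = np.concatenate([
        sum(S_inv[i, j] * (X_list[i].T @ y_list[j]) for j in range(m))
        for i in range(m)
    ])

    cov = np.linalg.inv(A)        # estimated covariance of beta_hat
    beta_hat = cov @ b
    se = np.sqrt(np.diag(cov))    # asymptotic standard errors
    return beta_hat, se, cov

# Simulated two-equation system (all values below are assumptions).
rng = np.random.default_rng(7)
R = 400
Sigma = np.array([[1.0, -0.6],
                  [-0.6, 1.5]])
eps = rng.multivariate_normal(np.zeros(2), Sigma, size=R)
X1 = np.column_stack([np.ones(R), rng.normal(size=R)])
X2 = np.column_stack([np.ones(R), rng.normal(size=R), rng.normal(size=R)])
beta_true = np.array([0.5, 1.5, 2.0, -1.0, 0.25])
y1 = X1 @ beta_true[:2] + eps[:, 0]
y2 = X2 @ beta_true[2:] + eps[:, 1]

beta_hat, se, cov = sur_fgls_with_se([y1, y2], [X1, X2])
for b_k, se_k in zip(beta_hat, se):
    print(f"{b_k: .4f}  (se {se_k:.4f})")
```

The blockwise assembly gives the same matrices as the Kronecker formula but only needs products of the individual regressor matrices, which matters once ''R'' is large.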
Other estimation techniques besides FGLS have been suggested for the SUR model: the maximum likelihood (ML) method, under the assumption that the errors are normally distributed; the iterative generalized least squares (IGLS), where the residuals from the second step of FGLS are used to recalculate the matrix <math>\hat\Sigma</math>, then <math>\hat\beta</math> is estimated again using GLS, and so on, until convergence is achieved; and the iterative ordinary least squares (IOLS) scheme, where estimation is performed equation by equation, but every equation includes as additional regressors the residuals from the previously estimated equations in order to account for the cross-equation correlations, with the estimation run iteratively until convergence is achieved. Kmenta and Gilbert (1968) ran a Monte Carlo study and established that all three methods (IGLS, IOLS and ML) yield numerically equivalent results. They also found that the asymptotic distribution of these estimators is the same as the distribution of the FGLS estimator, whereas in small samples none of the estimators was superior to the others. Zellner and Ando (2010) developed a direct Monte Carlo method for the Bayesian analysis of the SUR model.
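The IGLS idea of alternating between re-estimating the error covariance and re-running GLS can be sketched as a simple fixed-point loop. The helper names, the stopping rule (largest absolute change in any coefficient), the tolerance and the iteration cap are assumptions chosen for this illustration, not a prescribed algorithm.

```python
import numpy as np

def gls_given_sigma(y_list, X_list, Sigma):
    """GLS for the stacked system with error covariance Sigma kron I_R."""
    m = len(y_list)
    S_inv = np.linalg.inv(Sigma)
    A = np.block([[S_inv[i, j] * (X_list[i].T @ X_list[j]) for j in range(m)]
                  for i in range(m)])
    b = np.concatenate([
        sum(S_inv[i, j] * (X_list[i].T @ y_list[j]) for j in range(m))
        for i in range(m)
    ])
    beta = np.linalg.solve(A, b)
    # Split the stacked vector back into one coefficient vector per equation.
    splits = np.cumsum([X_i.shape[1] for X_i in X_list])[:-1]
    return np.split(beta, splits)

def residual_covariance(y_list, X_list, betas):
    """Cross-equation covariance of the residuals, divided by R."""
    E = np.column_stack([y_i - X_i @ b_i
                         for y_i, X_i, b_i in zip(y_list, X_list, betas)])
    return E.T @ E / len(y_list[0])

def sur_igls(y_list, X_list, tol=1e-9, max_iter=200):
    """Iterate Sigma update and GLS until the coefficients stop changing."""
    m = len(y_list)
    # An identity Sigma reproduces equation-by-equation OLS as the start.
    betas = gls_given_sigma(y_list, X_list, np.eye(m))
    for n_iter in range(1, max_iter + 1):
        Sigma = residual_covariance(y_list, X_list, betas)
        new_betas = gls_given_sigma(y_list, X_list, Sigma)
        change = max(np.max(np.abs(nb - b)) for nb, b in zip(new_betas, betas))
        betas = new_betas
        if change < tol:
            break
    return betas, Sigma, n_iter

# Simulated two-equation system (all values below are assumptions).
rng = np.random.default_rng(3)
R = 300
eps = rng.multivariate_normal(np.zeros(2), [[1.0, 0.7], [0.7, 1.0]], size=R)
X1 = np.column_stack([np.ones(R), rng.normal(size=R)])
X2 = np.column_stack([np.ones(R), rng.normal(size=(R, 2))])
y1 = X1 @ np.array([1.0, -2.0]) + eps[:, 0]
y2 = X2 @ np.array([0.0, 1.0, 1.0]) + eps[:, 1]

betas, Sigma_hat, n_iter = sur_igls([y1, y2], [X1, X2])
print(n_iter)
print(betas)
```

The first pass through the loop is exactly the two-step FGLS estimator; later passes only refine the covariance estimate using the newest residuals.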
Equivalence to OLS
There are two important cases when the SUR estimates turn out to be equivalent to the equation-by-equation OLS. These cases are:
# When the matrix Σ is known to be diagonal, that is, there are no cross-equation correlations between the error terms. In this case the system becomes not seemingly but truly unrelated.
# When each equation contains exactly the same set of regressors, that is <math>X_1 = X_2 = \cdots = X_m</math>. That the estimates are then numerically identical to the OLS estimates follows from Kruskal's theorem on the coincidence of OLS and GLS estimators, or can be shown by direct calculation.
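Both cases can be checked numerically. For the second case the direct calculation is short: if every equation uses the same regressor matrix ''X''<sub>0</sub>, the stacked regressor matrix is ''I''<sub>''m''</sub> ⊗ ''X''<sub>0</sub>, so the GLS formula reduces to (Σ<sup>−1</sup> ⊗ ''X''<sub>0</sub><sup>T</sup>''X''<sub>0</sub>)<sup>−1</sup>(Σ<sup>−1</sup> ⊗ ''X''<sub>0</sub><sup>T</sup>)''y'' = (''I''<sub>''m''</sub> ⊗ (''X''<sub>0</sub><sup>T</sup>''X''<sub>0</sub>)<sup>−1</sup>''X''<sub>0</sub><sup>T</sup>)''y'', which no longer involves Σ. The NumPy sketch below uses arbitrary simulated data and an arbitrary positive definite Σ, all of which are assumptions made for the illustration.

```python
import numpy as np

def gls(y, X, Sigma, R):
    """GLS for a stacked system with error covariance Sigma kron I_R."""
    W = np.kron(np.linalg.inv(Sigma), np.eye(R))
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

rng = np.random.default_rng(1)
R, m, k = 50, 3, 4
X0 = np.column_stack([np.ones(R), rng.normal(size=(R, k - 1))])
Y = rng.normal(size=(R, m))  # one column of responses per equation

# Case 2: identical regressors, arbitrary positive definite Sigma.
A = rng.normal(size=(m, m))
Sigma = A @ A.T + m * np.eye(m)
X_same = np.kron(np.eye(m), X0)          # block diagonal with equal blocks
y_stacked = Y.flatten(order="F")         # equation 1 first, then 2, ...
beta_gls = gls(y_stacked, X_same, Sigma, R)
B_ols, *_ = np.linalg.lstsq(X0, Y, rcond=None)  # k x m, one column per equation
same_regressors_equal = np.allclose(beta_gls, B_ols.flatten(order="F"))

# Case 1: different regressors, diagonal Sigma.
Xa, Xb = X0[:, :2], X0
X_diff = np.block([[Xa, np.zeros((R, Xb.shape[1]))],
                   [np.zeros((R, Xa.shape[1])), Xb]])
y_two = np.concatenate([Y[:, 0], Y[:, 1]])
beta_gls_diag = gls(y_two, X_diff, np.diag([1.0, 3.0]), R)
beta_ols_two = np.concatenate([
    np.linalg.lstsq(Xa, Y[:, 0], rcond=None)[0],
    np.linalg.lstsq(Xb, Y[:, 1], rcond=None)[0],
])
diagonal_sigma_equal = np.allclose(beta_gls_diag, beta_ols_two)

print(same_regressors_equal)  # True
print(diagonal_sigma_equal)   # True
```

Here the true Σ is plugged in directly rather than estimated; the same conclusion holds for FGLS, since in both cases the GLS formula gives the OLS estimates whatever weighting matrix of the relevant form is used.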
Statistical packages
* In R, SUR can be estimated using the package “systemfit”.
* In SAS, SUR can be estimated using the syslin procedure.
* In Stata, SUR can be estimated using the sureg and suest commands.
* In Limdep, SUR can be estimated using the sure command.
* In Python, SUR can be estimated using the SUR class in the “linearmodels” package.
* In gretl, SUR can be estimated using the system command.
See also
* General linear model
* Simultaneous equations models
References
Further reading
* Greene, William H. (2012). ''Econometric Analysis'' (Seventh ed.). Upper Saddle River: Pearson Prentice-Hall. pp. 332–344. ISBN 978-0-273-75356-8.