Simultaneous equations models are a type of statistical model in which the dependent variables are functions of other dependent variables, rather than just independent variables. This means some of the explanatory variables are jointly determined with the dependent variable, which in economics usually is the consequence of some underlying equilibrium mechanism. Take the typical supply and demand model: whilst typically one would determine the quantity supplied and demanded to be a function of the price set by the market, it is also possible for the reverse to be true, where producers observe the quantity that consumers demand ''and then'' set the price.
Simultaneity poses challenges for the estimation of the statistical parameters of interest, because the Gauss–Markov assumption of strict exogeneity of the regressors is violated. And while it would be natural to estimate all simultaneous equations at once, this often leads to a computationally costly non-linear optimization problem even for the simplest system of linear equations. This situation prompted the development, spearheaded by the Cowles Commission in the 1940s and 1950s, of various techniques that estimate each equation in the model seriatim, most notably limited information maximum likelihood and two-stage least squares.
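The violation of exogeneity can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical linear supply-and-demand market; the coefficient values, the variable names `income` and `cost`, and the sample size are illustrative assumptions and do not come from any source.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical structural parameters (assumed for illustration only).
a, b = -1.0, 1.0   # demand:  q = a*p + b*income + u_d
c, d = 1.0, -1.0   # supply:  q = c*p + d*cost   + u_s

income = rng.normal(size=n)
cost = rng.normal(size=n)
u_d = rng.normal(size=n)
u_s = rng.normal(size=n)

# Market clearing: equate demand and supply, then solve for the price.
p = (b * income - d * cost + u_d - u_s) / (c - a)
q = a * p + b * income + u_d

# OLS of quantity on price and income, ignoring that p depends on u_d.
R = np.column_stack([p, income])
a_ols, b_ols = np.linalg.lstsq(R, q, rcond=None)[0]
print(f"assumed price coefficient {a:.2f}, OLS estimate {a_ols:.2f}")
```

Because the equilibrium price is itself a function of the demand shock `u_d`, the regressor `p` is correlated with the error term, and the OLS estimate of the price coefficient stays away from the assumed value `a` no matter how large the sample is.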
Structural and reduced form
Suppose there are ''m'' regression equations of the form
: <math>y_{it} = y_{-i,t}'\gamma_i + x_{it}'\beta_i + u_{it}, \quad i = 1, \ldots, m,</math>
where ''i'' is the equation number, and ''t'' = 1, …, ''T'' is the observation index. In these equations ''x<sub>it</sub>'' is the ''k<sub>i</sub>''×1 vector of exogenous variables, ''y<sub>it</sub>'' is the dependent variable, ''y<sub>−i,t</sub>'' is the ''n<sub>i</sub>''×1 vector of all other endogenous variables which enter the ''i''th equation on the right-hand side, and ''u<sub>it</sub>'' are the error terms. The “−''i''” notation indicates that the vector ''y<sub>−i,t</sub>'' may contain any of the ''y''’s except for ''y<sub>it</sub>'' (since it is already present on the left-hand side). The regression coefficients ''β<sub>i</sub>'' and ''γ<sub>i</sub>'' are of dimensions ''k<sub>i</sub>''×1 and ''n<sub>i</sub>''×1 respectively. Vertically stacking the ''T'' observations corresponding to the ''i''th equation, we can write each equation in vector form as
: <math>y_i = Y_{-i}\gamma_i + X_i\beta_i + u_i, \quad i = 1, \ldots, m,</math>
where ''y<sub>i</sub>'' and ''u<sub>i</sub>'' are ''T''×1 vectors, ''X<sub>i</sub>'' is a ''T×k<sub>i</sub>'' matrix of exogenous regressors, and ''Y<sub>−i</sub>'' is a ''T×n<sub>i</sub>'' matrix of endogenous regressors on the right-hand side of the ''i''th equation. Finally, we can move all endogenous variables to the left-hand side and write the ''m'' equations jointly in vector form as
: <math>Y\Gamma = X\mathrm{B} + U.</math>
This representation is known as the structural form. In this equation <math>Y = [y_1\ y_2\ \ldots\ y_m]</math> is the ''T×m'' matrix of dependent variables. Each of the matrices ''Y<sub>−i</sub>'' is in fact an ''n<sub>i</sub>''-columned submatrix of this ''Y''. The ''m×m'' matrix Γ, which describes the relation between the dependent variables, has a complicated structure. It has ones on the diagonal, and all other elements of each column ''i'' are either the components of the vector ''−γ<sub>i</sub>'' or zeros, depending on which columns of ''Y'' were included in the matrix ''Y<sub>−i</sub>''. The ''T×k'' matrix ''X'' contains all exogenous regressors from all equations, but without repetitions (that is, matrix ''X'' should be of full rank). Thus, each ''X<sub>i</sub>'' is a ''k<sub>i</sub>''-columned submatrix of ''X''. Matrix Β has size ''k×m'', and each of its columns consists of the components of vectors ''β<sub>i</sub>'' and zeros, depending on which of the regressors from ''X'' were included or excluded from ''X<sub>i</sub>''. Finally, <math>U = [u_1\ u_2\ \ldots\ u_m]</math> is a ''T×m'' matrix of the error terms.
Postmultiplying the structural equation by Γ<sup>−1</sup>, the system can be written in the reduced form as
: <math>Y = X\mathrm{B}\Gamma^{-1} + U\Gamma^{-1} = X\Pi + V.</math>
This is already a simple general linear model, and it can be estimated for example by ordinary least squares. Unfortunately, the task of decomposing the estimated matrix <math>\hat\Pi</math> into the individual factors Β and Γ<sup>−1</sup> is quite complicated, and therefore the reduced form is more suitable for prediction but not inference.
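The relation between the structural and the reduced form can be checked numerically. The sketch below assumes a hypothetical two-equation system (''m'' = 2, ''k'' = 2) in which equation 1 contains ''y''<sub>2</sub> and ''x''<sub>1</sub>, and equation 2 contains ''y''<sub>1</sub> and ''x''<sub>2</sub>; all parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5_000

# Assumed structural coefficients (illustrative values only):
#   y1 = gamma1*y2 + beta1*x1 + u1
#   y2 = gamma2*y1 + beta2*x2 + u2
gamma1, gamma2 = 0.5, -0.8
beta1, beta2 = 1.0, 2.0

# Column i of Gamma has 1 on the diagonal and -gamma_i elsewhere.
Gamma = np.array([[1.0, -gamma2],
                  [-gamma1, 1.0]])
# Column i of B holds beta_i, with zeros for excluded regressors.
B = np.array([[beta1, 0.0],
              [0.0, beta2]])

X = rng.normal(size=(T, 2))   # exogenous regressors x1, x2
U = rng.normal(size=(T, 2))   # structural errors u1, u2

# Reduced form: Y = X Pi + V with Pi = B Gamma^{-1}, V = U Gamma^{-1}.
Gamma_inv = np.linalg.inv(Gamma)
Pi = B @ Gamma_inv
Y = X @ Pi + U @ Gamma_inv

# OLS estimate of the reduced-form coefficients.
Pi_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.round(Pi, 3))
print(np.round(Pi_hat, 3))
```

The simulated `Y` satisfies the structural equation `Y @ Gamma == X @ B + U` up to rounding, and `Pi_hat` estimates the product Β Γ<sup>−1</sup> rather than Β and Γ separately.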
Assumptions
Firstly, the rank of the matrix ''X'' of exogenous regressors must be equal to ''k'', both in finite samples and in the limit as ''T'' → ∞ (this latter requirement means that in the limit the expression <math>\tfrac{1}{T}X'X</math> should converge to a nondegenerate ''k×k'' matrix). Matrix Γ is also assumed to be non-degenerate.
Secondly, error terms are assumed to be serially independent and identically distributed. That is, if the ''t''th row of matrix ''U'' is denoted by ''u''<sub>(''t'')</sub>, then the sequence of vectors {''u''<sub>(''t'')</sub>} should be iid, with zero mean and some covariance matrix Σ (which is unknown). In particular, this implies that <math>\operatorname{E}[U] = 0</math>, and <math>\operatorname{E}[U'U] = T\,\Sigma</math>.
Lastly, assumptions are required for identification.
Identification
The identification conditions require that the system of linear equations be solvable for the unknown parameters.
More specifically, the ''order condition'', a necessary condition for identification, is that for each equation ''k<sub>i</sub> + n<sub>i</sub> ≤ k'', which can be phrased as “the number of excluded exogenous variables is greater than or equal to the number of included endogenous variables”.
The ''rank condition'', a stronger condition which is necessary and sufficient, is that the rank of Π<sub>''i''0</sub> equals ''n<sub>i</sub>'', where Π<sub>''i''0</sub> is a (''k − k<sub>i</sub>'')×''n<sub>i</sub>'' matrix which is obtained from Π by crossing out those columns which correspond to the excluded endogenous variables (and to the dependent variable ''y<sub>i</sub>'' itself), and those rows which correspond to the included exogenous variables.
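Both conditions are straightforward to check for a given reduced-form matrix. The following sketch is only illustrative: the helper names `order_condition`, `rank_condition` and `reduced_form`, and the two-equation system they are applied to, are assumptions introduced here.

```python
import numpy as np

def order_condition(k, k_i, n_i):
    """Necessary condition: excluded exogenous (k - k_i) >= included endogenous (n_i)."""
    return k - k_i >= n_i

def rank_condition(Pi, included_exog, rhs_endog):
    """Keep the rows of Pi for excluded exogenous variables and the columns
    for right-hand-side endogenous variables; the rank must equal n_i."""
    k = Pi.shape[0]
    excluded_rows = [r for r in range(k) if r not in included_exog]
    Pi_i0 = Pi[np.ix_(excluded_rows, rhs_endog)]
    if Pi_i0.size == 0:                      # nothing excluded or nothing endogenous
        return len(rhs_endog) == 0
    return bool(np.linalg.matrix_rank(Pi_i0) == len(rhs_endog))

def reduced_form(beta1, beta2, gamma1=0.5, gamma2=-0.8):
    """Pi = B Gamma^{-1} for the assumed system
       y1 = gamma1*y2 + beta1*x1 + u1,  y2 = gamma2*y1 + beta2*x2 + u2."""
    Gamma = np.array([[1.0, -gamma2], [-gamma1, 1.0]])
    B = np.array([[beta1, 0.0], [0.0, beta2]])
    return B @ np.linalg.inv(Gamma)

# Equation 1 includes x1 (row 0) and has y2 (column 1) on the right-hand side.
Pi = reduced_form(beta1=1.0, beta2=2.0)
eq1_ok = order_condition(k=2, k_i=1, n_i=1) and rank_condition(Pi, [0], [1])

# If x2 does not actually enter equation 2 (beta2 = 0), the order condition
# still holds for equation 1 but the rank condition fails.
Pi_weak = reduced_form(beta1=1.0, beta2=0.0)
eq1_fails = rank_condition(Pi_weak, [0], [1])
print(eq1_ok, eq1_fails)
```

The second case shows why the order condition is only necessary: counting exclusions is not enough if the excluded variable has no effect on the endogenous regressor.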
Using cross-equation restrictions to achieve identification
In simultaneous equations models, the most common method to achieve identification is by imposing within-equation parameter restrictions.[Wooldridge, J.M., Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass.] Yet, identification is also possible using cross equation restrictions.
To illustrate how cross equation restrictions can be used for identification, consider the following example from Wooldridge
: <math>y_1 = \gamma_{12} y_2 + \delta_{11} z_1 + \delta_{12} z_2 + \delta_{13} z_3 + u_1</math>
: <math>y_2 = \gamma_{21} y_1 + \delta_{21} z_1 + \delta_{22} z_2 + u_2</math>
where the ''z'' variables are uncorrelated with the errors ''u'' and the ''y'' variables are endogenous. Without further restrictions, the first equation is not identified because there is no excluded exogenous variable. The second equation is just identified if ''δ''<sub>13</sub> ≠ 0, which is assumed to be true for the rest of discussion.
Now we impose the cross equation restriction ''δ''<sub>12</sub> = ''δ''<sub>22</sub>. Since the second equation is identified, we can treat ''δ''<sub>12</sub> as known for the purpose of identification. Then, the first equation becomes:
: <math>y_1 - \delta_{12} z_2 = \gamma_{12} y_2 + \delta_{11} z_1 + \delta_{13} z_3 + u_1</math>
Then, we can use (''z''<sub>1</sub>, ''z''<sub>2</sub>, ''z''<sub>3</sub>) as instruments to estimate the coefficients in the above equation since there is one endogenous variable (''y''<sub>2</sub>) and one excluded exogenous variable (''z''<sub>2</sub>) on the right hand side. Therefore, cross equation restrictions in place of within-equation restrictions can achieve identification.
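The argument can be traced on simulated data. The sketch below assumes a two-equation system with three exogenous variables in which the second equation excludes the third one and the two equations share the coefficient on the second; the parameter values and the helper `iv` are hypothetical. It plugs in an estimate of the shared coefficient where the identification argument treats it as known.

```python
import numpy as np

def iv(y, R, W):
    """2SLS/IV of y on regressors R using the instrument matrix W."""
    R_hat = W @ np.linalg.lstsq(W, R, rcond=None)[0]   # first-stage fitted values
    return np.linalg.lstsq(R_hat, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 100_000

# Assumed parameters (illustrative); the cross-equation restriction is d12 == d22.
g12, g21 = 0.5, -0.4
d11, d12, d13 = 1.0, 0.8, 1.5
d21, d22 = 0.5, 0.8

Zm = rng.normal(size=(n, 3))             # columns z1, z2, z3
U = rng.normal(size=(n, 2))              # errors u1, u2
Gamma = np.array([[1.0, -g21],
                  [-g12, 1.0]])
B = np.array([[d11, d21],
              [d12, d22],
              [d13, 0.0]])
Y = (Zm @ B + U) @ np.linalg.inv(Gamma)  # solve the system for (y1, y2)
y1, y2 = Y[:, 0], Y[:, 1]
z1, z2, z3 = Zm.T

# Step 1: the second equation excludes z3, so it can be estimated by 2SLS.
g21_hat, d21_hat, d22_hat = iv(y2, np.column_stack([y1, z1, z2]), Zm)

# Step 2: impose d12 = d22, move the z2 term to the left-hand side,
# and let z2 act as the excluded instrument for y2.
g12_hat, d11_hat, d13_hat = iv(y1 - d22_hat * z2,
                               np.column_stack([y2, z1, z3]), Zm)
print(np.round([g12_hat, d11_hat, d13_hat], 3))
```

Without the restriction, step 2 would have four unknown coefficients but only three instruments, which is the order-condition failure described above.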
Estimation
Two-stage least squares (2SLS)
The simplest and the most common estimation method for the simultaneous equations model is the so-called two-stage least squares method, developed independently by Theil (1953) and Basmann (1957). It is an equation-by-equation technique, where the endogenous regressors on the right-hand side of each equation are instrumented with the exogenous regressors ''X'' from all equations of the system. The method is called “two-stage” because it conducts estimation in two steps:
: ''Step 1'': Regress ''Y<sub>−i</sub>'' on ''X'' and obtain the predicted values <math>\hat{Y}_{-i}</math>;
: ''Step 2'': Estimate ''γ<sub>i</sub>'', ''β<sub>i</sub>'' by the ordinary least squares regression of ''y<sub>i</sub>'' on <math>\hat{Y}_{-i}</math> and ''X<sub>i</sub>''.
If the ''i''th equation in the model is written as
: <math>y_i = \begin{pmatrix} Y_{-i} & X_i \end{pmatrix} \begin{pmatrix} \gamma_i \\ \beta_i \end{pmatrix} + u_i \equiv Z_i\delta_i + u_i,</math>
where ''Z<sub>i</sub>'' is a ''T''×(''n<sub>i</sub> + k<sub>i</sub>'') matrix of both endogenous and exogenous regressors in the ''i''th equation, and ''δ<sub>i</sub>'' is an (''n<sub>i</sub> + k<sub>i</sub>'')-dimensional vector of regression coefficients, then the 2SLS estimator of ''δ<sub>i</sub>'' will be given by
: <math>\hat\delta_i = \big(Z_i' P Z_i\big)^{-1} Z_i' P y_i,</math>
where <math>P = X(X'X)^{-1}X'</math> is the projection matrix onto the linear space spanned by the exogenous regressors ''X''.
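The two steps and the closed-form expression give the same numbers, which the following sketch verifies on simulated data. The two-equation system and its parameter values are assumptions made for illustration; the fitted values are computed by least squares so that the ''T×T'' projection matrix is never formed explicitly.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2_000

# Assumed system (illustrative values):
#   y1 = gamma1*y2 + beta1*x1 + u1,   y2 = gamma2*y1 + beta2*x2 + u2
gamma1, gamma2 = 0.5, -0.8
beta1, beta2 = 1.0, 2.0
X = rng.normal(size=(T, 2))                  # all exogenous regressors x1, x2
U = rng.normal(size=(T, 2))
Gamma = np.array([[1.0, -gamma2], [-gamma1, 1.0]])
B = np.array([[beta1, 0.0], [0.0, beta2]])
Y = (X @ B + U) @ np.linalg.inv(Gamma)

y1 = Y[:, 0]
Y_other = Y[:, [1]]    # Y_{-1}: endogenous regressor of equation 1
X1 = X[:, [0]]         # X_1: exogenous regressor of equation 1

# Step 1: regress Y_{-i} on X and keep the fitted values.
Y_hat = X @ np.linalg.lstsq(X, Y_other, rcond=None)[0]
# Step 2: OLS of y_i on the fitted values and X_i.
delta_two_step = np.linalg.lstsq(np.column_stack([Y_hat, X1]), y1, rcond=None)[0]

# Closed form (Z'PZ)^{-1} Z'Py, using (PZ)'Z = Z'PZ and (PZ)'y = Z'Py.
Z1 = np.column_stack([Y_other, X1])
PZ = X @ np.linalg.lstsq(X, Z1, rcond=None)[0]
delta_closed = np.linalg.solve(PZ.T @ Z1, PZ.T @ y1)

print(np.round(delta_two_step, 3), np.round(delta_closed, 3))
```

The entries of both vectors are ordered as (''γ''<sub>1</sub>, ''β''<sub>1</sub>). Note that standard errors reported by a naive second-stage OLS would be incorrect, because they are based on residuals computed with the fitted rather than the observed endogenous regressors.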
Indirect least squares
Indirect least squares is an approach in econometrics where the coefficients in a simultaneous equations model are estimated from the reduced form model using ordinary least squares. For this, the structural system of equations is transformed into the reduced form first. Once the coefficients are estimated the model is put back into the structural form.
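For a just-identified equation the step back to the structural form has a unique solution. The following sketch assumes the same kind of hypothetical two-equation system as a running example, with made-up parameter values; equation 1 excludes ''x''<sub>2</sub>, which gives one restriction for one unknown ''γ''<sub>1</sub>.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5_000

# Assumed system (illustrative values):
#   y1 = gamma1*y2 + beta1*x1 + u1,   y2 = gamma2*y1 + beta2*x2 + u2
gamma1, gamma2 = 0.5, -0.8
beta1, beta2 = 1.0, 2.0
X = rng.normal(size=(T, 2))
U = rng.normal(size=(T, 2))
Gamma = np.array([[1.0, -gamma2], [-gamma1, 1.0]])
B = np.array([[beta1, 0.0], [0.0, beta2]])
Y = (X @ B + U) @ np.linalg.inv(Gamma)

# Reduced form by OLS: rows correspond to x1, x2; columns to y1, y2.
Pi_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

# Structural form of equation 1 requires Pi @ [1, -gamma1]' = [beta1, 0]'.
# The zero in the x2 row pins down gamma1; the x1 row then gives beta1.
gamma1_ils = Pi_hat[1, 0] / Pi_hat[1, 1]
beta1_ils = Pi_hat[0, 0] - gamma1_ils * Pi_hat[0, 1]
print(round(gamma1_ils, 3), round(beta1_ils, 3))
```

In this just-identified case the result coincides numerically with the instrumental-variables estimate that uses ''X'' as instruments. With more excluded variables than needed, the reduced-form restrictions would give several conflicting solutions for ''γ''<sub>1</sub>, which is one reason 2SLS is usually preferred.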
Limited information maximum likelihood (LIML)
The “limited information” maximum likelihood method was suggested by M. A. Girshick in 1947, and formalized by T. W. Anderson and H. Rubin in 1949. It is used when one is interested in estimating a single structural equation at a time (hence its name of limited information), say the ''i''th equation:
: <math>y_i = Y_{-i}\gamma_i + X_i\beta_i + u_i \equiv Z_i\delta_i + u_i</math>
The structural equations for the remaining endogenous variables ''Y<sub>−i</sub>'' are not specified, and they are given in their reduced form:
: <math>Y_{-i} = X\Pi + U_{-i}</math>
Notation in this context is different than for the simple IV case. One has:
* <math>Y_{-i}</math>: The endogenous variable(s).
* <math>X_i</math>: The exogenous variable(s).
* <math>X</math>: The instrument(s) (often denoted <math>Z</math>).
The explicit formula for the LIML is:
: <math>\hat\delta_i = \Big(Z_i'(I - \lambda M)Z_i\Big)^{-1} Z_i'(I - \lambda M)y_i,</math>
where <math>M = I - X(X'X)^{-1}X'</math>, and ''λ'' is the smallest characteristic root of the matrix:
: <math>\Big(\begin{bmatrix} y_i & Y_{-i} \end{bmatrix}' M_i \begin{bmatrix} y_i & Y_{-i} \end{bmatrix}\Big) \Big(\begin{bmatrix} y_i & Y_{-i} \end{bmatrix}' M \begin{bmatrix} y_i & Y_{-i} \end{bmatrix}\Big)^{-1}</math>
where, in a similar way, <math>M_i = I - X_i(X_i'X_i)^{-1}X_i'</math>.
In other words, ''λ'' is the smallest solution of the generalized eigenvalue problem:
: <math>\Big| \begin{bmatrix} y_i & Y_{-i} \end{bmatrix}' M_i \begin{bmatrix} y_i & Y_{-i} \end{bmatrix} - \lambda \begin{bmatrix} y_i & Y_{-i} \end{bmatrix}' M \begin{bmatrix} y_i & Y_{-i} \end{bmatrix} \Big| = 0</math>
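The LIML computation can be sketched in a few lines. The example below assumes one structural equation with a single endogenous regressor whose reduced form depends on three exogenous variables, so the equation is overidentified; the function names `residualize` and `liml` and all parameter values are hypothetical. The annihilator matrices ''M'' and ''M<sub>i</sub>'' are applied through least-squares residuals instead of being formed as ''T×T'' matrices.

```python
import numpy as np

def residualize(A, W):
    """Return M_W A = A - W (W'W)^{-1} W'A."""
    return A - W @ np.linalg.lstsq(W, A, rcond=None)[0]

def liml(y, Y_other, X_i, X):
    """LIML estimate of delta_i = (gamma_i, beta_i) and the root lambda."""
    A = np.column_stack([y, Y_other])
    W_i = A.T @ residualize(A, X_i)      # [y_i Y_-i]' M_i [y_i Y_-i]
    W = A.T @ residualize(A, X)          # [y_i Y_-i]' M   [y_i Y_-i]
    lam = np.linalg.eigvals(W_i @ np.linalg.inv(W)).real.min()
    Z = np.column_stack([Y_other, X_i])
    lhs = Z.T @ Z - lam * (Z.T @ residualize(Z, X))   # Z'(I - lam*M)Z
    rhs = Z.T @ y - lam * (Z.T @ residualize(y, X))   # Z'(I - lam*M)y
    return np.linalg.solve(lhs, rhs), lam

rng = np.random.default_rng(4)
T = 2_000
gamma, beta = 0.5, 1.0                   # assumed structural coefficients

X = rng.normal(size=(T, 3))              # instruments x1, x2, x3
X_i = X[:, [0]]                          # only x1 enters the structural equation
v = rng.normal(size=T)
u = 0.7 * v + rng.normal(size=T)         # correlation with v creates endogeneity
y2 = X @ np.array([0.5, 1.0, 1.0]) + v   # reduced form of the endogenous regressor
y = gamma * y2 + beta * X[:, 0] + u

delta_liml, lam = liml(y, y2[:, None], X_i, X)
print(np.round(delta_liml, 3), round(lam, 4))
```

Since the columns of ''X<sub>i</sub>'' are a subset of those of ''X'', the root ''λ'' is never smaller than one, and it equals one when the equation is just identified.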
K class estimators
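The LIML is a special case of the K-class estimators:
: <math>\hat\delta = \Big(Z'(I - \kappa M)Z\Big)^{-1} Z'(I - \kappa M)y,</math>
with:
* <math>\delta = \begin{bmatrix} \gamma_i \\ \beta_i \end{bmatrix}</math>
* <math>Z = \begin{bmatrix} Y_{-i} & X_i \end{bmatrix}</math>
Several estimators belong to this class:
* κ=0: OLS
* κ=1: 2SLS. Note indeed that in this case, <math>I - \kappa M = I - M = P</math>, the usual projection matrix of the 2SLS
* κ=λ: LIML
* κ=λ − α/(n−K): the Fuller estimator. Here K represents the number of instruments, n the sample size, and α a positive constant to specify. A value of α=1 will yield an estimator that is approximately unbiased.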
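The special cases κ=0 and κ=1 can be verified directly. The sketch below uses a hypothetical function `k_class` and simulated data with assumed parameter values; it compares the K-class formula against separately computed OLS and 2SLS estimates.

```python
import numpy as np

def k_class(y, Z, X, kappa):
    """(Z'(I - kappa*M)Z)^{-1} Z'(I - kappa*M)y with M = I - X(X'X)^{-1}X'."""
    MZ = Z - X @ np.linalg.lstsq(X, Z, rcond=None)[0]
    My = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.linalg.solve(Z.T @ Z - kappa * (Z.T @ MZ),
                           Z.T @ y - kappa * (Z.T @ My))

rng = np.random.default_rng(5)
T = 2_000
X = rng.normal(size=(T, 3))              # instruments x1, x2, x3
v = rng.normal(size=T)
u = 0.7 * v + rng.normal(size=T)         # endogeneity through v
y2 = X @ np.array([0.5, 1.0, 1.0]) + v   # endogenous regressor
y = 0.5 * y2 + 1.0 * X[:, 0] + u         # structural equation (assumed values)
Z = np.column_stack([y2, X[:, 0]])       # Z = [Y_{-i}, X_i]

# Reference estimates computed without the K-class formula.
ols = np.linalg.lstsq(Z, y, rcond=None)[0]
PZ = X @ np.linalg.lstsq(X, Z, rcond=None)[0]
tsls = np.linalg.solve(PZ.T @ Z, PZ.T @ y)

d_k0 = k_class(y, Z, X, kappa=0.0)
d_k1 = k_class(y, Z, X, kappa=1.0)
print(np.round(d_k0, 3), np.round(d_k1, 3))
```

Passing the LIML root ''λ'' as `kappa`, or ''λ'' minus α/(n−K), would give the LIML and Fuller estimates from the same function.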
Three-stage least squares (3SLS)
The three-stage least squares estimator was introduced by Zellner and Theil (1962). It can be seen as a special case of multi-equation GMM where the set of instrumental variables is common to all equations. If all regressors are in fact predetermined, then 3SLS reduces to seemingly unrelated regressions (SUR). Thus it may also be seen as a combination of two-stage least squares (2SLS) with SUR.
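The combination of 2SLS with SUR can be sketched as follows. This is a minimal illustration under assumptions, not a reference implementation: the function `three_sls`, the two-equation system and its parameter values are all hypothetical, and the third stage is written as feasible GLS on the projected regressors, assembled block by block to avoid forming a Kronecker product.

```python
import numpy as np

def three_sls(ys, Zs, X):
    """ys: list of outcome vectors; Zs: list of regressor matrices [Y_-i, X_i];
    X: instruments common to all equations. Returns (3SLS, stacked 2SLS)."""
    T, m = X.shape[0], len(ys)
    # Stage 1: project each equation's regressors on the instruments.
    Zh = [X @ np.linalg.lstsq(X, Z, rcond=None)[0] for Z in Zs]
    # Stage 2: equation-by-equation 2SLS, then the residual covariance matrix.
    d2 = [np.linalg.lstsq(Zh_i, y, rcond=None)[0] for Zh_i, y in zip(Zh, ys)]
    E = np.column_stack([y - Z @ d for y, Z, d in zip(ys, Zs, d2)])
    S_inv = np.linalg.inv(E.T @ E / T)
    # Stage 3: GLS on the stacked system, weighting blocks by S_inv[i, j].
    A = np.block([[S_inv[i, j] * (Zh[i].T @ Zh[j]) for j in range(m)]
                  for i in range(m)])
    b = np.concatenate([sum(S_inv[i, j] * (Zh[i].T @ ys[j]) for j in range(m))
                        for i in range(m)])
    return np.linalg.solve(A, b), np.concatenate(d2)

rng = np.random.default_rng(6)
T = 5_000
# Assumed system: y1 = g1*y2 + b1*x1 + u1,  y2 = g2*y1 + b2*x2 + b3*x3 + u2
g1, b1, g2, b2, b3 = 0.5, 1.0, -0.8, 2.0, 1.0
X = rng.normal(size=(T, 3))
L = np.array([[1.0, 0.0], [0.6, 0.8]])
U = rng.normal(size=(T, 2)) @ L.T        # errors correlated across equations
Gamma = np.array([[1.0, -g2], [-g1, 1.0]])
B = np.array([[b1, 0.0], [0.0, b2], [0.0, b3]])
Y = (X @ B + U) @ np.linalg.inv(Gamma)

ys = [Y[:, 0], Y[:, 1]]
Zs = [np.column_stack([Y[:, 1], X[:, 0]]),
      np.column_stack([Y[:, 0], X[:, 1], X[:, 2]])]
d3, d2 = three_sls(ys, Zs, X)
print(np.round(d3, 3))
print(np.round(d2, 3))
```

The coefficient vectors are stacked as (''γ''<sub>1</sub>, ''β''<sub>1</sub>, ''γ''<sub>2</sub>, ''β''<sub>2</sub>, ''β''<sub>3</sub>). The third stage uses the cross-equation error covariance, which is where any efficiency gain over equation-by-equation 2SLS comes from.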
Applications in social science
Across fields and disciplines simultaneous equation models are applied to various observational phenomena. These equations are applied when phenomena are assumed to be reciprocally causal. The classic example is supply and demand in economics. In other disciplines there are examples such as candidate evaluations and party identification or public opinion and social policy in political science; road investment and travel demand in geography; and educational attainment and parenthood entry in sociology or demography. The simultaneous equation model requires a theory of reciprocal causality that includes special features if the causal effects are to be estimated as simultaneous feedback, as opposed to one-sided 'blocks' of an equation where a researcher is interested in the causal effect of X on Y while holding the causal effect of Y on X constant, or when the researcher knows the exact amount of time it takes for each causal effect to take place, i.e., the length of the causal lags. Instead of lagged effects, simultaneous feedback means estimating the simultaneous and perpetual impact of X and Y on each other. This requires a theory that causal effects are simultaneous in time, or so complex that they appear to behave simultaneously; a common example is the moods of roommates. To estimate simultaneous feedback models a theory of equilibrium is also necessary: that X and Y are in relatively steady states or are part of a system (society, market, classroom) that is in a relatively stable state.
[2013. “Reverse Arrow Dynamics: Feedback Loops and Formative Measurement.” In ''Structural Equation Modeling: A Second Course'', edited by Gregory R. Hancock and Ralph O. Mueller, 2nd ed., 41–79. Charlotte, NC: Information Age Publishing]
See also
* General linear model
* Seemingly unrelated regressions
* Reduced form
* Parameter identification problem
External links
*{{YouTube, id=D5lt9bhOshc&list=PLD15D38DC7AA3B737&index=15, title=Lecture on the Identification Problem in 2SLS, and Estimation}} by Mark Thoma