In statistics and econometrics, set identification (or partial identification) extends the concept of identifiability (or "point identification") in statistical models to environments where the model and the distribution of observable variables are not sufficient to determine a unique value for the model parameters, but instead constrain the parameters to lie in a strict subset of the parameter space. Statistical models that are set (or partially) identified arise in a variety of settings in economics, including game theory and the Rubin causal model. Unlike approaches that deliver point identification of the model parameters, methods from the literature on partial identification are used to obtain set estimates that are valid under weaker modelling assumptions.
History
Early works already contained the main ideas of set identification. However, the methods were significantly developed and promoted by Charles Manski.
Partial identification continues to be a major theme in research in econometrics. It has been named as an example of theoretical progress in the econometrics literature and listed as “one of the most prominent recent themes in econometrics.”
Definition
Let $U$ denote a vector of latent variables, let $Z$ denote a vector of observed (possibly endogenous) explanatory variables, and let $Y$ denote a vector of observed endogenous outcome variables. A structure is a pair $s = (h, \mathcal{P}_{U \mid Z})$, where $\mathcal{P}_{U \mid Z}$ represents a collection of conditional distributions, and $h$ is a structural function such that $h(y, z, u) = 0$ for all realizations $(y, z, u)$ of the random vectors $(Y, Z, U)$. A model is a collection of admissible (i.e. possible) structures $s$.
Let $\mathcal{P}_{Y \mid Z}(s)$ denote the collection of conditional distributions of $Y \mid Z$ consistent with the structure $s$. The admissible structures $s$ and $s'$ are said to be observationally equivalent if $\mathcal{P}_{Y \mid Z}(s) = \mathcal{P}_{Y \mid Z}(s')$.
Let $s^\star$ denote the true (i.e. data-generating) structure. The model is said to be point-identified if for every admissible $s \neq s^\star$ we have $\mathcal{P}_{Y \mid Z}(s) \neq \mathcal{P}_{Y \mid Z}(s^\star)$. More generally, the model is said to be set (or partially) identified if there exists at least one admissible $s \neq s^\star$ such that $\mathcal{P}_{Y \mid Z}(s) \neq \mathcal{P}_{Y \mid Z}(s^\star)$. The identified set of structures is the collection of admissible structures that are observationally equivalent to $s^\star$.
In most cases the definition can be substantially simplified. In particular, when $U$ is independent of $Z$ and has a known (up to some finite-dimensional parameter) distribution, and when $h$ is known up to some finite-dimensional vector of parameters, each structure $s$ can be characterized by a finite-dimensional parameter vector $\theta \in \Theta$. If $\theta_0$ denotes the true (i.e. data-generating) vector of parameters, then the identified set, often denoted as $\Theta_I \subset \Theta$, is the set of parameter values that are observationally equivalent to $\theta_0$.
Example: missing data
This example is due to . Suppose there are two
binary random variables, $Y$ and $Z$. The econometrician is interested in $P(Y = 1)$. There is a missing data problem, however: $Y$ can only be observed if $Z = 1$.
By the law of total probability,
$$P(Y = 1) = P(Y = 1 \mid Z = 1)\,P(Z = 1) + P(Y = 1 \mid Z = 0)\,P(Z = 0).$$
The only unknown object is $P(Y = 1 \mid Z = 0)$, which is constrained to lie between 0 and 1. Therefore, the identified set is
$$\Theta_I = \{\, p \in [0, 1] : p = P(Y = 1 \mid Z = 1)\,P(Z = 1) + q\,P(Z = 0) \text{ for some } q \in [0, 1] \,\}.$$
Given the missing data constraint, the econometrician can only say that $P(Y = 1) \in \Theta_I$. This makes use of all available information.
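The bounds in this example can be computed directly from a sample. Setting the unknown P(Y = 1 | Z = 0) to 0 and to 1 gives the interval from P(Y = 1, Z = 1) to P(Y = 1, Z = 1) + P(Z = 0). The following is a minimal sketch of the plug-in estimate of that interval; the function name `worst_case_bounds`, the use of `None` to mark unobserved outcomes, and the toy data are assumptions made for illustration, not part of the original example.

```python
def worst_case_bounds(y, z):
    """Plug-in estimate of the identified set for P(Y = 1).

    y: outcomes (0 or 1), with None where the outcome is unobserved
    z: observation indicators (1 if y is observed, 0 otherwise)
    Returns (lower, upper). Hypothetical helper for illustration.
    """
    n = len(z)
    if n == 0 or len(y) != n:
        raise ValueError("y and z must be non-empty and of equal length")
    # Sample analogue of P(Y = 1, Z = 1) = P(Y = 1 | Z = 1) P(Z = 1)
    p_y1_z1 = sum(1 for yi, zi in zip(y, z) if zi == 1 and yi == 1) / n
    # Sample analogue of P(Z = 0), the share of missing outcomes
    p_z0 = sum(1 for zi in z if zi == 0) / n
    # P(Y = 1 | Z = 0) is unrestricted in [0, 1]:
    # the value 0 gives the lower bound, the value 1 gives the upper bound
    return p_y1_z1, p_y1_z1 + p_z0


# Toy data (made up): 10 units, 3 of which have a missing outcome
y = [1, 0, 1, None, None, 1, 0, None, 1, 0]
z = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
lower, upper = worst_case_bounds(y, z)
print(lower, upper)
```

The width of the interval equals the estimated share of missing outcomes, so the bounds collapse to a point only when nothing is missing.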
Statistical inference
Set estimation cannot rely on the usual tools for statistical inference developed for point estimation. A literature in statistics and econometrics studies methods for statistical inference in the context of set-identified models, focusing on constructing confidence intervals or confidence regions with appropriate properties. For example, one method constructs confidence regions that cover the identified set with a given probability.
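As a rough illustration of what covering the identified set means, consider the missing data example, where the identified set for P(Y = 1) is the interval from P(Y = 1, Z = 1) to P(Y = 1, Z = 1) + P(Z = 0). The sketch below widens the estimated interval by a normal-approximation margin at each end, with the error probability split between the two ends (a Bonferroni argument), so that the region covers the whole interval with asymptotic probability at least 1 − alpha. This is a simple textbook-style construction chosen for illustration, not the method referred to above; the function name `bonferroni_region` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def bonferroni_region(n, n_y1_z1, n_z0, alpha=0.05):
    """Approximate confidence region for the identified set [L, U].

    n:        sample size
    n_y1_z1:  number of units with Z = 1 and Y = 1
    n_z0:     number of units with Z = 0 (outcome missing)
    Hypothetical helper; relies on a large-sample normal approximation.
    """
    # Both endpoints are means of Bernoulli indicators:
    # L uses 1{Y = 1, Z = 1}; U uses 1{Y = 1, Z = 1} + 1{Z = 0}
    lower_hat = n_y1_z1 / n
    upper_hat = (n_y1_z1 + n_z0) / n
    # One-sided critical value at level alpha / 2 for each endpoint
    c = NormalDist().inv_cdf(1 - alpha / 2)
    se_lower = sqrt(lower_hat * (1 - lower_hat) / n)
    se_upper = sqrt(upper_hat * (1 - upper_hat) / n)
    # Clip to [0, 1] because the target is a probability
    return (max(0.0, lower_hat - c * se_lower),
            min(1.0, upper_hat + c * se_upper))


# Made-up counts: 1000 units, 400 with (Y = 1, Z = 1), 300 with Z = 0
region_lo, region_hi = bonferroni_region(n=1000, n_y1_z1=400, n_z0=300)
print(region_lo, region_hi)
```

The region is conservative by construction; methods from the partial identification literature can yield tighter regions.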
Further reading
* Manski, Charles F. (2003). Partial Identification of Probability Distributions. New York: Springer-Verlag. ISBN 978-0-387-00454-9.