In statistics and econometrics, set identification (or partial identification) extends the concept of identifiability (or "point identification") in statistical models to situations where the distribution of observable variables is not informative of the exact value of a parameter, but instead constrains the parameter to lie in a strict subset of the parameter space. Statistical models that are set identified arise in a variety of settings in economics, including game theory and the Rubin causal model. Though the use of set identification dates to a 1934 article by Ragnar Frisch, the methods were significantly developed and promoted by Charles Manski starting in the 1990s. Manski developed a method of worst-case bounds to account for selection bias. Unlike methods that make additional statistical assumptions, such as the Heckman correction, the worst-case bounds rely only on the data to generate a range of supported parameter values.

Definition

Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a statistical model where the parameter space $\Theta$ is either finite- or infinite-dimensional. Suppose $\theta_0$ is the true parameter value. We say that $\theta_0$ is set identified if there exists $\theta \in \Theta$ such that $P_\theta \neq P_{\theta_0}$; that is, if some parameter values in $\Theta$ are not observationally equivalent to $\theta_0$. In that case, the identified set is the set of parameter values that are observationally equivalent to $\theta_0$.
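The definition can be made concrete with a toy model. The sketch below is purely illustrative and not taken from the literature: it assumes a hypothetical finite parameter space of pairs (a, b) and a hypothetical model in which the observable data are Bernoulli draws whose success probability depends on the parameter only through the sum a + b. The identified set is then computed by collecting every parameter value that induces the same observable distribution as the true one.

```python
from fractions import Fraction
from itertools import product


def observable_distribution(theta):
    """Hypothetical model: Bernoulli data with success probability (a + b) / 6.

    The distribution depends on theta = (a, b) only through a + b, so
    different parameter values can be observationally equivalent.
    """
    a, b = theta
    return Fraction(a + b, 6)


def identified_set(theta0, parameter_space):
    """Return all parameter values observationally equivalent to theta0."""
    target = observable_distribution(theta0)
    return [t for t in parameter_space if observable_distribution(t) == target]


# Assumed parameter space: all pairs (a, b) with a, b in {0, 1, 2, 3}.
parameter_space = list(product(range(4), repeat=2))
theta0 = (1, 2)  # assumed true parameter value

theta_I = identified_set(theta0, parameter_space)
print(theta_I)                                # [(0, 3), (1, 2), (2, 1), (3, 0)]
print(len(theta_I) < len(parameter_space))    # True: a strict subset
```

In this toy model the data rule out most of the parameter space but cannot separate the four pairs whose components sum to 3, which is the situation the definition describes.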

Example: missing data

Suppose there are two binary random variables, $Y$ and $Z$. The econometrician is interested in $\mathrm P(Y = 1)$. There is a missing data problem, however: $Y$ can only be observed if $Z = 1$. By the law of total probability,

$$\mathrm P(Y = 1) = \mathrm P(Y = 1 \mid Z = 1) \mathrm P(Z = 1) + \mathrm P(Y = 1 \mid Z = 0) \mathrm P(Z = 0).$$

The only unknown object is $\mathrm P(Y = 1 \mid Z = 0)$, which is constrained to lie between 0 and 1. Therefore, the identified set is

$$\Theta_I = \{ p \in [0, 1] : p = \mathrm P(Y = 1 \mid Z = 1) \mathrm P(Z = 1) + q \, \mathrm P(Z = 0) \text{ for some } q \in [0, 1] \}.$$

Given the missing data constraint, the econometrician can only say that $\mathrm P(Y = 1) \in \Theta_I$. This makes use of all available information.
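The bounds in this example can be computed directly from a sample. The following is a minimal sketch under stated assumptions: the function name `worst_case_bounds` and the data layout (two equal-length lists, with `None` standing in for unobserved outcomes) are hypothetical choices made for illustration, and sample frequencies are used in place of the population probabilities. The lower endpoint sets the unknown conditional probability to 0 and the upper endpoint sets it to 1.

```python
def worst_case_bounds(y, z):
    """Sample analogue of the identified set for P(Y = 1).

    y: outcomes (0 or 1); entries where z is 0 are ignored because
       Y is unobserved there (None is a convenient placeholder)
    z: indicators, 1 if Y was observed and 0 if it is missing
    Returns (lower, upper), the endpoints of the estimated identified set.
    """
    n = len(z)
    if n == 0 or len(y) != n:
        raise ValueError("y and z must be non-empty and of equal length")

    n_missing = sum(1 for zi in z if zi == 0)
    n_obs_y1 = sum(1 for yi, zi in zip(y, z) if zi == 1 and yi == 1)

    # Frequency of (Y = 1 and Z = 1) estimates P(Y = 1 | Z = 1) P(Z = 1).
    lower = n_obs_y1 / n                  # unknown P(Y = 1 | Z = 0) set to 0
    upper = (n_obs_y1 + n_missing) / n    # unknown P(Y = 1 | Z = 0) set to 1
    return lower, upper


# Made-up data: 10 units, outcome missing for 2 of them.
z = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y = [1, 1, 1, 1, 1, 0, 0, 0, None, None]

print(worst_case_bounds(y, z))  # (0.5, 0.7)
```

The width of the interval equals the share of missing observations, so the bounds collapse to a single point only when nothing is missing.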

Statistical inference

Set estimation cannot rely on the usual tools for statistical inference developed for point estimation. A literature in statistics and econometrics studies methods for statistical inference in the context of set-identified models, focusing on constructing confidence intervals or confidence regions with appropriate properties. For example, one method developed in this literature, which has been described as complicated, constructs confidence regions that cover the identified set with a given probability.
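To illustrate what it means for a region to cover the identified set with a given probability, the sketch below builds a deliberately simple region for the missing data example. It is not the method referred to above. It assumes an i.i.d. sample, a normal approximation for each estimated endpoint, and a Bonferroni split of the error probability between the two endpoints, which makes the region conservative. The function name and its arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def conservative_region(n, n_obs_y1, n_missing, alpha=0.05):
    """Approximate region covering the whole identified set for P(Y = 1).

    n:         sample size
    n_obs_y1:  number of units with Y observed and equal to 1
    n_missing: number of units with Y unobserved
    alpha:     target non-coverage probability (asymptotic, conservative)
    """
    # Estimated endpoints of the identified set (worst-case bounds).
    lower_hat = n_obs_y1 / n
    upper_hat = (n_obs_y1 + n_missing) / n

    # Each endpoint is a sample proportion, so use a binomial standard error.
    se_lower = sqrt(lower_hat * (1 - lower_hat) / n)
    se_upper = sqrt(upper_hat * (1 - upper_hat) / n)

    # Bonferroni: give each one-sided endpoint an error budget of alpha / 2.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    lo = max(0.0, lower_hat - z_crit * se_lower)
    hi = min(1.0, upper_hat + z_crit * se_upper)
    return lo, hi


# Made-up counts: 1000 units, 200 missing, 500 observed with Y = 1.
lo, hi = conservative_region(n=1000, n_obs_y1=500, n_missing=200)
print(round(lo, 3), round(hi, 3))
```

The region is wider than the estimated identified set [0.5, 0.7] because it also accounts for sampling error in the endpoints; less conservative constructions exist but require more machinery.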
