In statistics, identifiability is a property which a model must satisfy for precise inference to be possible. A model is identifiable if it is theoretically possible to learn the true values of this model's underlying parameters after obtaining an infinite number of observations from it. Mathematically, this is equivalent to saying that different values of the parameters must generate different probability distributions of the observable variables. Usually the model is identifiable only under certain technical restrictions, in which case the set of these requirements is called the identification conditions.
A model that fails to be identifiable is said to be non-identifiable or unidentifiable: two or more parametrizations are observationally equivalent. In some cases, even though a model is non-identifiable, it is still possible to learn the true values of a certain subset of the model parameters. In this case we say that the model is partially identifiable. In other cases it may be possible to learn the location of the true parameter up to a certain finite region of the parameter space, in which case the model is set identifiable.
Aside from strictly theoretical exploration of the model properties, identifiability can be referred to in a wider scope when a model is tested with experimental data sets, using identifiability analysis.
Definition
Let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a statistical model with parameter space $\Theta$. We say that $\mathcal{P}$ is identifiable if the mapping $\theta \mapsto P_\theta$ is one-to-one:

$$P_{\theta_1} = P_{\theta_2} \;\Rightarrow\; \theta_1 = \theta_2 \qquad \text{for all } \theta_1, \theta_2 \in \Theta.$$
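The one-to-one requirement can be made concrete with a small sketch. It uses a hypothetical toy model that does not come from the text: $X \sim N(\theta_1 + \theta_2, \theta_3^2)$ with $\theta_3 > 0$. The density depends on $(\theta_1, \theta_2)$ only through their sum, so the map $\theta \mapsto P_\theta$ is not one-to-one and the model is not identifiable. The sum $\theta_1 + \theta_2$ and the scale $\theta_3$ can still be learned, which makes this a case of partial identifiability. The function names are illustrative only. Agreement on a finite grid of points is evidence of equal distributions, not a proof.

```python
from statistics import NormalDist


def toy_density(theta, x):
    """Density at x of the hypothetical model X ~ N(theta1 + theta2, theta3**2)."""
    theta1, theta2, theta3 = theta
    # The location depends only on theta1 + theta2; theta3 > 0 is the scale.
    return NormalDist(mu=theta1 + theta2, sigma=theta3).pdf(x)


def same_distribution(theta_a, theta_b, grid):
    """Crude check: do the two densities agree at every grid point?"""
    return all(abs(toy_density(theta_a, x) - toy_density(theta_b, x)) < 1e-12
               for x in grid)


grid = [i / 10 for i in range(-50, 51)]

# Two distinct parameter values generating the same distribution:
# theta -> P_theta is not one-to-one, so the model is not identifiable.
print(same_distribution((1.0, 2.0, 1.5), (2.0, 1.0, 1.5), grid))  # True

# Changing an identifiable combination (here theta3) changes P_theta.
print(same_distribution((1.0, 2.0, 1.5), (1.0, 2.0, 2.0), grid))  # False
```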
This definition means that distinct values of $\theta$ should correspond to distinct probability distributions: if $\theta_1 \neq \theta_2$, then also $P_{\theta_1} \neq P_{\theta_2}$. If the distributions are defined in terms of probability density functions (pdfs), then two pdfs should be considered distinct only if they differ on a set of non-zero measure. For example, the two functions $f_1(x) = \mathbf{1}_{\{0 \le x < 1\}}$ and $f_2(x) = \mathbf{1}_{\{0 \le x \le 1\}}$ differ only at the single point $x = 1$. That is a set of measure zero, so they cannot be considered distinct pdfs.
Identifiability of the model in the sense of invertibility of the map $\theta \mapsto P_\theta$ is equivalent to being able to learn the model's true parameter if the model can be observed indefinitely long. Suppose $\{X_t\} \subseteq S$ is the sequence of (independent and identically distributed) observations from the model. Then, by the strong law of large numbers,

$$\frac{1}{T}\sum_{t=1}^{T} \mathbf{1}_{\{X_t \in A\}} \;\xrightarrow{\text{a.s.}}\; \Pr[X_t \in A]$$

for every measurable set $A \subseteq S$, where $\mathbf{1}_{\{\cdot\}}$ is the indicator function. Thus, with an infinite number of observations we will be able to find the true probability distribution $P_0$ in the model. Since the identifiability condition above requires that the map $\theta \mapsto P_\theta$ be invertible, we will also be able to find the true value of the parameter which generated the given distribution $P_0$.
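The strong-law argument can be illustrated numerically. The sketch below makes three assumptions: the draws are i.i.d., the true distribution $P_0$ is a standard normal, and the set $A = (-\infty, 0.5]$ is an arbitrary choice. Under those assumptions, the empirical frequency $\frac{1}{T}\sum_t \mathbf{1}_{\{X_t \in A\}}$ should approach $P_0(A)$ as $T$ grows. The helper names and the seed are illustrative choices.

```python
import random
from statistics import NormalDist


def empirical_frequency(sample, in_set):
    """(1/T) * sum over t of the indicator 1{X_t in A}."""
    return sum(1 for x in sample if in_set(x)) / len(sample)


def in_a(x):
    """Indicator of the (arbitrarily chosen) measurable set A = (-inf, 0.5]."""
    return x <= 0.5


rng = random.Random(12345)           # fixed seed for reproducibility
p0 = NormalDist(mu=0.0, sigma=1.0)   # assumed true distribution P_0
true_prob = p0.cdf(0.5)              # P_0(A), about 0.6915

for t in (100, 10_000, 200_000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(t)]
    # Empirical frequency of A versus its true probability.
    print(t, round(empirical_frequency(sample, in_a), 4), round(true_prob, 4))
```

This shows convergence for a single set $A$ only. The argument in the text applies the same limit to every measurable set simultaneously, which is what pins down $P_0$.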
Examples
Example 1
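Let $\mathcal{P}$ be the normal location-scale family:

$$\mathcal{P} = \left\{ f_\theta(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \;\middle|\; \theta = (\mu, \sigma),\ \mu \in \mathbb{R},\ \sigma > 0 \right\}.$$

Then

$$\begin{aligned}
f_{\theta_1} = f_{\theta_2}
&\iff \frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\!\left(-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right) = \frac{1}{\sqrt{2\pi}\,\sigma_2}\exp\!\left(-\frac{(x-\mu_2)^2}{2\sigma_2^2}\right) \\
&\iff \frac{(x-\mu_1)^2}{\sigma_1^2} + 2\ln\sigma_1 = \frac{(x-\mu_2)^2}{\sigma_2^2} + 2\ln\sigma_2 \\
&\iff x^2\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\right) - 2x\left(\frac{\mu_1}{\sigma_1^2} - \frac{\mu_2}{\sigma_2^2}\right) + \left(\frac{\mu_1^2}{\sigma_1^2} - \frac{\mu_2^2}{\sigma_2^2} + 2\ln\sigma_1 - 2\ln\sigma_2\right) = 0.
\end{aligned}$$

This quadratic expression in $x$ is equal to zero for almost all $x$ only when all its coefficients are equal to zero, since a nonzero quadratic has at most two real roots. That is only possible when $|\sigma_1| = |\sigma_2|$ and $\mu_1 = \mu_2$. Since the scale parameter $\sigma$ is restricted to be greater than zero, we conclude that the model is identifiable: $f_{\theta_1} = f_{\theta_2} \iff \theta_1 = \theta_2$.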
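The algebra above can be checked numerically. Taking $-2\ln$ of both densities turns the condition $f_{\theta_1} = f_{\theta_2}$ into a quadratic in $x$. The sketch below, whose helper names are hypothetical, computes the three coefficients of that quadratic. It then checks that the quadratic reproduces $2\ln\big(f_{\theta_2}(x)/f_{\theta_1}(x)\big)$, computed independently with Python's `statistics.NormalDist`. It assumes $\sigma_1, \sigma_2 > 0$, as the model requires.

```python
import math
from statistics import NormalDist


def quadratic_coefficients(mu1, s1, mu2, s2):
    """Coefficients (a, b, c) of a*x**2 + b*x + c, which equals
    (x - mu1)**2 / s1**2 + 2 ln s1 - (x - mu2)**2 / s2**2 - 2 ln s2.
    Requires s1, s2 > 0."""
    a = 1 / s1**2 - 1 / s2**2
    b = -2 * (mu1 / s1**2 - mu2 / s2**2)
    c = mu1**2 / s1**2 - mu2**2 / s2**2 + 2 * math.log(s1) - 2 * math.log(s2)
    return a, b, c


def log_density_gap(mu1, s1, mu2, s2, x):
    """2 * ln(f_theta2(x) / f_theta1(x)); zero for all x iff the densities agree."""
    f1 = NormalDist(mu1, s1).pdf(x)
    f2 = NormalDist(mu2, s2).pdf(x)
    return 2 * (math.log(f2) - math.log(f1))


theta1, theta2 = (0.5, 1.0), (-1.0, 2.0)
a, b, c = quadratic_coefficients(*theta1, *theta2)
for x in (-2.0, 0.0, 3.0):
    # The quadratic matches the log-density gap at every x: prints True.
    print(math.isclose(a * x**2 + b * x + c,
                       log_density_gap(*theta1, *theta2, x), abs_tol=1e-9))

# Equal parameters make every coefficient vanish.
print(all(v == 0 for v in quadratic_coefficients(0.5, 1.0, 0.5, 1.0)))  # True
```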
Example 2
Let $\mathcal{P}$ be the standard linear regression model:

$$y = \beta' x + \varepsilon, \qquad \mathrm{E}[\,\varepsilon \mid x\,] = 0$$

(where ′ denotes matrix transpose). Then the parameter $\beta$ is identifiable if and only if the matrix $\mathrm{E}[xx']$ is invertible. Thus, this is the identification condition in the model.
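The role of the invertibility condition can be sketched with two regressors and made-up data. Here the expectation $\mathrm{E}[xx']$ is replaced by its sample analogue $\frac{1}{n}\sum_i x_i x_i'$. When the second regressor is an exact multiple of the first, that matrix is singular. Two different coefficient vectors then produce identical values of $\beta' x$ for every observation, so no amount of data can tell them apart. The helper names and data are hypothetical.

```python
def second_moment_matrix(rows):
    """Sample analogue of E[x x'] for 2-dimensional regressors x = (x1, x2)."""
    n = len(rows)
    m11 = sum(x1 * x1 for x1, _ in rows) / n
    m12 = sum(x1 * x2 for x1, x2 in rows) / n
    m22 = sum(x2 * x2 for _, x2 in rows) / n
    return [[m11, m12], [m12, m22]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def linear_index(beta, x):
    """beta' x, which equals E[y | x] in this model."""
    return sum(b * xi for b, xi in zip(beta, x))


x1_values = [1, 2, 3, 4, 5]
collinear = [(v, 2 * v) for v in x1_values]     # x2 = 2 * x1
independent = [(v, v % 2) for v in x1_values]   # x2 not a multiple of x1

print(det2(second_moment_matrix(collinear)))    # 0.0: singular, beta not identified
print(det2(second_moment_matrix(independent)))  # nonzero: beta identified

# With x2 = 2 * x1, beta = (1, 0) and beta = (-1, 1) give the same beta' x.
print(all(linear_index((1, 0), x) == linear_index((-1, 1), x) for x in collinear))
```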
Example 3
Suppose $\mathcal{P}$ is the classical errors-in-variables linear model:

$$\begin{cases} y = \beta x^* + \varepsilon, \\ x = x^* + \eta, \end{cases}$$

where $(\varepsilon, \eta, x^*)$ are jointly normal independent random variables with zero expected value and unknown variances, and only the variables $(x, y)$ are observed. Then this model is not identifiable; only the product $\beta\sigma_*^2$ is, where $\sigma_*^2$ is the variance of the latent regressor $x^*$. This is also an example of a set identifiable model. Although the exact value of $\beta$ cannot be learned, we can guarantee that it must lie somewhere between $\beta_{yx}$ and $1/\beta_{xy}$. Here $\beta_{yx}$ is the coefficient in the OLS regression of $y$ on $x$, and $\beta_{xy}$ is the coefficient in the OLS regression of $x$ on $y$.
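Because $(x, y)$ is jointly normal with mean zero, its distribution is fixed by three moments: $\operatorname{Var}(x) = \sigma_*^2 + \sigma_\eta^2$, $\operatorname{Cov}(x, y) = \beta\sigma_*^2$ and $\operatorname{Var}(y) = \beta^2\sigma_*^2 + \sigma_\varepsilon^2$. The sketch below uses exact fractions and two hand-picked parameter vectors; the numbers and helper names are illustrative. It shows two different values of $\beta$ that produce identical observable moments. It then computes $\beta_{yx}$ and $1/\beta_{xy}$ from those moments, and both candidate values of $\beta$ fall between them.

```python
from fractions import Fraction as F


def observable_moments(beta, var_xstar, var_eta, var_eps):
    """(Var x, Cov(x, y), Var y) implied by y = beta*x* + eps, x = x* + eta,
    with eps, eta, x* independent and mean zero."""
    var_x = var_xstar + var_eta
    cov_xy = beta * var_xstar
    var_y = beta**2 * var_xstar + var_eps
    return var_x, cov_xy, var_y


def ols_bounds(var_x, cov_xy, var_y):
    """Return (beta_yx, 1 / beta_xy) computed from the observable moments."""
    beta_yx = cov_xy / var_x   # OLS slope of y on x
    beta_xy = cov_xy / var_y   # OLS slope of x on y
    return beta_yx, 1 / beta_xy


# (beta, var of x*, var of eta, var of eps)
theta_a = (F(1), F(1), F(1), F(1))              # beta = 1
theta_b = (F(3, 2), F(2, 3), F(4, 3), F(1, 2))  # beta = 3/2

moments_a = observable_moments(*theta_a)
moments_b = observable_moments(*theta_b)
print(moments_a == moments_b)   # True: observationally equivalent
print(ols_bounds(*moments_a))   # (Fraction(1, 2), Fraction(2, 1))
```

Both parameter vectors share the same product $\beta\sigma_*^2 = 1$, consistent with the claim that only this product is identified.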
If we abandon the normality assumption and instead require that $x^*$ is not normally distributed, retaining only the independence condition $\varepsilon \perp \eta \perp x^*$, then the model becomes identifiable.
See also
* Observability
* System identification
* Simultaneous equations model
Further reading
* Rothenberg, Thomas J. (1971). "Identification in Parametric Models". Econometrica. 39 (3): 577–591. doi:10.2307/1913267. JSTOR 1913267.