In statistical decision theory, an admissible decision rule is a rule for making a decision such that there is no other rule that is always "better" than it (or at least sometimes better and never worse), in the precise sense of "better" defined below. This concept is analogous to Pareto efficiency.
Definition
Define sets \Theta, \mathcal{X} and \mathcal{A}, where \Theta are the states of nature, \mathcal{X} the possible observations, and \mathcal{A} the actions that may be taken. An observation x \in \mathcal{X} is distributed as F(x\mid\theta) and therefore provides evidence about the state of nature \theta \in \Theta. A decision rule is a function \delta : \mathcal{X} \rightarrow \mathcal{A}, where upon observing x \in \mathcal{X}, we choose to take action \delta(x) \in \mathcal{A}.
Also define a loss function L(\theta, a), which specifies the loss we would incur by taking action a \in \mathcal{A} when the true state of nature is \theta \in \Theta. Usually we will take this action after observing data x \in \mathcal{X}, so that the loss will be L(\theta, \delta(x)). (It is possible though unconventional to recast the following definitions in terms of a utility function, which is the negative of the loss.)
Define the risk function as the expectation
:
R(\theta,\delta)=\operatorname{E}_{F(x\mid\theta)}[L(\theta,\delta(x))].
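To make the definition concrete, here is a small illustrative sketch (not part of the original article; the binomial setting and the rule are chosen purely for illustration). For an observation X \sim \operatorname{Bin}(n,\theta) and the rule \delta(x) = x/n under squared-error loss L(\theta,a)=(\theta-a)^2, the risk can be computed exactly by summing over the sampling distribution, and it agrees with the closed form \theta(1-\theta)/n:

```python
import math

def risk(delta, theta, n):
    """Risk R(theta, delta) = E_{X ~ Bin(n, theta)} [ L(theta, delta(X)) ]
    under squared-error loss, computed exactly by summing over all outcomes."""
    total = 0.0
    for x in range(n + 1):
        pmf = math.comb(n, x) * theta**x * (1 - theta)**(n - x)
        total += pmf * (delta(x, n) - theta) ** 2
    return total

mle = lambda x, n: x / n   # the sample proportion
theta, n = 0.3, 10
r = risk(mle, theta, n)
# The sample proportion is unbiased, so its risk is its variance theta*(1-theta)/n
assert abs(r - theta * (1 - theta) / n) < 1e-12
```

The same `risk` function works for any rule \delta, which is what makes comparisons such as dominance checkable.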
Whether a decision rule \delta has low risk depends on the true state of nature \theta. A decision rule \delta^* dominates a decision rule \delta if and only if R(\theta,\delta^*)\le R(\theta,\delta) for all \theta, ''and'' the inequality is strict for some \theta.
A decision rule is admissible (with respect to the loss function) if and only if no other rule dominates it; otherwise it is inadmissible. Thus an admissible decision rule is a maximal element with respect to the above partial order.
An inadmissible rule is not preferred (except for reasons of simplicity or computational efficiency), since by definition there is some other rule that will achieve equal or lower risk for ''all'' \theta. But just because a rule \delta is admissible does not mean it is a good rule to use. Being admissible means there is no other single rule that is ''always'' as good or better – but other admissible rules might achieve lower risk for most \theta that occur in practice. (The Bayes risk discussed below is a way of explicitly considering which \theta occur in practice.)
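The fact that two reasonable rules can each win on part of the parameter range – so that neither dominates the other – can be seen in a small sketch (illustrative Python; the binomial setting and both rules are chosen here for illustration). The sample proportion has lower risk near the endpoints of [0, 1], while the shrinkage rule (x+1)/(n+2) has lower risk near 1/2:

```python
import math

def risk(delta, theta, n):
    # Exact risk under squared-error loss for X ~ Bin(n, theta)
    return sum(math.comb(n, x) * theta**x * (1 - theta)**(n - x)
               * (delta(x, n) - theta) ** 2 for x in range(n + 1))

mle    = lambda x, n: x / n              # sample proportion
shrunk = lambda x, n: (x + 1) / (n + 2)  # shrinks toward 1/2

n = 10
grid = [i / 100 for i in range(1, 100)]
mle_better    = any(risk(mle, t, n) < risk(shrunk, t, n) for t in grid)
shrunk_better = any(risk(shrunk, t, n) < risk(mle, t, n) for t in grid)
# Each rule beats the other somewhere, so neither dominates
assert mle_better and shrunk_better
```

Which rule is preferable therefore depends on which \theta one expects to occur, which is exactly the question the Bayes risk below formalizes.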
Bayes rules and generalized Bayes rules
Bayes rules
Let \pi(\theta) be a probability distribution on the states of nature. From a Bayesian point of view, we would regard it as a ''prior distribution''. That is, it is our believed probability distribution on the states of nature, prior to observing data. For a frequentist, it is merely a function on \Theta with no such special interpretation. The Bayes risk of the decision rule \delta with respect to \pi(\theta) is the expectation
:
r(\pi,\delta)=\operatorname{E}_{\pi(\theta)}[R(\theta,\delta)].
A decision rule \delta that minimizes r(\pi,\delta) is called a Bayes rule with respect to \pi(\theta). There may be more than one such Bayes rule. If the Bayes risk is infinite for all \delta, then no Bayes rule is defined.
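As a minimal sketch (illustrative Python; the uniform prior and binomial model are chosen for concreteness, and are not from the original text): under a uniform prior on \theta and squared-error loss, the rule (x+1)/(n+2) – the posterior mean – attains a strictly lower Bayes risk than the sample proportion, as a Bayes rule must:

```python
import math

def risk(delta, theta, n):
    # Exact risk under squared-error loss for X ~ Bin(n, theta)
    return sum(math.comb(n, x) * theta**x * (1 - theta)**(n - x)
               * (delta(x, n) - theta) ** 2 for x in range(n + 1))

def bayes_risk(delta, n, m=2000):
    # r(pi, delta) = E_pi[ R(theta, delta) ] for a uniform prior on (0, 1),
    # approximated by a midpoint rule over theta
    return sum(risk(delta, (i + 0.5) / m, n) for i in range(m)) / m

n = 10
mle   = lambda x, n: x / n
bayes = lambda x, n: (x + 1) / (n + 2)   # posterior mean under the uniform prior
assert bayes_risk(bayes, n) < bayes_risk(mle, n)
```

(The exact values here are 1/(6(n+2)) for the posterior-mean rule versus 1/(6n) for the sample proportion.)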
Generalized Bayes rules
In the Bayesian approach to decision theory, the observed x is considered ''fixed''. Whereas the frequentist approach (i.e., risk) averages over possible samples x \in \mathcal{X}, the Bayesian would fix the observed sample x and average over hypotheses \theta \in \Theta. Thus, the Bayesian approach is to consider for our observed x the expected loss
:
\rho(\pi,\delta \mid x)=\operatorname{E}_{\pi(\theta \mid x)}[L(\theta,\delta(x))]
where the expectation is over the ''posterior'' of \theta given x (obtained from \pi(\theta) and F(x\mid\theta) using Bayes' theorem).
Having made explicit the expected loss for each given x separately, we can define a decision rule \delta by specifying for each x an action \delta(x) that minimizes the expected loss. This is known as a generalized Bayes rule with respect to \pi(\theta). There may be more than one generalized Bayes rule, since there may be multiple choices of \delta(x) that achieve the same expected loss.
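The pointwise recipe above can be sketched numerically (illustrative Python; the uniform prior, binomial model, and grid sizes are all choices made here for illustration). For a single observed x, form the posterior over a grid of \theta values and pick the action minimizing the posterior expected loss; under squared-error loss the minimizer is the posterior mean (x+1)/(n+2):

```python
# Generalized Bayes rule at one fixed observation x: choose the action a that
# minimizes the posterior expected loss rho(pi, delta | x). Sketch for a
# uniform prior on theta, X ~ Bin(n, theta), and squared-error loss.
x, n, m = 3, 10, 400
thetas = [(i + 0.5) / m for i in range(m)]
post = [t**x * (1 - t)**(n - x) for t in thetas]  # prior is constant, so the
z = sum(post)                                     # posterior is prop. to the
post = [w / z for w in post]                      # likelihood, renormalized

def expected_loss(a):
    return sum(w * (a - t) ** 2 for w, t in zip(post, thetas))

best = min((i / 1000 for i in range(1001)), key=expected_loss)
# Under squared-error loss the minimizer is the posterior mean (x+1)/(n+2)
assert abs(best - (x + 1) / (n + 2)) < 1e-3
```

Repeating this search for every x \in \mathcal{X} yields the full rule \delta.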
At first, this may appear rather different from the Bayes rule approach of the previous section, not a generalization. However, notice that the Bayes risk already averages over \Theta in Bayesian fashion, and the Bayes risk may be recovered as the expectation over \mathcal{X} of the expected loss (where x\sim\theta and \theta\sim\pi). Roughly speaking, \delta minimizes this expectation of expected loss (i.e., is a Bayes rule) if and only if it minimizes the expected loss for each x \in \mathcal{X} separately (i.e., is a generalized Bayes rule).
Then why is the notion of generalized Bayes rule an improvement? It is indeed equivalent to the notion of Bayes rule when a Bayes rule exists and all x have positive probability. However, no Bayes rule exists if the Bayes risk is infinite (for all \delta). In this case it is still useful to define a generalized Bayes rule \delta, which at least chooses a minimum-expected-loss action \delta(x) for those x for which a finite-expected-loss action does exist. In addition, a generalized Bayes rule may be desirable because it must choose a minimum-expected-loss action \delta(x) for ''every'' x, whereas a Bayes rule would be allowed to deviate from this policy on a set X \subseteq \mathcal{X} of measure 0 without affecting the Bayes risk.
More important, it is sometimes convenient to use an improper prior \pi(\theta). In this case, the Bayes risk is not even well-defined, nor is there any well-defined distribution over x. However, the posterior \pi(\theta\mid x) (and hence the expected loss) may be well-defined for each x, so that it is still possible to define a generalized Bayes rule.
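A standard illustration: suppose a single observation is drawn as X \sim N(\theta,\sigma^2) with \sigma^2 known, and take the flat improper prior \pi(\theta) \propto 1. The prior does not integrate to one and the marginal distribution of x is undefined, yet Bayes' theorem still yields a proper posterior
:
\pi(\theta \mid x) = N(x, \sigma^2),
so under squared-error loss the generalized Bayes rule is \delta(x) = \operatorname{E}[\theta \mid x] = x, the usual estimator of the mean.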
Admissibility of (generalized) Bayes rules
According to the complete class theorems, under mild conditions every admissible rule is a (generalized) Bayes rule (with respect to some prior \pi(\theta), possibly an improper one, that favors distributions \theta where that rule achieves low risk). Thus, in frequentist decision theory it is sufficient to consider only (generalized) Bayes rules.
Conversely, while Bayes rules with respect to proper priors are virtually always admissible, generalized Bayes rules corresponding to improper priors need not yield admissible procedures. Stein's example is one such famous situation.
Examples
The James–Stein estimator is a nonlinear estimator of the mean of Gaussian random vectors which can be shown to dominate, or outperform, the ordinary least squares technique with respect to a mean-square error loss function. Thus least squares estimation is not an admissible estimation procedure in this context. Some others of the standard estimates associated with the normal distribution are also inadmissible: for example, the sample estimate of the variance when the population mean and variance are unknown.
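Stein's phenomenon can be checked by simulation. The following sketch (illustrative Python; the dimension, trial count, and true mean vector are arbitrary choices made here) compares the total squared-error loss of the maximum-likelihood rule \delta(Y)=Y with the James–Stein shrinkage estimator (1 - (p-2)/\lVert Y\rVert^2)\,Y for a single observation Y \sim N(\theta, I_p) with p = 10:

```python
import random

# Monte Carlo check of Stein's phenomenon: for Y ~ N(theta, I_p) with p >= 3,
# the James-Stein estimator (1 - (p-2)/||Y||^2) * Y achieves lower total
# squared-error risk than the MLE delta(Y) = Y.
random.seed(0)
p, trials = 10, 20000
theta = [1.0] * p                      # an arbitrary true mean vector

mle_loss = js_loss = 0.0
for _ in range(trials):
    y = [t + random.gauss(0, 1) for t in theta]
    s = sum(v * v for v in y)          # ||Y||^2
    shrink = 1 - (p - 2) / s
    js = [shrink * v for v in y]
    mle_loss += sum((v - t) ** 2 for v, t in zip(y, theta))
    js_loss  += sum((v - t) ** 2 for v, t in zip(js, theta))

# Average loss is lower for James-Stein at this theta (and in fact at every theta)
assert js_loss / trials < mle_loss / trials
```

The MLE's average loss hovers near p, while the James–Stein estimator's is markedly smaller, illustrating that the MLE is inadmissible for p \ge 3.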