Randomised Decision Rule

In statistical decision theory, a randomised decision rule or mixed decision rule is a decision rule that associates probabilities with deterministic decision rules. In finite decision problems, randomised decision rules define a ''risk set'', which is the convex hull of the risk points of the nonrandomised decision rules. As nonrandomised alternatives always exist to randomised Bayes rules, randomisation is not needed in Bayesian statistics, although frequentist statistical theory sometimes requires the use of randomised rules to satisfy optimality conditions such as minimax, most notably when deriving confidence intervals and hypothesis tests about discrete probability distributions. A statistical test making use of a randomised decision rule is called a randomised test.


Definition and interpretation

Let \mathcal D = \{d_1, d_2, \ldots, d_h\} be a set of non-randomised decision rules with associated probabilities p_1, p_2, \ldots, p_h. Then the randomised decision rule d^* is defined as \sum_{i=1}^h p_i d_i and its associated risk function R(\theta, d^*) is \sum_{i=1}^h p_i R(\theta, d_i) (Young and Smith, p. 11). This rule can be treated as a random experiment in which the decision rules d_1, \ldots, d_h \in \mathcal D are selected with probabilities p_1, \ldots, p_h respectively.

Alternatively, a randomised decision rule may assign probabilities directly to elements of the action space \mathcal A for each member of the sample space. More formally, d^*(x, A) denotes the probability that, having observed x, the chosen action lies in the set A \subseteq \mathcal A. Under this approach, its loss function is also defined directly as L(\theta, d^*(x)) = \int_{\mathcal A} L(\theta, a) \, d^*(x, \mathrm{d}a) (Parmigiani, p. 132).

The introduction of randomised decision rules thus creates a larger decision space from which the statistician may choose a decision. As non-randomised decision rules are a special case of randomised decision rules in which one decision or action has probability 1, the original decision space \mathcal D is a proper subset of the new decision space \mathcal D^* (DeGroot, pp. 128-129).
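The mixture formula for the risk can be checked numerically. The Python sketch below is illustrative only: the function names and risk values are hypothetical, and it assumes the risks R(\theta, d_i) of the nonrandomised rules at a fixed \theta are already known. It computes R(\theta, d^*) = \sum_i p_i R(\theta, d_i) directly and, following the random-experiment interpretation, by repeatedly drawing one rule d_i with probability p_i and averaging its risk.

```python
import random
from typing import Sequence


def randomised_risk(probs: Sequence[float], risks: Sequence[float]) -> float:
    """R(theta, d*) = sum_i p_i R(theta, d_i) at one fixed theta."""
    if len(probs) != len(risks):
        raise ValueError("need one probability per rule")
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be a probability vector")
    return sum(p * r for p, r in zip(probs, risks))


def draw_rule(probs: Sequence[float], rng: random.Random) -> int:
    """Random-experiment view: select index i (rule d_i) with probability p_i."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = [0.25, 0.75]        # hypothetical p_1, p_2
risks_theta = [1.0, 4.0]    # hypothetical R(theta, d_1), R(theta, d_2)
print(randomised_risk(probs, risks_theta))  # 3.25

# Averaging the risk of the randomly selected rule estimates R(theta, d*).
rng = random.Random(0)
draws = [risks_theta[draw_rule(probs, rng)] for _ in range(100_000)]
print(sum(draws) / len(draws))  # Monte Carlo estimate, close to 3.25
```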


Selection of randomised decision rules

As with nonrandomised decision rules, randomised decision rules may satisfy favourable properties such as admissibility, minimaxity and being Bayes. This shall be illustrated in the case of a finite decision problem, i.e. a problem where the parameter space is a finite set of, say, k elements. The risk set, henceforth denoted as \mathcal S, is the set of all vectors in which each entry is the value of the risk function associated with a randomised decision rule under a certain parameter: it contains all vectors of the form (R(\theta_1, d^*), \ldots, R(\theta_k, d^*)), d^* \in \mathcal D^*. Note that by the definition of the randomised decision rule, the risk set is the convex hull of the risks (R(\theta_1, d), \ldots, R(\theta_k, d)), d \in \mathcal D (Bickel and Doksum, p. 29). In the case where the parameter space has only two elements \theta_1 and \theta_2, this constitutes a subset of \mathbb R^2, so it may be drawn with respect to the coordinate axes R_1 and R_2 corresponding to the risks under \theta_1 and \theta_2 respectively. An example is shown on the right.
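For two parameters, the risk set can be computed from the finitely many nonrandomised risk points. The sketch below is one possible approach, assuming the risk points (R(\theta_1, d), R(\theta_2, d)) are given as tuples; it uses Andrew's monotone chain algorithm (our choice, any convex hull algorithm would do), and the four risk points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def risk_set_vertices(points: List[Point]) -> List[Point]:
    """Vertices, in counter-clockwise order, of the convex hull of the
    nonrandomised risk points; the risk set S is this polygon."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:  # lower hull, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):  # upper hull, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop duplicated endpoints


# Hypothetical risk points (R_1, R_2) of rules d_1, ..., d_4
points = [(1.0, 4.0), (4.0, 1.0), (3.0, 3.0), (2.5, 3.0)]
print(risk_set_vertices(points))  # [(1.0, 4.0), (4.0, 1.0), (3.0, 3.0)]
```

Here the risk point (2.5, 3) of d_4 lies inside \mathcal S, while the randomised rule 0.5 d_1 + 0.5 d_2 has risk point (2.5, 2.5) on the edge joining d_1 and d_2.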


Admissibility

An admissible decision rule is one that is not dominated by any other decision rule, i.e. there is no decision rule that has risk equal to or lower than it for all parameters and strictly lower risk than it for some parameter. In a finite decision problem, the risk point of an admissible decision rule has, compared with every other risk point in \mathcal S, either a lower R_1 coordinate or a lower R_2 coordinate. More formally, the admissible rules are those with risk points of the form (a, b) such that \{(R_1, R_2) : R_1 \leq a, R_2 \leq b\} \cap \mathcal S = \{(a, b)\}. Thus the lower-left boundary of the risk set is the set of admissible decision rules (Young and Smith, p. 12).
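In the two-parameter case the lower-left boundary can be traced directly. This is a sketch under stated assumptions (hypothetical function name and risk points, real-valued risks): it builds the lower convex hull of the nonrandomised risk points and keeps its vertices while R_2 strictly decreases. Every point on a segment between consecutive returned vertices is then an admissible randomised rule.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def admissible_vertices(points: List[Point]) -> List[Point]:
    """Vertices of the lower-left boundary of S = conv(points).
    Assumes a non-empty list of risk points (R_1, R_2)."""
    pts = sorted(set(points))  # by R_1, ties broken by R_2
    lower: List[Point] = []
    for p in pts:  # lower convex hull, left to right
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    # The leftmost point (smallest R_2 among ties) is admissible; walk along
    # the lower hull while R_2 strictly decreases. Later points are dominated.
    chain = [lower[0]]
    for p in lower[1:]:
        if p[1] < chain[-1][1]:
            chain.append(p)
        else:
            break
    return chain


points = [(1.0, 4.0), (4.0, 1.0), (3.0, 3.0), (2.5, 3.0)]  # hypothetical d_1..d_4
print(admissible_vertices(points))  # [(1.0, 4.0), (4.0, 1.0)]
```

With these points, d_4 = (2.5, 3) is not dominated by any nonrandomised rule, yet it is inadmissible because the randomised rule 0.5 d_1 + 0.5 d_2 has risk (2.5, 2.5).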


Minimax

A minimax rule is one that minimises the supremum risk \sup_{\theta} R(\theta, d^*) among all decision rules in \mathcal D^*. Sometimes, a randomised decision rule may perform better than all nonrandomised decision rules in this regard. In a finite decision problem with two possible parameters, the minimax rule can be found by considering the family of squares Q(c) = \{(R_1, R_2) : 0 \leq R_1 \leq c, 0 \leq R_2 \leq c\} (Bickel and Doksum, p. 30). The value of c for the smallest such square that touches \mathcal S is the minimax risk, and the corresponding point or points on the risk set are the minimax rules. If the risk set intersects the line R_1 = R_2, then the admissible decision rule lying on the line is minimax. If R_2 > R_1 holds for every point in the risk set, or R_1 > R_2 holds for every point, then the minimax rule is either an extreme point (i.e. a nonrandomised decision rule) or lies on a line segment connecting two extreme points (randomisations of two nonrandomised decision rules).

Figure: The minimax rule is the randomised decision rule (1-p)d_1 + pd_2.
Figure: The minimax rule is d_2.
Figure: The minimax rules are all rules of the form (1-p)d_1 + pd_2, 0 \leq p \leq 1.
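The square-based search can also be done as a direct computation in this two-parameter setting. The sketch below, with hypothetical function names and risk points, relies on the minimax point lying on the boundary of \mathcal S, which consists of segments between nonrandomised risk points. Along each segment, \max(R_1, R_2) is convex and piecewise linear in p, so it suffices to check the endpoints and the crossing with the line R_1 = R_2.

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def best_mixture(a: Point, b: Point) -> Tuple[float, float]:
    """Minimise max(R_1, R_2) over (1 - p) a + p b, 0 <= p <= 1; return (p, value)."""
    def worst(p: float) -> float:
        return max((1 - p) * a[0] + p * b[0], (1 - p) * a[1] + p * b[1])

    candidates = [0.0, 1.0]
    denom = (a[0] - a[1]) - (b[0] - b[1])
    if denom != 0:
        p_eq = (a[0] - a[1]) / denom  # where the segment meets R_1 = R_2
        if 0.0 <= p_eq <= 1.0:
            candidates.append(p_eq)
    p_best = min(candidates, key=worst)
    return p_best, worst(p_best)


def minimax_rule(points: List[Point]) -> Tuple[int, int, float, float]:
    """Return (i, j, p, risk): (1 - p) d_i + p d_j minimises the maximum risk.
    Searching all pairs covers every edge and vertex of the risk set S."""
    best = (0, 0, 0.0, max(points[0]))  # start from the nonrandomised rule d_1
    for i, j in combinations(range(len(points)), 2):
        p, risk = best_mixture(points[i], points[j])
        if risk < best[3]:
            best = (i, j, p, risk)
    return best


points = [(1.0, 4.0), (4.0, 1.0), (3.0, 3.0), (2.5, 3.0)]  # hypothetical d_1..d_4
print(minimax_rule(points))  # (0, 1, 0.5, 2.5): the rule 0.5 d_1 + 0.5 d_2
```

For these points the best nonrandomised rule has maximum risk 3, whereas the randomised rule 0.5 d_1 + 0.5 d_2 attains 2.5.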


Bayes

A randomised Bayes rule is one that attains the infimum Bayes risk r(\pi, d^*) among all decision rules. In the special case where the parameter space has two elements, the line \pi_1 R_1 + (1 - \pi_1) R_2 = c, where \pi_1 and \pi_2 = 1 - \pi_1 denote the prior probabilities of \theta_1 and \theta_2 respectively, is the set of risk points with Bayes risk c. The minimum Bayes risk for the decision problem is therefore the smallest c such that the line touches the risk set. This line may either touch only one extreme point of the risk set, i.e. correspond to a nonrandomised decision rule, or overlap with an entire side of the risk set, i.e. correspond to two nonrandomised decision rules and the randomised decision rules combining the two. This is illustrated by the three situations below:

Figure: The Bayes rules are the set of decision rules of the form (1-p)d_1 + pd_2, 0 \leq p \leq 1.
Figure: The Bayes rule is d_1.
Figure: The Bayes rule is d_2.

As different priors result in lines with different slopes, every admissible rule is Bayes with respect to some prior, and a rule that is Bayes with respect to a prior giving positive probability to both \theta_1 and \theta_2 is admissible. Note that no situation is possible where a randomised Bayes rule exists but a nonrandomised Bayes rule does not: the existence of a randomised Bayes rule implies the existence of a nonrandomised Bayes rule. This is also true in the general case, even with infinite parameter space, infinite Bayes risk, and regardless of whether the infimum Bayes risk can be attained (Bickel and Doksum, p. 31). This supports the intuitive notion that the statistician need not utilise randomisation to arrive at statistical decisions.
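Because the Bayes risk of a mixture is linear in the mixing probabilities, a randomised rule can never have smaller Bayes risk than the best nonrandomised rule. A minimal sketch, assuming a finite parameter space, known risk points and a user-supplied prior (the function names and all values are hypothetical):

```python
from typing import List, Sequence, Tuple


def bayes_risk(prior: Sequence[float], risk_point: Sequence[float]) -> float:
    """r(pi, d) = sum_j pi_j R(theta_j, d) for a finite parameter space."""
    return sum(w * r for w, r in zip(prior, risk_point))


def nonrandomised_bayes_rules(prior, points, tol: float = 1e-12) -> Tuple[List[int], float]:
    """Indices of the nonrandomised rules with minimal Bayes risk, and that risk."""
    risks = [bayes_risk(prior, pt) for pt in points]
    best = min(risks)
    return [i for i, r in enumerate(risks) if r <= best + tol], best


def mixture_bayes_risk(prior, points, probs) -> float:
    """Bayes risk of d* = sum_i p_i d_i: a weighted average of the r(pi, d_i),
    hence never below the smallest nonrandomised Bayes risk."""
    return sum(p * bayes_risk(prior, pt) for p, pt in zip(probs, points))


points = [(1.0, 4.0), (4.0, 1.0), (3.0, 3.0), (2.5, 3.0)]  # hypothetical d_1..d_4
print(nonrandomised_bayes_rules((0.5, 0.5), points))  # ([0, 1], 2.5)
print(nonrandomised_bayes_rules((0.7, 0.3), points))  # only d_1 (index 0) is Bayes
print(mixture_bayes_risk((0.7, 0.3), points, [0.5, 0.0, 0.5, 0.0]))  # 0.5 * 1.9 + 0.5 * 3.0, up to rounding
```

With \pi_1 = 0.5 both d_1 and d_2 are Bayes, so every mixture of them is too; with \pi_1 = 0.7 only d_1 is.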


In practice

As randomised Bayes rules always have nonrandomised alternatives, they are unnecessary in Bayesian statistics. However, in frequentist statistics, randomised rules are theoretically necessary in certain situations, and were thought to be useful in practice when they were first invented: Egon Pearson forecast that they 'will not meet with strong objection'. Nevertheless, few statisticians actually implement them nowadays (Agresti and Gottard, p. 367).


Randomised test

Randomised tests should not be confused with permutation tests.

In the usual formulation of the likelihood ratio test, the null hypothesis is rejected whenever the likelihood ratio \Lambda is smaller than some constant K, and accepted otherwise. However, this is sometimes problematic when \Lambda is discrete under the null hypothesis: then \Lambda = K can occur with positive probability, and a test that always or never rejects at \Lambda = K may be unable to attain a prescribed significance level exactly. A solution is to define a ''test function'' \phi(x), whose value is the probability with which the null hypothesis is rejected:

\phi(x) = \begin{cases} 1 & \text{if } \Lambda < K \\ p(x) & \text{if } \Lambda = K \\ 0 & \text{if } \Lambda > K \end{cases}

This can be interpreted as flipping a biased coin with probability p(x) of returning heads whenever \Lambda = K, and rejecting the null hypothesis if heads turns up (Bickel and Doksum, p. 224). A generalised form of the Neyman–Pearson lemma states that this test has maximum power among all tests at the same significance level \alpha, that such a test exists for any significance level \alpha, and that the test is essentially unique (Young and Smith, p. 68).

As an example, consider the case where the underlying distribution is Bernoulli with probability p, and we would like to test the null hypothesis p \leq \lambda against the alternative hypothesis p > \lambda. It is natural to choose some k such that P(\hat{p} > k \mid p = \lambda) = \alpha, and reject the null whenever \hat{p} > k, where the observed proportion of successes \hat{p} is the test statistic. Because \hat{p} is discrete, however, such a k generally does not exist. To take into account cases where \hat{p} = k, we define the test function

\phi(x) = \begin{cases} 1 & \text{if } \hat{p} > k \\ \gamma & \text{if } \hat{p} = k \\ 0 & \text{if } \hat{p} < k \end{cases}

where \gamma is chosen such that P(\hat{p} > k \mid p = \lambda) + \gamma P(\hat{p} = k \mid p = \lambda) = \alpha.
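The constants k and \gamma of the Bernoulli example can be computed exactly. The sketch below assumes n independent Bernoulli trials and works with the count X = n\hat{p} rather than \hat{p} itself (so the cut-off on the \hat{p} scale is k/n); the function names are hypothetical, and probabilities are evaluated at the boundary value p = \lambda of the null hypothesis.

```python
import random
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def randomised_binomial_test(n: int, lam: float, alpha: float):
    """Return (k, gamma) for the size-alpha test of H0: p <= lam vs H1: p > lam
    based on the number X of successes in n trials (p_hat = X / n)."""
    if not (0 < lam < 1 and 0 < alpha < 1):
        raise ValueError("need 0 < lam < 1 and 0 < alpha < 1")
    tail = 0.0  # P(X > k | p = lam), accumulated from x = n downwards
    for k in range(n, -1, -1):
        mass = binom_pmf(k, n, lam)
        if tail + mass > alpha:  # P(X > k) <= alpha < P(X >= k): randomise at k
            return k, (alpha - tail) / mass
        tail += mass
    raise AssertionError("unreachable: P(X >= 0) = 1 > alpha")


def phi(x: int, k: int, gamma: float) -> float:
    """Test function: probability of rejecting H0 after observing X = x."""
    if x > k:
        return 1.0
    if x == k:
        return gamma
    return 0.0


def reject(x: int, k: int, gamma: float, rng: random.Random) -> bool:
    """Flip a coin with heads probability phi(x) and reject H0 on heads."""
    return rng.random() < phi(x, k, gamma)


k, gamma = randomised_binomial_test(n=10, lam=0.5, alpha=0.05)
print(k, round(gamma, 4))  # 8 0.8933
```

For n = 10, \lambda = 0.5 and \alpha = 0.05, rejecting only when X > 8 would have size 11/1024 (about 0.011) and rejecting whenever X \geq 8 would have size 56/1024 (about 0.055), so neither nonrandomised test has size exactly 0.05; randomising at X = 8 closes the gap.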


Randomised confidence intervals

An analogous problem arises in the construction of confidence intervals. For instance, the Clopper-Pearson interval is always conservative because of the discrete nature of the binomial distribution. An alternative is to find the upper and lower confidence limits U and L by solving the following equations:

\begin{cases} P(\hat{p} < k \mid p = U) + \gamma P(\hat{p} = k \mid p = U) = \alpha/2 \\ P(\hat{p} > k \mid p = L) + \gamma P(\hat{p} = k \mid p = L) = \alpha/2 \end{cases}

where k is the observed value of \hat{p} and \gamma is a uniform random variable on (0, 1), drawn independently of the data.
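The two equations can be solved numerically because their left-hand sides are monotone in U and L. The sketch below assumes a binomial count x out of n trials (so k = x/n on the \hat{p} scale) and solves each equation by bisection; the function names are hypothetical. With \gamma = 1 the equations reduce to P(X \leq x \mid p = U) = \alpha/2 and P(X \geq x \mid p = L) = \alpha/2, i.e. the Clopper-Pearson limits, which makes a convenient check.

```python
import random
from math import comb


def binom_pmf(j: int, n: int, p: float) -> float:
    return comb(n, j) * p**j * (1 - p) ** (n - j)


def solve_nonincreasing(f, target: float, iters: int = 200) -> float:
    """Bisection for f(t) = target on [0, 1], assuming f is non-increasing."""
    lo, hi = 0.0, 1.0
    if f(lo) <= target:
        return lo
    if f(hi) >= target:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def randomised_binomial_interval(x: int, n: int, alpha: float, gamma: float):
    """Limits (L, U) solving the two equations for an observed count x."""
    def upper_eq(u: float) -> float:  # P(X < x | u) + gamma P(X = x | u), non-increasing
        return sum(binom_pmf(j, n, u) for j in range(x)) + gamma * binom_pmf(x, n, u)

    def lower_eq(t: float) -> float:  # P(X > x | t) + gamma P(X = x | t), non-decreasing
        return sum(binom_pmf(j, n, t) for j in range(x + 1, n + 1)) + gamma * binom_pmf(x, n, t)

    upper = solve_nonincreasing(upper_eq, alpha / 2)
    lower = solve_nonincreasing(lambda t: -lower_eq(t), -alpha / 2)
    return lower, upper


# Draw gamma independently of the data (random() returns a value in [0, 1)).
gamma = random.Random(2024).random()
print(randomised_binomial_interval(3, 10, 0.05, gamma))
```

Since both left-hand sides increase with \gamma, the randomised interval always lies inside the Clopper-Pearson interval obtained with \gamma = 1.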


See also

* Mixed strategy
* Linear programming



Bibliography

* Young, G. A.; Smith, R. L. (2005). Essentials of Statistical Inference. Cambridge: Cambridge University Press. ISBN 9780521548663.