In statistics, the likelihood principle is the proposition that, given a statistical model, all the evidence in a sample relevant to the model parameters is contained in the likelihood function.

A likelihood function arises from a probability density function considered as a function of its distributional parameterization argument. For example, consider a model which gives the probability density function \,f_X(x \mid \theta)\, of an observable random variable \,X\, as a function of a parameter \,\theta~. Then for a specific value \,x\, of \,X~, the function \,\mathcal{L}(\theta \mid x) = f_X(x \mid \theta)\, is a likelihood function of \,\theta\,: it gives a measure of how "likely" any particular value of \,\theta\, is, if we know that \,X\, has the value \,x~. The density function may be a density with respect to counting measure, i.e. a probability mass function.

Two likelihood functions are ''equivalent'' if one is a scalar multiple of the other. The likelihood principle is this: all information from the data that is relevant to inferences about the value of the model parameters is in the equivalence class to which the likelihood function belongs. The strong likelihood principle applies this same criterion to cases such as sequential experiments, where the available sample of data results from applying a stopping rule to the observations earlier in the experiment.


Example

Suppose
* \,X\, is the number of successes in twelve independent Bernoulli trials with probability \,\theta\, of success on each trial, and
* \,Y\, is the number of independent Bernoulli trials needed to get three successes, again with probability \,\theta\, of success on each trial (\,\theta = \tfrac{1}{2}\, for the toss of a fair coin).

Then the observation that \,X = 3\, induces the likelihood function
:\mathcal{L}(\theta \mid X=3) = \binom{12}{3}\,\theta^3\,(1-\theta)^9 = 220\,\theta^3\,(1-\theta)^9~,
while the observation that \,Y = 12\, induces the likelihood function
:\mathcal{L}(\theta \mid Y=12) = \binom{11}{2}\,\theta^3\,(1-\theta)^9 = 55\,\theta^3\,(1-\theta)^9~.

The likelihood principle says that, as the data are the same in both cases, the inferences drawn about the value of \,\theta\, should also be the same. In addition, all the inferential content in the data about the value of \,\theta\, is contained in the two likelihoods, and is the same whenever they are proportional to one another. This is the case in the above example, reflecting the fact that the difference between observing \,X = 3\, and observing \,Y = 12\, lies not in the actual data collected, nor in the conduct of the experimenter, but merely in the intentions described in the two different designs of the experiment. Specifically, in one case the decision in advance was to try twelve times, regardless of the outcome; in the other case the advance decision was to keep trying until three successes were observed. The inference about \,\theta\, should be the same, and this is reflected in the fact that the two likelihoods are proportional to each other: except for a constant leading factor of 220 versus 55, the two likelihood functions are the same.

This equivalence does not always hold, however. The use of frequentist methods involving p-values leads to different inferences for the two cases above, showing that the outcome of frequentist methods depends on the experimental procedure, and thus violates the likelihood principle.
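The proportionality of the two likelihood functions can be checked numerically. The sketch below (Python, standard library only) evaluates both likelihoods on a grid of values of \,\theta\, and confirms that their ratio is the constant \,220/55 = 4\,, so both are maximized at the same point, \,\theta = 3/12 = 0.25\,:

```python
from math import comb

def lik_binomial(theta: float) -> float:
    """Likelihood of theta given X = 3 successes in 12 fixed Bernoulli trials."""
    return comb(12, 3) * theta**3 * (1 - theta)**9

def lik_negative_binomial(theta: float) -> float:
    """Likelihood of theta given that Y = 12 trials were needed to reach 3 successes."""
    return comb(11, 2) * theta**3 * (1 - theta)**9

grid = [i / 1000 for i in range(1, 1000)]
ratios = {round(lik_binomial(t) / lik_negative_binomial(t), 12) for t in grid}
print(ratios)                                    # {4.0} -- a constant ratio everywhere
print(max(grid, key=lik_binomial),               # 0.25
      max(grid, key=lik_negative_binomial))      # 0.25 -- the same maximizer
```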


The law of likelihood

A related concept is the law of likelihood, the notion that the extent to which the evidence supports one parameter value or hypothesis against another is indicated by the ratio of their likelihoods, their likelihood ratio. That is,
:\Lambda = \frac{ \mathcal{L}(a \mid X = x) }{ \mathcal{L}(b \mid X = x) } = \frac{ P(X = x \mid a) }{ P(X = x \mid b) }
is the degree to which the observation \,x\, supports parameter value or hypothesis \,a\, against \,b~. If this ratio is 1, the evidence is indifferent; if it is greater than 1, the evidence supports the value \,a\, against \,b\,; if it is less than 1, then vice versa.

In Bayesian statistics, this ratio is known as the Bayes factor, and Bayes' rule can be seen as the application of the law of likelihood to inference. In frequentist inference, the likelihood ratio is used in the likelihood-ratio test, but other non-likelihood tests are used as well. The Neyman–Pearson lemma states that, for comparing two simple hypotheses at a given significance level, the likelihood-ratio test is the most powerful test, which gives a frequentist justification for the law of likelihood.

Combining the likelihood principle with the law of likelihood yields the consequence that the parameter value which maximizes the likelihood function is the value which is most strongly supported by the evidence. This is the basis for the widely used method of maximum likelihood.
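As a concrete illustration of the law of likelihood, continuing the coin example above: suppose 3 successes were observed in 12 trials and we compare the (illustrative) parameter values \,a = 1/4\, and \,b = 1/2~. A minimal sketch in Python:

```python
def likelihood(theta: float, successes: int = 3, trials: int = 12) -> float:
    """Likelihood of theta for the observed data, up to a constant factor."""
    return theta**successes * (1 - theta)**(trials - successes)

a, b = 0.25, 0.5                        # two candidate parameter values (illustrative)
ratio = likelihood(a) / likelihood(b)   # the likelihood ratio, Lambda
print(f"Lambda = {ratio:.2f}")          # Lambda = 4.81 > 1: the data favour a over b
```

Because both designs in the earlier example induce likelihoods proportional to \,\theta^3 (1-\theta)^9\,, the constant factor cancels in the ratio, so \,\Lambda\, is the same whichever experimental design produced the data.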


History

The likelihood principle was first identified by that name in print in 1962 (by Barnard et al., Birnbaum, and Savage et al.), but arguments for the same principle, unnamed, and the use of the principle in applications go back to the work of R.A. Fisher in the 1920s. The law of likelihood was identified by that name by I. Hacking (1965). More recently the likelihood principle as a general principle of inference has been championed by A. W. F. Edwards. The likelihood principle has been applied to the philosophy of science by R. Royall.

Birnbaum proved that the likelihood principle follows from two more primitive and seemingly reasonable principles, the ''conditionality principle'' and the ''sufficiency principle'':
* The conditionality principle says that if an experiment is chosen by a random process independent of the states of nature \,\theta\,, then only the experiment actually performed is relevant to inferences about \,\theta~.
* The sufficiency principle says that if \,T(X)\, is a sufficient statistic for \,\theta\,, and if in two experiments with data \,x_1\, and \,x_2\, we have \,T(x_1) = T(x_2)\,, then the evidence about \,\theta\, given by the two experiments is the same.

However, the adequacy of Birnbaum's proof is contested (''see below'').
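As a small numerical illustration of the sufficiency principle stated above (a sketch within a single experimental design): for independent Bernoulli trials with fixed \,n\,, the number of successes \,T(x) = \sum_i x_i\, is a sufficient statistic for \,\theta\,, so two samples with the same count induce the same likelihood function and hence the same evidence about \,\theta\,.

```python
from math import prod, isclose

def bernoulli_likelihood(sample: list[int], theta: float) -> float:
    """Likelihood of theta for an observed sequence of 0/1 Bernoulli outcomes."""
    return prod(theta if x else 1 - theta for x in sample)

# Two different 12-trial sequences with the same sufficient statistic T(x) = sum(x) = 3.
x1 = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
x2 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

# Equal sufficient statistics => identical likelihood functions of theta.
grid = [i / 100 for i in range(1, 100)]
print(all(isclose(bernoulli_likelihood(x1, t), bernoulli_likelihood(x2, t)) for t in grid))  # True
```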


Arguments for and against

Some widely used methods of conventional statistics, for example many significance tests, are not consistent with the likelihood principle. Let us briefly consider some of the arguments for and against the likelihood principle.


The original Birnbaum argument

Birnbaum's proof of the likelihood principle has been disputed by statisticians, including Michael Evans, and by philosophers of science, including Deborah Mayo. Alexander Dawid points out fundamental differences between Mayo's and Birnbaum's definitions of the conditionality principle, arguing that Birnbaum's proof cannot be so readily dismissed. A new proof of the likelihood principle has been provided by Greg Gandenberger that addresses some of the counterarguments to the original proof.


Experimental design arguments on the likelihood principle

Unrealized events play a role in some common statistical methods. For example, the result of a significance test depends on the p-value, the probability of a result as extreme as or more extreme than the observation, and that probability may depend on the design of the experiment. To the extent that the likelihood principle is accepted, such methods are therefore denied. Some classical significance tests are not based on the likelihood. The following are a simple and a more complicated example of those, using a commonly cited example called ''the optional stopping problem''.

;Example 1 – simple version:
Suppose I tell you that I tossed a coin 12 times and in the process observed 3 heads. You might make some inference about the probability of heads and whether the coin was fair. Suppose now I tell you that I tossed the coin ''until'' I observed 3 heads, and that I tossed it 12 times. Will you now make some different inference? The likelihood function is the same in both cases: it is proportional to
:p^3 (1-p)^9 ~.
So according to the ''likelihood principle'', in either case the inference should be the same.

;Example 2 – a more elaborated version of the same statistics:
Suppose a number of scientists are assessing the probability of a certain outcome (which we shall call 'success') in experimental trials. Conventional wisdom suggests that if there is no bias towards success or failure then the success probability would be one half. Adam, a scientist, conducted 12 trials and obtained 3 successes and 9 failures. One of those successes was the 12th and last observation. Then Adam left the lab.

Bill, a colleague in the same lab, continued Adam's work and published Adam's results, along with a significance test. He tested the null hypothesis
that \,p\,, the success probability, is equal to one half, versus \,p < \tfrac{1}{2}~. If we ignore the information that the third success was the 12th and last observation, the probability of the observed result that out of 12 trials 3 or fewer (i.e. more extreme) were successes, if the null hypothesis is true, is
:\left[ \binom{12}{3} + \binom{12}{2} + \binom{12}{1} + \binom{12}{0} \right] \left( \tfrac{1}{2} \right)^{12} = \tfrac{299}{4096} \approx 7.3\% ~.
Thus the null hypothesis is not rejected at the 5% significance level if we ignore the knowledge that the third success was the 12th result. However, observe that this first calculation also includes sequences of 12 tosses that end in tails, contrary to the problem statement! If we redo the calculation, the relevant probability under the null hypothesis is the probability of a fair coin landing 2 or fewer heads in 11 trials, multiplied by the probability of the fair coin landing a head on the 12th trial:
:\left[ \binom{11}{2} + \binom{11}{1} + \binom{11}{0} \right] \left( \tfrac{1}{2} \right)^{11} \times \tfrac{1}{2} = \tfrac{67}{4096} \approx 1.6\% ~.
Now the result ''is'' statistically significant at the 5% level.

Charlotte, another scientist, reads Bill's paper and writes a letter, saying that it is possible that Adam kept trying until he obtained 3 successes, in which case the probability of needing to conduct 12 or more experiments is given by
:\left[ \binom{11}{2} + \binom{11}{1} + \binom{11}{0} \right] \left( \tfrac{1}{2} \right)^{11} = \tfrac{67}{2048} \approx 3.3\% ~,
and this result too ''is'' statistically significant at the 5% level. There is no contradiction between the latter two analyses: both computations are correct, and both reject the null hypothesis at the 5% level. To these scientists, whether the result is significant or not does not depend on the design of the experiment, but on the likelihood (in the sense of the likelihood function) of the parameter value being \,\tfrac{1}{2}~.
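The three tail probabilities above are straightforward to verify numerically. The following is a minimal check in Python (standard library only), assuming a fair coin, \,p = \tfrac{1}{2}\,:

```python
from math import comb

# Bill's first analysis (fixed 12-trial binomial design):
# probability of 3 or fewer successes in 12 trials.
p_binomial = sum(comb(12, k) for k in range(4)) / 2**12        # 299/4096

# Bill's redone calculation: at most 2 successes in the first 11 trials,
# times the probability of a success on the 12th trial.
p_redone = sum(comb(11, k) for k in range(3)) / 2**11 / 2      # 67/4096

# Charlotte's analysis (stopping rule "toss until 3 successes"):
# probability that 12 or more trials are needed, i.e. at most
# 2 successes in the first 11 trials.
p_stopping = sum(comb(11, k) for k in range(3)) / 2**11        # 67/2048

print(f"{p_binomial:.3f} {p_redone:.3f} {p_stopping:.3f}")      # 0.073 0.016 0.033
```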
;Summary of the illustrated issues:
Results of this kind are considered by some as arguments against the likelihood principle. For others it exemplifies the value of the likelihood principle and is an argument against significance tests. Similar themes appear when comparing Fisher's exact test with Pearson's chi-squared test.


The voltmeter story

An argument in favor of the likelihood principle is given by Edwards in his book ''Likelihood''. He cites the following story from J.W. Pratt, slightly condensed here. Note that the likelihood function depends only on what actually happened, and not on what ''could'' have happened.

: An engineer draws a random sample of electron tubes and measures their voltages. The measurements range from 75 to 99 volts. A statistician computes the sample mean and a confidence interval for the true mean. Later the statistician discovers that the voltmeter reads only as far as 100 volts, so technically, the population appears to be "''censored''". If the statistician is orthodox, this necessitates a new analysis.
: However, the engineer says he has another meter reading to 1000 volts, which he would have used if any voltage had been over 100. This is a relief to the statistician, because it means the population was effectively uncensored after all. But later, the statistician learns that the second meter was not working when the measurements were taken. The engineer informs the statistician that he would not have held up the original measurements until the second meter was fixed, and the statistician informs him that new measurements are required. The engineer is astounded. "''Next you'll be asking about my oscilloscope!''"

;Throwback to ''Example 2'' in the prior section:
This story can be translated to Adam's stopping rule above, as follows: Adam stopped immediately after 3 successes, because his boss Bill had instructed him to do so. After the publication of the statistical analysis by Bill, Adam realizes that he has missed a later instruction from Bill to instead conduct 12 trials, and that Bill's paper is based on this second instruction. Adam is very glad that he got his 3 successes after exactly 12 trials, and explains to his friend Charlotte that by coincidence he executed the second instruction. Later, Adam is astonished to hear about Charlotte's letter, explaining that ''now'' the result is significant.


See also

* Conditionality principle
* Likelihoodist statistics




External links

* Anthony W.F. Edwards. ''Likelihood''.
* Jeff Miller
* John Aldrich