
In statistics, the reference class problem is the problem of deciding what class to use when calculating the probability applicable to a particular case.

For example, to estimate the probability of an aircraft crashing, we could refer to the frequency of crashes among various sets of aircraft: all aircraft, this make of aircraft, aircraft flown by this company in the last ten years, and so on. The aircraft whose crash probability we want is a member of all of these classes, and the frequency of crashes differs among them, so it is not obvious which class we should refer to. In general, any case is a member of very many classes among which the frequency of the attribute of interest differs. The reference class problem concerns which of these classes is the most appropriate to use.

More formally, many arguments in statistics take the form of a statistical syllogism:

1. X proportion of F are G.
2. I is an F.
3. Therefore, the chance that I is a G is X.

Here F is called the "reference class", G the "attribute class", and I the individual object. How is one to choose an appropriate class F?

In Bayesian statistics, the problem arises as that of deciding on a prior probability for the outcome in question (or, when considering multiple outcomes, a prior probability distribution).
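The statistical syllogism and the aircraft example can be made concrete with a small sketch. It assumes invented toy records: the makes `MakeX`/`MakeY`, the operators `CompanyA`/`CompanyB`, and the year 2024 as "now" are all hypothetical, not real crash data. The code computes the proportion X of crashes (the attribute class G) in three candidate reference classes F, each of which contains the same individual aircraft I, a `MakeX` flown by `CompanyA`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Aircraft:
    make: str       # hypothetical make label
    operator: str   # hypothetical operating company
    year: int       # year the record refers to
    crashed: bool   # the attribute class G


# Invented toy records, for illustration only (not real crash data).
HISTORY: List[Aircraft] = [
    Aircraft("MakeX", "CompanyA", 2016, False),
    Aircraft("MakeX", "CompanyA", 2019, False),
    Aircraft("MakeX", "CompanyB", 2010, True),
    Aircraft("MakeX", "CompanyB", 2012, False),
    Aircraft("MakeY", "CompanyA", 2018, True),
    Aircraft("MakeY", "CompanyB", 2008, True),
    Aircraft("MakeY", "CompanyB", 2015, False),
    Aircraft("MakeY", "CompanyA", 2005, False),
]

CURRENT_YEAR = 2024  # assumed "now" for the last-ten-years class

# The individual I is a MakeX aircraft flown by CompanyA; each predicate
# below defines one candidate reference class F that contains I.
REFERENCE_CLASSES: Dict[str, Callable[[Aircraft], bool]] = {
    "all aircraft": lambda r: True,
    "this make": lambda r: r.make == "MakeX",
    "this company, last ten years": lambda r: (
        r.operator == "CompanyA" and r.year >= CURRENT_YEAR - 10
    ),
}


def proportion(records: List[Aircraft], in_class: Callable[[Aircraft], bool]) -> float:
    """X in the syllogism: the proportion of members of F that are G (crashed)."""
    members = [r for r in records if in_class(r)]
    if not members:
        raise ValueError("empty reference class")
    return sum(r.crashed for r in members) / len(members)


results = {name: proportion(HISTORY, pred) for name, pred in REFERENCE_CLASSES.items()}
for name, x in results.items():
    print(f"{name}: {x:.3f}")
```

With these toy records the same aircraft receives X = 0.375, 0.250 or 0.333 depending on which F is chosen. That is the ambiguity the reference class problem describes; nothing in the computation itself says which class is the right one.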


History

John Venn stated in 1876 that "every single thing or event has an indefinite number of properties or attributes observable in it, and might therefore be considered as belonging to an indefinite number of different classes of things", leading to problems with how to assign probabilities to a single case. He used as an example the probability that John Smith, a consumptive Englishman aged fifty, will live to sixty-one. The name "problem of the reference class" was given by Hans Reichenbach, who wrote, "If we are asked to find the probability holding for an individual future event, we must first incorporate the event into a suitable reference class. An individual thing or event may be incorporated in many reference classes, from which different probabilities will result." There has also been discussion of the reference class problem in philosophy and in the life sciences, e.g., clinical trial prediction.


Legal applications

Applying Bayesian probability in practice involves assessing a prior probability, which is then applied to a likelihood function and updated through the use of Bayes' theorem. Suppose we wish to assess the probability of guilt of a defendant in a court case in which DNA (or other probabilistic) evidence is available. We first need to assess the prior probability of guilt of the defendant. We could say that the crime occurred in a city of 1,000,000 people, of whom 15% (150,000 people) meet the requirements of being the same sex, age group and approximate description as the perpetrator. That suggests a prior probability of guilt of 1 in 150,000. We could cast the net wider and say that there is, say, a 25% chance that the perpetrator is from out of town but still from this country, and construct a different prior estimate. We could also say that the perpetrator could come from anywhere in the world, and so on.

Legal theorists have discussed the reference class problem particularly with reference to the Shonubi case. Charles Shonubi, a Nigerian drug smuggler, was arrested at JFK Airport on December 10, 1991, and convicted of heroin importation. The severity of his sentence depended not only on the amount of drugs on that trip, but also on the total amount of drugs he was estimated to have imported on seven previous occasions on which he was not caught. Five separate legal cases debated how that amount should be estimated. In one case, "Shonubi III", the prosecution presented statistical evidence of the amount of drugs found on Nigerian drug smugglers caught at JFK Airport in the period between Shonubi's first and last trips. There has been debate over whether that is the (or a) correct reference class to use, and if so, why.

Other legal applications involve valuation. For example, houses might be valued using data from a database of sales of "similar" houses. To decide which houses are similar to a given one, one needs to know which features of a house are relevant to price: the number of bathrooms might be relevant, but not the eye color of the owner. It has been argued that such reference class problems can be solved by finding which features are relevant. On this view, a feature is relevant to house price if house price covaries with it (that is, it affects the likelihood that the house has a higher or lower value), and the ideal reference class for an individual is the set of all instances that share with it all relevant features.
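The relevance criterion for house valuation can be sketched as follows. This is a minimal illustration under stated assumptions: the sales records are invented, and "covaries" is approximated crudely by asking whether the average price shifts by more than a hypothetical 5% threshold across the values of a feature. A real analysis would use a proper statistical test or model. The reference class is then every sale that matches the subject house on all features judged relevant.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Invented toy sales, for illustration only.
SALES: List[Dict] = [
    {"bathrooms": 1, "owner_eye_color": "brown", "price": 200_000},
    {"bathrooms": 1, "owner_eye_color": "blue", "price": 210_000},
    {"bathrooms": 2, "owner_eye_color": "brown", "price": 300_000},
    {"bathrooms": 2, "owner_eye_color": "blue", "price": 290_000},
    {"bathrooms": 3, "owner_eye_color": "brown", "price": 410_000},
    {"bathrooms": 3, "owner_eye_color": "blue", "price": 400_000},
]


def group_means(sales: List[Dict], feature: str) -> Dict[object, float]:
    """Average price for each observed value of a feature."""
    groups: Dict[object, List[int]] = {}
    for s in sales:
        groups.setdefault(s[feature], []).append(s["price"])
    return {value: mean(prices) for value, prices in groups.items()}


def is_relevant(sales: List[Dict], feature: str, threshold: float = 0.05) -> bool:
    """Crude covariation check (hypothetical 5% threshold): does the average
    price move noticeably across the feature's values?"""
    means = group_means(sales, feature).values()
    overall = mean(s["price"] for s in sales)
    return (max(means) - min(means)) / overall > threshold


def reference_class(sales: List[Dict], subject: Dict, features: List[str]) -> Tuple[List[str], List[Dict]]:
    """Sales sharing every relevant feature with the subject house."""
    relevant = [f for f in features if is_relevant(sales, f)]
    members = [s for s in sales if all(s[f] == subject[f] for f in relevant)]
    return relevant, members


subject = {"bathrooms": 2, "owner_eye_color": "green"}
relevant, members = reference_class(SALES, subject, ["bathrooms", "owner_eye_color"])
estimate = mean(s["price"] for s in members)
print(relevant, len(members), estimate)
```

Here the number of bathrooms passes the check and the owner's eye color does not, so the subject's reference class is the two two-bathroom sales, with a mean price of 295,000. Had eye color been treated as relevant, the class would be empty, since no sale in the toy data involves a green-eyed owner.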
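The prior-probability arithmetic in the court-case example can be checked with a short sketch that also shows how the choice of reference class carries through Bayes' theorem. Two assumptions are not in the text and are labeled as such: the out-of-town figure is combined by scaling the local prior by 0.75, and the DNA likelihood ratio of 1,000,000 is a hypothetical value chosen only for illustration. The update uses the odds form of Bayes' theorem.

```python
from fractions import Fraction


def prior_from_reference_class(population: int, share_matching: Fraction) -> Fraction:
    """Uniform prior over the people in the reference class who match the description."""
    return 1 / (population * share_matching)


def posterior(prior: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Bayes' theorem in odds form: posterior odds = prior odds x likelihood ratio."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Figures from the text: 1,000,000 residents, 15% match the description.
prior_city = prior_from_reference_class(1_000_000, Fraction(15, 100))  # 1/150,000

# Assumed way of combining the figures: a 25% chance the perpetrator is
# from out of town leaves a 75% chance of a local, spread over 150,000 people.
prior_wider = (1 - Fraction(25, 100)) * prior_city  # 1/200,000

# Hypothetical DNA likelihood ratio; the text gives no figure.
LIKELIHOOD_RATIO = Fraction(1_000_000)

for label, prior in [("city", prior_city), ("wider", prior_wider)]:
    print(label, prior, float(posterior(prior, LIKELIHOOD_RATIO)))
```

Exact fractions keep the small priors from being rounded. With the same evidence, the city-based prior of 1 in 150,000 yields a posterior of about 0.87, while the wider prior of 1 in 200,000 yields about 0.83, so the choice of reference class affects the conclusion even after the evidence is taken into account.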


See also

* Statistical syllogism
* Reference class forecasting
* Spectrum bias
* Simpson's paradox

