Empirical probability

In probability theory and statistics, the empirical probability, relative frequency, or experimental probability of an event is the ratio of the number of outcomes in which a specified event occurs to the total number of trials. It is obtained not from a theoretical sample space but from an actual experiment. More generally, empirical probability estimates probabilities from experience and observation.

Given an event A in a sample space, the relative frequency of A is the ratio m/n, m being the number of outcomes in which the event A occurs, and n being the total number of outcomes of the experiment.

In statistical terms, the empirical probability is an ''estimator'' or ''estimate'' of a probability. In simple cases, where the result of a trial only determines whether or not the specified event has occurred, modelling using a binomial distribution might be appropriate, and then the empirical estimate is the maximum likelihood estimate. It is the Bayesian estimate for the same case if certain assumptions are made for the prior distribution of the probability. If a trial yields more information, the empirical probability can be improved on by adopting further assumptions in the form of a statistical model: if such a model is fitted, it can be used to derive an estimate of the probability of the specified event.
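
As a minimal sketch of the definition, the relative frequency can be computed by counting, writing m for the number of trials in which the event occurs and n for the total number of trials (symbol names assumed here). The function names and the die-roll record below are hypothetical, chosen only for illustration. The second half checks numerically, by a coarse grid search rather than a proof, that the same value maximizes a binomial likelihood.

```python
import math
from fractions import Fraction


def empirical_probability(outcomes, event):
    """Return the relative frequency m/n of `event` among the observed outcomes."""
    outcomes = list(outcomes)
    n = len(outcomes)
    if n == 0:
        raise ValueError("at least one trial is required")
    m = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(m, n)


def binomial_log_likelihood(p, m, n):
    """Log-likelihood of success probability p given m successes in n trials.

    The binomial coefficient is omitted because it does not depend on p.
    """
    return m * math.log(p) + (n - m) * math.log(1 - p)


# Hypothetical record of ten die rolls (illustrative data, not from any source).
rolls = [3, 6, 1, 6, 2, 5, 6, 4, 1, 6]
p_six = empirical_probability(rolls, lambda r: r == 6)
print(p_six)  # 2/5

# Grid search over candidate probabilities strictly between 0 and 1.
grid = [k / 1000 for k in range(1, 1000)]
p_hat = max(grid, key=lambda p: binomial_log_likelihood(p, 4, 10))
print(p_hat)  # 0.4
```

Using `Fraction` keeps the ratio exact; a plain float division would serve equally well when exactness is not needed.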


Advantages and disadvantages


Advantages

An advantage of estimating probabilities using empirical probabilities is that this procedure is relatively free of assumptions. For example, consider estimating the probability among a population of men that they satisfy two conditions:
# that they are over 6 feet in height;
# that they prefer strawberry jam to raspberry jam.
A direct estimate could be found by counting the number of men who satisfy both conditions and dividing by the total number of men, which gives the empirical probability of the combined condition. An alternative estimate could be found by multiplying the proportion of men who are over 6 feet in height by the proportion of men who prefer strawberry jam to raspberry jam, but this estimate relies on the assumption that the two conditions are statistically independent.
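
The two estimates can be compared on a small example. The sketch below uses a made-up survey of eight men (the data and variable names are hypothetical) and computes both the direct empirical probability of the combined condition and the product of the two separate proportions. The values differ here because, in this invented sample, height and jam preference are not independent.

```python
# Hypothetical survey: one (over_six_feet, prefers_strawberry) pair per man.
survey = [
    (True, True), (True, False), (False, True), (False, True),
    (True, True), (False, False), (False, True), (True, False),
]
n = len(survey)

# Direct estimate: proportion of men satisfying both conditions.
direct = sum(1 for tall, strawberry in survey if tall and strawberry) / n

# Alternative estimate: product of the two separate proportions,
# valid only if the two conditions are statistically independent.
p_tall = sum(1 for tall, _ in survey if tall) / n
p_strawberry = sum(1 for _, strawberry in survey if strawberry) / n
product = p_tall * p_strawberry

print(direct)   # 0.25
print(product)  # 0.3125
```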


Disadvantages

A disadvantage of using empirical probabilities arises in estimating probabilities which are either very close to zero or very close to one. In these cases very large sample sizes would be needed in order to estimate such probabilities to a good standard of relative accuracy. Here statistical models can help, depending on the context, and in general one can hope that such models would provide improvements in accuracy compared to empirical probabilities, provided that the assumptions involved actually do hold. For example, consider estimating the probability that the lowest of the daily-maximum temperatures at a site in February in any one year is less than zero degrees Celsius. A record of such temperatures in past years could be used to estimate this probability. A model-based alternative would be to select a family of probability distributions and fit it to the dataset containing past years' values. The fitted distribution would provide an alternative estimate of the desired probability. This alternative method can provide a nonzero estimate of the probability even if all values in the record are greater than zero, a case in which the empirical probability is exactly zero.
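
The temperature example might look as follows in code. This is only a sketch under stated assumptions: the ten yearly values are invented, and the normal family is chosen purely for convenience (the text does not say which family to use, and another family may suit yearly minima better). It uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Hypothetical record: lowest February daily-maximum temperature (deg C), one per year.
february_lowest_max = [1.2, 3.5, 0.4, 2.8, 4.1, 0.9, 2.2, 5.0, 1.7, 3.3]

# Empirical probability of a value below zero: no such year was observed.
below_zero = sum(1 for t in february_lowest_max if t < 0)
empirical = below_zero / len(february_lowest_max)

# Model-based estimate: fit a normal distribution (an assumption) to the record
# and read off the fitted probability of falling below zero.
fitted = NormalDist.from_samples(february_lowest_max)
model_based = fitted.cdf(0.0)

print(empirical)    # 0.0
print(model_based)  # small but strictly positive
```

The model-based figure is only as trustworthy as the distributional assumption behind it.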


Mixed nomenclature

The phrase ''a-posteriori probability'' is also used as an alternative to "empirical probability" or "relative frequency". The use of the phrase "a-posteriori" is reminiscent of terms in Bayesian statistics, but this usage is not directly related to Bayesian inference, where ''a-posteriori probability'' is occasionally used to refer to posterior probability, a different concept with a confusingly similar name. The term ''a-posteriori probability'', in its meaning suggestive of "empirical probability", may be used in conjunction with ''a priori probability'', which represents an estimate of a probability based not on any observations but on deductive reasoning.


See also

* Empirical distribution function
* Empirical measure
* Estimating quantiles from a sample
* Frequency probability

