In statistical hypothesis testing, e-values quantify the evidence in the data against a null hypothesis (e.g., "the coin is fair", or, in a medical context, "this new treatment has no effect"). They serve as a more robust alternative to p-values, addressing some shortcomings of the latter.
In contrast to p-values, e-values can deal with optional continuation: e-values of subsequent experiments (e.g. clinical trials concerning the same treatment) may simply be multiplied to provide a new, "product" e-value that represents the evidence in the joint experiment. This works even if, as often happens in practice, the decision to perform later experiments may depend in vague, unknown ways on the data observed in earlier experiments, and it is not known beforehand how many trials will be conducted: the product e-value remains a meaningful quantity, leading to tests with Type-I error control. For this reason, e-values and their sequential extension, the ''e-process'', are the fundamental building blocks for anytime-valid statistical methods (e.g. confidence sequences). Another advantage over p-values is that any weighted average of e-values remains an e-value, even if the individual e-values are arbitrarily dependent. This is one of the reasons why e-values have also turned out to be useful tools in multiple testing.
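The two properties above can be checked exactly on a toy case. The sketch below is only an illustration under assumptions that are not part of the text: the null hypothesis is a fair coin, each batch's e-value is the likelihood ratio of a hypothetical Bernoulli(3/4) alternative against the fair coin, and the rule "continue only if the first e-value exceeds 1" stands in for a data-dependent decision to run a second experiment. All function names are made up for this example.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)   # null hypothesis: fair coin
Q = Fraction(3, 4)      # hypothetical alternative (an assumption of this sketch)


def lr_e_value(flips, q):
    """Likelihood ratio of Bernoulli(q) against the fair coin for a batch of flips."""
    e = Fraction(1)
    for x in flips:
        e *= (q if x == 1 else 1 - q) / HALF
    return e


def null_expectation(n, statistic):
    """Exact expected value of statistic(flips) over all 2**n fair-coin sequences."""
    outcomes = product((0, 1), repeat=n)
    return sum(statistic(f) for f in outcomes) * HALF ** n


def continued_e_value(flips):
    """Batch 1 is 3 flips; batch 2 (2 more flips) is run only if batch 1 looks promising."""
    e1 = lr_e_value(flips[:3], Q)
    if e1 > 1:  # data-dependent decision to continue
        return e1 * lr_e_value(flips[3:5], Q)  # product e-value
    return e1  # stopped after batch 1; the remaining flips are never used


def averaged_e_value(flips):
    """Weighted average of two e-values computed on the same data (so they are dependent)."""
    return Fraction(1, 3) * lr_e_value(flips, Q) + Fraction(2, 3) * lr_e_value(flips, 1 - Q)


# Both expectations are taken under the null and must not exceed 1.
print(null_expectation(5, continued_e_value))  # prints 1
print(null_expectation(5, averaged_e_value))   # prints 1
```

Exact rational arithmetic is used so that the bound can be checked without simulation noise. In this toy case both expectations equal 1, the largest value the definition allows.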
E-values can be interpreted in a number of different ways. First, an e-value can be interpreted as a rescaled test, presented on a more appropriate scale that makes it easier to merge several tests.
Second, the reciprocal of an e-value is a p-value, but not just any p-value: a special p-value for which a rejection "at level p" retains a generalized Type-I error guarantee.
Third, they are broad generalizations of likelihood ratios and are also related to, yet distinct from, Bayes factors. Fourth, they have an interpretation as bets. Fifth, in a sequential context, they can also be interpreted as increments of nonnegative supermartingales. Interest in e-values has exploded since 2019, when the term 'e-value' was coined and a number of breakthrough results were achieved by several research groups. The first overview article appeared in 2023.
Definition and mathematical background
Let the null hypothesis $H_0$ be given as a set of distributions for data $Y$. Usually $Y = (X_1, \ldots, X_\tau)$ with each $X_i$ a single outcome and $\tau$ a fixed sample size or some stopping time. We shall refer to such $Y$, which represent the full sequence of outcomes of a statistical experiment, as a ''sample'' or ''batch of outcomes''. But in some cases $Y$ may also be an unordered bag of outcomes or a single outcome.
An e-variable or e-statistic is a ''nonnegative'' random variable $E = E(Y)$ such that under all $P \in H_0$, its expected value is bounded by 1: $\mathbb{E}_P[E] \leq 1$.
The value taken by the e-variable $E$ is called the ''e-value''. In practice, the term ''e-value'' (a number) is often used when one is really referring to the underlying e-variable (a random variable, that is, a measurable function of the data).
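As a standard illustration of this definition, here is a sketch under assumptions that are not made in the text above. Suppose the null hypothesis contains a single distribution $P_0$ with density $p_0$, write $Y$ for the data and $E$ for the candidate e-variable, and let $q$ be any alternative density that is zero wherever $p_0$ is zero. The likelihood ratio $E = q(Y)/p_0(Y)$ is nonnegative, and its expected value under the null is exactly 1:

$$\mathbb{E}_{P_0}[E] = \int p_0(y)\,\frac{q(y)}{p_0(y)}\,dy = \int q(y)\,dy = 1.$$

Because $E$ is nonnegative, Markov's inequality then gives, for any $0 < \alpha \leq 1$,

$$P_0\!\left(E \geq \frac{1}{\alpha}\right) \leq \alpha\,\mathbb{E}_{P_0}[E] \leq \alpha,$$

so a test that rejects the null whenever the e-value is at least $1/\alpha$ has Type-I error at most $\alpha$. The second display uses only nonnegativity and the bound on the expected value, so it holds for any e-variable and any distribution in the null, not just for likelihood ratios.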
Interpretations
As the continuous interpretation of a test
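A test for a null hypothesis $H_0$ is traditionally modeled as a function $\phi$ from the data to $\{\text{do not reject } H_0, \text{reject } H_0\}$. A test $\phi_\alpha$ is said to be valid for level $\alpha$ if $P(\phi_\alpha = \text{reject } H_0) \leq \alpha$ for every $P \in H_0$.
This is classically and conveniently summarized as a function $\phi_\alpha$ from the data to $\{0, 1\}$ that satisfies $\mathbb{E}_P[\phi_\alpha] \leq \alpha$ for every $P \in H_0$.
Moreover, this is sometimes generalized to permit external randomization by letting the test $\phi_\alpha$ take value in $[0, 1]$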