In decision theory, the expected value of sample information (EVSI) is the expected increase in utility that a decision-maker could obtain from gaining access to a sample of additional observations before making a decision. The additional information obtained from the sample may allow them to make a more informed, and thus better, decision, resulting in an increase in expected utility. EVSI attempts to estimate what this improvement would be before seeing actual sample data; hence, EVSI is a form of what is known as ''preposterior analysis''. The use of EVSI in decision theory was popularized by Robert Schlaifer and Howard Raiffa in the 1960s.

Formulation

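Let

$$
\begin{array}{ll}
d \in D & \mbox{the decision being made, chosen from space } D \\
x \in X & \mbox{an uncertain state, with true value in space } X \\
z \in Z & \mbox{an observed sample composed of } n \mbox{ observations } \langle z_1, z_2, \ldots, z_n \rangle \\
U(d,x) & \mbox{the utility of selecting decision } d \mbox{ when the state is } x \\
p(x) & \mbox{the prior probability distribution (density function) on } x \\
p(z|x) & \mbox{the conditional probability, given the state, of observing the sample } z
\end{array}
$$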

It is common (but not essential) in EVSI scenarios for $Z_i = X$, $p(z|x) = \prod_i p(z_i|x)$ and $\int z \, p(z|x) \, dz = x$, which is to say that each observation is an unbiased sensor reading of the underlying state $x$, with each sensor reading being independent and identically distributed. The utility from the optimal decision based only on the prior, without making any further observations, is given by

$$E[U] = \max_{d \in D} \int_X U(d,x) \, p(x) \, dx.$$

If the decision-maker could gain access to a single sample, $z$, the optimal posterior utility would be

$$E[U|z] = \max_{d \in D} \int_X U(d,x) \, p(x|z) \, dx$$

where $p(x|z)$ is obtained from Bayes' rule:

$$p(x|z) = \frac{p(z|x) \, p(x)}{p(z)}; \qquad p(z) = \int p(z|x) \, p(x) \, dx.$$

Since the decision-maker does not know in advance which sample would actually be obtained, they must average over all possible samples to obtain the expected utility given a sample:

$$E[U|SI] = \int_Z E[U|z] \, p(z) \, dz = \int_Z \max_{d \in D} \int_X U(d,x) \, p(z|x) \, p(x) \, dx \, dz.$$

The expected value of sample information is then defined as

$$
\begin{array}{rl}
EVSI & = E[U|SI] - E[U] \\
& = \left(\int_Z \max_{d \in D} \int_X U(d,x) \, p(z|x) \, p(x) \, dx \, dz\right) - \left(\max_{d \in D} \int_X U(d,x) \, p(x) \, dx\right).
\end{array}
$$
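When the state and observation spaces are finite, the integrals in the definition become sums and the EVSI can be evaluated exactly. The following is a minimal sketch under that assumption; the function names and the two-state approve/reject numbers are hypothetical and chosen only for illustration.

```python
def expected_utility_prior(prior, utility):
    """max_d sum_x U(d, x) p(x): best expected utility using the prior alone."""
    return max(sum(u * p for u, p in zip(row, prior)) for row in utility)


def expected_utility_with_sample(prior, likelihood, utility):
    """sum_z max_d sum_x U(d, x) p(z|x) p(x), with likelihood[x][z] = p(z|x)."""
    n_obs = len(likelihood[0])
    total = 0.0
    for z in range(n_obs):
        # Unnormalised posterior weights p(z|x) p(x) for this observation z.
        joint = [likelihood[x][z] * prior[x] for x in range(len(prior))]
        total += max(sum(u * w for u, w in zip(row, joint)) for row in utility)
    return total


def evsi(prior, likelihood, utility):
    """EVSI = E[U|SI] - E[U] for finite state and observation spaces."""
    return (expected_utility_with_sample(prior, likelihood, utility)
            - expected_utility_prior(prior, utility))


if __name__ == "__main__":
    # Hypothetical problem: states (good, bad), decisions (approve, reject).
    prior = [0.5, 0.5]
    likelihood = [[0.8, 0.2],   # p(z | x = good) for z = positive, negative
                  [0.2, 0.8]]   # p(z | x = bad)
    utility = [[100.0, -100.0],  # approve: U(approve, good), U(approve, bad)
               [0.0, 0.0]]       # reject
    print(evsi(prior, likelihood, utility))
```

Working with the unnormalised weights $p(z|x) \, p(x)$ avoids dividing by $p(z)$ and multiplying it back in, which mirrors the last form of the equation above.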

Computation

It is seldom feasible to carry out the integration over the space of possible observations in $E[U|SI]$ analytically, so the computation of EVSI usually requires a Monte Carlo simulation. The method involves randomly simulating a sample, $z^i = \langle z^i_1, z^i_2, \ldots, z^i_n \rangle$, then using it to compute the posterior $p(x|z^i)$ and maximizing utility based on $p(x|z^i)$. This whole process is then repeated many times, for $i = 1, \ldots, M$, to obtain a Monte Carlo sample of optimal utilities. These are averaged to obtain the expected utility given a hypothetical sample.
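The Monte Carlo procedure can be sketched for a deliberately simple case. The model below is an assumption made for illustration, not part of the description above: the state $x$ is a success probability with a Beta prior, the sample is $n$ Bernoulli trials, and the utility of adopting is linear in $x$ (`gain * x - cost`), with rejecting worth zero. Conjugacy gives the posterior in closed form, so the maximization step only needs the posterior mean. All names and parameters are hypothetical.

```python
import random


def evsi_beta_bernoulli(a, b, n, gain, cost, num_sims=20000, seed=0):
    """Monte Carlo EVSI estimate for a Beta(a, b) prior and n Bernoulli trials."""
    rng = random.Random(seed)

    def best_value(mean_x):
        # Adopting yields gain * x - cost (linear in x); rejecting yields 0.
        return max(0.0, gain * mean_x - cost)

    prior_value = best_value(a / (a + b))
    total = 0.0
    for _ in range(num_sims):
        x = rng.betavariate(a, b)                    # draw a state from the prior
        k = sum(rng.random() < x for _ in range(n))  # simulate successes in n trials
        posterior_mean = (a + k) / (a + b + n)       # mean of Beta(a + k, b + n - k)
        total += best_value(posterior_mean)          # optimal utility given this sample
    # Average optimal posterior utility minus the prior-only optimal utility.
    return total / num_sims - prior_value


if __name__ == "__main__":
    print(evsi_beta_bernoulli(a=1, b=1, n=10, gain=1.0, cost=0.5))
```

Drawing $x$ from the prior and then $z$ from $p(z|x)$ is one way to simulate a sample from the marginal $p(z)$. For models without a conjugate posterior, the closed-form update would have to be replaced by a numerical posterior computation.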

Example

A regulatory agency is to decide whether to approve a new treatment. Before making the final approve/reject decision, they ask what the value would be of conducting a further trial study on $n$ subjects. This question is answered by the EVSI.

[Figure: EVSI diagram]

The diagram shows an influence diagram for computing the EVSI in this example. The model classifies the outcome for any given subject into one of five categories:

$$Z_i = \{\mbox{cure}, \mbox{improvement}, \mbox{ineffective}, \mbox{mild side-effect}, \mbox{dangerous side-effect}\}$$

For each of these outcomes, it assigns a utility equal to an estimated patient-equivalent monetary value of the outcome. A decision state, $x$, in this example is a vector of five numbers between 0 and 1 that sum to 1, giving the proportion of future patients that will experience each of the five possible outcomes. For example, a state $x = [5\%, 60\%, 20\%, 10\%, 5\%]$ denotes the case where 5% of patients are cured, 60% improve, 20% find the treatment ineffective, 10% experience mild side-effects and 5% experience dangerous side-effects.

The prior, $p(x)$, is encoded using a Dirichlet distribution, requiring five numbers (that need not sum to 1) whose relative values capture the expected relative proportion of each outcome, and whose sum encodes the strength of this prior belief. In the diagram, the parameters of the Dirichlet distribution are contained in the variable ''dirichlet alpha prior'', while the prior distribution itself is in the chance variable ''Prior''. The probability density graph of the marginals is shown here:

[Figure: probability density of the marginals of the prior]

In the chance variable ''Trial data'', trial data is simulated as a Monte Carlo sample from a multinomial distribution. For example, when Trial_size=100, each Monte Carlo sample of ''Trial_data'' contains a vector that sums to 100 showing the number of subjects in the simulated study that experienced each of the five possible outcomes. The following result table depicts the first 8 simulated trial outcomes:

[Table: first 8 simulated trial outcomes]

Combining this trial data with a Dirichlet prior requires only adding the outcome frequencies to the Dirichlet prior alpha values, resulting in a Dirichlet posterior distribution for each simulated trial. For each of these, the decision to approve is made based on whether the mean utility is positive, and using a utility of zero when the treatment is not approved, the ''Pre-posterior utility'' is obtained. Repeating the computation for a range of possible trial sizes, an EVSI is obtained at each possible candidate trial size as depicted in this graph:

[Figure: EVSI as a function of trial size]
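The simulation described in this example can be sketched in a few lines. The Dirichlet alpha values and the per-outcome utilities below are hypothetical placeholders, since the example does not state the numbers it uses; only the structure (Dirichlet prior, multinomial trial data, conjugate update, approve only if the mean utility is positive) follows the text.

```python
import random

OUTCOMES = ["cure", "improvement", "ineffective",
            "mild side-effect", "dangerous side-effect"]
ALPHA_PRIOR = [1.0, 12.0, 4.0, 2.0, 1.0]              # hypothetical prior parameters
UTILITY = [5000.0, 500.0, -1000.0, -2000.0, -8000.0]  # hypothetical monetary values


def sample_dirichlet(alpha, rng):
    """Draw proportions from Dirichlet(alpha) by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def sample_trial(n, proportions, rng):
    """Simulate multinomial outcome counts for a trial with n subjects."""
    counts = [0] * len(proportions)
    for outcome in rng.choices(range(len(proportions)), weights=proportions, k=n):
        counts[outcome] += 1
    return counts


def approve_value(alpha, utility):
    """Mean utility per patient if approved, or 0 if rejecting is better."""
    mean_utility = sum(u * a for u, a in zip(utility, alpha)) / sum(alpha)
    return max(0.0, mean_utility)


def evsi_trial(n, alpha=ALPHA_PRIOR, utility=UTILITY, num_sims=2000, seed=0):
    """Monte Carlo EVSI estimate for a trial of n subjects."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_sims):
        x = sample_dirichlet(alpha, rng)         # state drawn from the prior
        counts = sample_trial(n, x, rng)         # simulated trial data
        posterior_alpha = [a + c for a, c in zip(alpha, counts)]  # conjugate update
        total += approve_value(posterior_alpha, utility)
    return total / num_sims - approve_value(alpha, utility)


if __name__ == "__main__":
    for trial_size in (0, 10, 50, 100):
        print(trial_size, evsi_trial(trial_size))
```

Calling `evsi_trial` over a range of trial sizes gives one EVSI estimate per candidate size, which is the quantity the graph in this example plots.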

Comparison to related measures

Expected value of sample information (EVSI) is a relaxation of the expected value of perfect information (EVPI) metric, which encodes the increase of utility that would be obtained if one were to learn the true underlying state, $x$. Essentially, EVPI indicates the value of perfect information, while EVSI indicates the value of ''some limited and incomplete'' information. The expected value of including uncertainty (EVIU) compares the value of a decision based on an analysis that models uncertainty with the value of a decision based on an analysis that ignores it. Since the impact of uncertainty on computed results is often analysed using Monte Carlo methods, EVIU appears to be very similar to ''the value of carrying out an analysis using a Monte Carlo sample'', which closely resembles in statement the notion captured with EVSI. However, EVSI and EVIU are quite distinct: a notable difference is the manner in which EVSI uses Bayesian updating to incorporate the simulated sample.
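The relationship between EVSI and EVPI can be written in the notation of the Formulation section. The EVPI expression below is the standard definition rather than a formula taken from this article, so treat it as a reference sketch: with perfect information the decision is optimized separately for each state $x$, so the maximization moves inside the integral.

```latex
\begin{aligned}
EVPI &= \int_X \max_{d \in D} U(d,x) \, p(x) \, dx \;-\; \max_{d \in D} \int_X U(d,x) \, p(x) \, dx, \\
0 &\le EVSI \le EVPI.
\end{aligned}
```

The bounds reflect that a sample can never lower the expected utility of an optimal decision-maker, and that no sample can be worth more than learning $x$ exactly.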
