
Thompson sampling,
named after
William R. Thompson, is a
heuristic
A heuristic or heuristic technique (''problem solving'', '' mental shortcut'', ''rule of thumb'') is any approach to problem solving that employs a pragmatic method that is not fully optimized, perfected, or rationalized, but is nevertheless ...
for choosing actions that address the
exploration–exploitation dilemma
The exploration–exploitation dilemma, also known as the explore–exploit tradeoff, is a fundamental concept in decision-making that arises in many domains. It is depicted as the balancing act between two opposing strategies. Exploitation involv ...
in the
multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
Description
Consider a set of contexts
, a set of actions
, and rewards in
. The aim of the player is to play actions under the various contexts, such as to maximize the cumulative rewards. Specifically, in each round, the player obtains a context
, plays an action
and receives a reward
following a distribution that depends on the context and the issued action.
The elements of Thompson sampling are as follows:
# a
likelihood function
A likelihood function (often simply called the likelihood) measures how well a statistical model explains observed data by calculating the probability of seeing that data under different parameter values of the model. It is constructed from the ...
;
# a set
of parameters
of the distribution of
;
# a
prior distribution
A prior probability distribution of an uncertain quantity, simply called the prior, is its assumed probability distribution before some evidence is taken into account. For example, the prior could be the probability distribution representing the ...
on these parameters;
# past observations triplets
;
# a
posterior distribution
The posterior probability is a type of conditional probability that results from updating the prior probability with information summarized by the likelihood via an application of Bayes' rule. From an epistemological perspective, the posterior ...
, where
is the likelihood function.
Thompson sampling consists of playing the action
according to the probability that it maximizes the expected reward; action
is chosen with probability
:
where
is the
indicator function
In mathematics, an indicator function or a characteristic function of a subset of a set is a function that maps elements of the subset to one, and all other elements to zero. That is, if is a subset of some set , then the indicator functio ...
.
In practice, the rule is implemented by sampling. In each round, parameters
are sampled from the posterior
,
and an action
chosen that maximizes