The cross-entropy (CE) method is a Monte Carlo method for importance sampling and optimization. It is applicable to both combinatorial and continuous problems, with either a static or noisy objective. The method approximates the optimal importance sampling estimator by repeating two phases (Rubinstein, R.Y. and Kroese, D.P. (2004), ''The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning'', Springer-Verlag, New York):
# Draw a sample from a probability distribution.
# Minimize the ''cross-entropy'' between this distribution and a target distribution to produce a better sample in the next iteration.
Reuven Rubinstein (1938–2012) developed the method in the context of ''rare event simulation'', where tiny probabilities must be estimated, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems. The method has also been applied to the traveling salesman, quadratic assignment, DNA sequence alignment, max-cut and buffer allocation problems.


Estimation via importance sampling

Consider the general problem of estimating the quantity

\ell = \mathbb{E}_{\mathbf{u}}[H(\mathbf{X})] = \int H(\mathbf{x})\, f(\mathbf{x}; \mathbf{u})\, \mathrm{d}\mathbf{x},

where H is some ''performance function'' and f(\mathbf{x};\mathbf{u}) is a member of some parametric family of distributions. Using importance sampling this quantity can be estimated as

\hat{\ell} = \frac{1}{N} \sum_{i=1}^N H(\mathbf{X}_i) \frac{f(\mathbf{X}_i;\mathbf{u})}{g(\mathbf{X}_i)},

where \mathbf{X}_1,\dots,\mathbf{X}_N is a random sample from g. For positive H, the theoretically ''optimal'' importance sampling density (PDF) is given by

g^*(\mathbf{x}) = H(\mathbf{x})\, f(\mathbf{x};\mathbf{u})/\ell.

This, however, depends on the unknown \ell. The CE method aims to approximate the optimal PDF by adaptively selecting members of the parametric family that are closest (in the Kullback–Leibler sense) to the optimal PDF g^*.
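As an illustration, the estimator above can be sketched in Python for a small rare-event problem; the densities, threshold, and sample size below are invented for this example and are not taken from the text:

```python
import numpy as np

# Hedged sketch: estimate ell = E_u[H(X)] with H(x) = 1{x >= 4},
# nominal density f = N(0, 1), and importance sampling density
# g = N(4, 1) centered on the rare region.  All numbers are
# illustrative assumptions.

def normal_pdf(x, mu):
    """Standard-variance Gaussian density with mean mu."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(0)
N = 100_000
X = rng.normal(4.0, 1.0, N)                  # draw the sample from g
H = (X >= 4.0).astype(float)                 # performance function H(X)
w = normal_pdf(X, 0.0) / normal_pdf(X, 4.0)  # likelihood ratios f/g
ell_hat = float(np.mean(H * w))              # importance sampling estimate
# The exact value is P(N(0,1) >= 4), roughly 3.17e-5; naive Monte Carlo
# with the same N would see only a handful of hits.
```

Sampling from g rather than f concentrates the draws where H is nonzero, which is exactly what the optimal density g^* does up to normalization.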


Generic CE algorithm

# Choose initial parameter vector \mathbf{v}^{(0)}; set t = 1.
# Generate a random sample \mathbf{X}_1,\dots,\mathbf{X}_N from f(\cdot;\mathbf{v}^{(t-1)}).
# Solve for \mathbf{v}^{(t)}, where \mathbf{v}^{(t)} = \mathop{\mathrm{argmax}}_{\mathbf{v}} \frac{1}{N} \sum_{i=1}^N H(\mathbf{X}_i)\frac{f(\mathbf{X}_i;\mathbf{u})}{f(\mathbf{X}_i;\mathbf{v}^{(t-1)})} \log f(\mathbf{X}_i;\mathbf{v}).
# If convergence is reached then stop; otherwise, increase t by 1 and reiterate from step 2.
In several cases, the solution to step 3 can be found ''analytically''. Situations in which this occurs are:
* When f belongs to the natural exponential family
* When f is discrete with finite support
* When H(\mathbf{x}) = \mathrm{I}_{\{\mathbf{x}\in A\}} and f(\mathbf{X}_i;\mathbf{u}) = f(\mathbf{X}_i;\mathbf{v}^{(t-1)}), in which case \mathbf{v}^{(t)} corresponds to the maximum likelihood estimator based on those \mathbf{X}_k \in A.
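The indicator case can be illustrated with a short sketch: assuming a Gaussian family and sampling from the current parameters (so every likelihood ratio equals 1), the maximization in step 3 reduces to the maximum likelihood fit over the samples that land in A. The numbers below are invented for the example:

```python
import numpy as np

# Hedged sketch of the third case: f is N(mu, sigma^2), the performance
# function is the indicator H(x) = 1{x >= gamma}, and samples come from
# the current parameters, so the likelihood ratios are all 1.  Step 3
# then reduces to the sample mean and sample variance of the samples
# falling in A = [gamma, inf).  All values are illustrative assumptions.
rng = np.random.default_rng(1)
mu, sigma, gamma = 0.0, 1.0, 1.5
X = rng.normal(mu, sigma, 100_000)
elite = X[X >= gamma]                # the samples with H(X) = 1
mu_new = float(elite.mean())         # analytic argmax: sample mean of A
sigma2_new = float(elite.var())      # ... and sample variance of A
```

This is the same mean/variance refit that reappears as the update rule in the optimization example below.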


Continuous optimization—example

The same CE algorithm can be used for optimization rather than estimation. Suppose the problem is to maximize some function S, for example, S(x) = \mathrm{e}^{-(x-2)^2} + 0.8\,\mathrm{e}^{-(x+2)^2}. To apply CE, one considers first the ''associated stochastic problem'' of estimating \mathbb{P}_{\boldsymbol\theta}(S(X)\geq\gamma) for a given ''level'' \gamma and parametric family \left\{f(\cdot;\boldsymbol\theta)\right\}, for example the 1-dimensional Gaussian distribution, parameterized by its mean \mu_t and variance \sigma_t^2 (so \boldsymbol\theta = (\mu,\sigma^2) here). Hence, for a given \gamma, the goal is to find \boldsymbol\theta so that D_{\mathrm{KL}}(\mathrm{I}_{\{S(x)\geq\gamma\}} \parallel f_{\boldsymbol\theta}) is minimized. This is done by solving the sample version (stochastic counterpart) of the KL divergence minimization problem, as in step 3 above. It turns out that the parameters minimizing the stochastic counterpart for this choice of target distribution and parametric family are the sample mean and sample variance of the ''elite samples'': those samples whose objective function value is \geq\gamma. The worst of the elite samples is then used as the level parameter for the next iteration. This yields the following randomized algorithm, which coincides with the so-called Estimation of Multivariate Normal Algorithm (EMNA), an estimation of distribution algorithm.


Pseudocode

''// Initialize parameters''
μ := −6
σ2 := 100
t := 0
maxits := 100
N := 100
Ne := 10
''// While maxits not exceeded and not converged''
while t < maxits and σ2 > ε do
    ''// Obtain N samples from current sampling distribution''
    X := SampleGaussian(μ, σ2, N)
    ''// Evaluate objective function at sampled points''
    S := exp(−(X − 2) ^ 2) + 0.8 exp(−(X + 2) ^ 2)
    ''// Sort X by objective function values in descending order''
    X := sort(X, S)
    ''// Update parameters of sampling distribution via elite samples''
    μ := mean(X(1:Ne))
    σ2 := var(X(1:Ne))
    t := t + 1
''// Return mean of final sampling distribution as solution''
return μ
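The pseudocode can be transcribed almost directly into Python. The tolerance ε is left undefined in the listing; taking it as a small constant (here 1e-8) is an assumption of this sketch:

```python
import math
import random

def cross_entropy_maximize(mu=-6.0, sigma2=100.0, maxits=100, N=100,
                           Ne=10, eps=1e-8):
    """CE maximization of S(x) = exp(-(x-2)^2) + 0.8 exp(-(x+2)^2).

    Direct transcription of the pseudocode; eps stands in for the
    unspecified convergence tolerance ε.
    """
    t = 0
    while t < maxits and sigma2 > eps:
        # Obtain N samples from the current sampling distribution
        X = [random.gauss(mu, math.sqrt(sigma2)) for _ in range(N)]
        # Evaluate the objective function at the sampled points
        S = [math.exp(-(x - 2) ** 2) + 0.8 * math.exp(-(x + 2) ** 2)
             for x in X]
        # Sort X by objective value, best first, and keep the elite
        X = [x for _, x in sorted(zip(S, X), reverse=True)]
        elite = X[:Ne]
        # Refit the sampling distribution to the elite samples
        mu = sum(elite) / Ne
        sigma2 = sum((x - mu) ** 2 for x in elite) / Ne
        t += 1
    # Return mean of final sampling distribution as the solution
    return mu
```

On this bimodal objective the sampling distribution typically collapses onto the global maximum near x = 2, though a run can also lock onto the local maximum near x = −2, a known behavior of elite-refit methods.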


Related methods

* Simulated annealing
* Genetic algorithms
* Harmony search
* Estimation of distribution algorithm
* Tabu search
* Natural evolution strategies


See also

* Cross entropy
* Kullback–Leibler divergence
* Randomized algorithm
* Importance sampling


Journal papers

* De Boer, P.-T., Kroese, D.P., Mannor, S. and Rubinstein, R.Y. (2005). A Tutorial on the Cross-Entropy Method. ''Annals of Operations Research'', 134 (1), 19–67.
* Rubinstein, R.Y. (1997). Optimization of Computer Simulation Models with Rare Events. ''European Journal of Operational Research'', 99, 89–112.


Software implementations


* CEoptim R package
* Novacta.Analytics .NET library

