In numerical analysis, hill climbing is a mathematical optimization technique which belongs to the family of local search. It is an iterative algorithm that starts with an arbitrary solution to a problem, then attempts to find a better solution by making an incremental change to the solution. If the change produces a better solution, another incremental change is made to the new solution, and so on until no further improvements can be found.
For example, hill climbing can be applied to the travelling salesman problem. It is easy to find an initial solution that visits all the cities but will likely be very poor compared to the optimal solution. The algorithm starts with such a solution and makes small improvements to it, such as switching the order in which two cities are visited. Eventually, a much shorter route is likely to be obtained.
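A minimal sketch of this idea in Python, assuming a precomputed distance matrix dist (indexed by city) and a simple swap-two-cities neighbourhood; the helper names and stopping rule are illustrative, not part of any standard formulation:

    import random

    def tour_length(tour, dist):
        # Total length of the closed tour; dist[a][b] is the distance between cities a and b.
        return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

    def hill_climb_tsp(dist, max_no_improve=1000):
        # Start from a random tour and keep swapping pairs of cities while the swaps help.
        n = len(dist)
        tour = list(range(n))
        random.shuffle(tour)
        best = tour_length(tour, dist)
        stale = 0
        while stale < max_no_improve:
            i, j = random.sample(range(n), 2)
            tour[i], tour[j] = tour[j], tour[i]      # incremental change: swap two cities
            length = tour_length(tour, dist)
            if length < best:                        # keep the change only if it improves the tour
                best, stale = length, 0
            else:
                tour[i], tour[j] = tour[j], tour[i]  # undo the swap
                stale += 1
        return tour, best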
Hill climbing finds optimal solutions for convex problems – for other problems it will find only local optima (solutions that cannot be improved upon by any neighboring configurations), which are not necessarily the best possible solution (the global optimum) out of all possible solutions (the search space). Examples of algorithms that solve convex problems by hill-climbing include the simplex algorithm for linear programming and binary search.
To attempt to avoid getting stuck in local optima, one could use restarts (i.e. repeated local search), or more complex schemes based on iterations (like iterated local search), or on memory (like reactive search optimization and tabu search), or on memory-less stochastic modifications (like simulated annealing).
The relative simplicity of the algorithm makes it a popular first choice amongst optimizing algorithms. It is used widely in artificial intelligence, for reaching a goal state from a starting node. Different choices for next nodes and starting nodes are used in related algorithms. Although more advanced algorithms such as simulated annealing or tabu search may give better results, in some situations hill climbing works just as well. Hill climbing can often produce a better result than other algorithms when the amount of time available to perform a search is limited, such as with real-time systems, so long as a small number of increments typically converges on a good solution (the optimal solution or a close approximation). At the other extreme, bubble sort can be viewed as a hill climbing algorithm (every adjacent element exchange decreases the number of disordered element pairs), yet this approach is far from efficient for even modest N, as the number of exchanges required grows quadratically.
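To make that comparison concrete, the following Python sketch treats bubble sort as hill climbing on the number of disordered pairs (the function names are illustrative, not standard terminology):

    def inversions(a):
        # Objective to minimize: the number of disordered (out-of-order) pairs.
        return sum(1 for i in range(len(a)) for j in range(i + 1, len(a)) if a[i] > a[j])

    def bubble_sort_as_hill_climbing(a):
        # Incremental change: swap one adjacent out-of-order pair, which lowers the
        # inversion count by exactly 1; stop when no swap improves the objective.
        a = list(a)
        improved = True
        while improved:
            improved = False
            for i in range(len(a) - 1):
                if a[i] > a[i + 1]:
                    a[i], a[i + 1] = a[i + 1], a[i]
                    improved = True
        return a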
Hill climbing is an anytime algorithm: it can return a valid solution even if it's interrupted at any time before it ends.
Mathematical description
Hill climbing attempts to maximize (or minimize) a target function f(x), where x is a vector of continuous and/or discrete values. At each iteration, hill climbing will adjust a single element in x and determine whether the change improves the value of f(x). (Note that this differs from gradient descent methods, which adjust all of the values in x at each iteration according to the gradient of the hill.) With hill climbing, any change that improves f(x) is accepted, and the process continues until no change can be found to improve the value of f(x). Then x is said to be "locally optimal".
In discrete vector spaces, each possible value for x may be visualized as a vertex in a graph. Hill climbing will follow the graph from vertex to vertex, always locally increasing (or decreasing) the value of f(x), until a local maximum (or local minimum) x_m is reached.
Variants
In simple hill climbing, the first closer node is chosen, whereas in steepest ascent hill climbing all successors are compared and the closest to the solution is chosen. Both forms fail if there is no closer node, which may happen if there are local maxima in the search space which are not solutions. Steepest ascent hill climbing is similar to
best-first search, which tries all possible extensions of the current path instead of only one.
Stochastic hill climbing does not examine all neighbors before deciding how to move. Rather, it selects a neighbor at random, and decides (based on the amount of improvement in that neighbor) whether to move to that neighbor or to examine another.
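A sketch in Python, assuming caller-supplied neighbors and evaluate functions; the simplest accept-if-better rule is used here, though as noted the decision may also weigh the amount of improvement:

    import random

    def stochastic_hill_climb(x, neighbors, evaluate, max_iters=10000):
        # Examine one randomly chosen neighbor at a time instead of the whole neighborhood.
        score = evaluate(x)
        for _ in range(max_iters):
            candidate = random.choice(neighbors(x))
            candidate_score = evaluate(candidate)
            if candidate_score > score:          # accept only improving moves
                x, score = candidate, candidate_score
        return x, score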
Coordinate descent does a line search along one coordinate direction at the current point in each iteration. Some versions of coordinate descent randomly pick a different coordinate direction each iteration.
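As a rough illustration, assuming a minimization objective f and a crude fixed-step search along each coordinate (a stand-in for a proper line search such as golden-section):

    def coordinate_descent(f, x, step=0.1, sweeps=100):
        # Cycle through the coordinates; along each one, keep nudging in whichever
        # direction lowers f until no further progress is made in that direction.
        x = list(x)
        for _ in range(sweeps):
            for i in range(len(x)):
                for direction in (step, -step):
                    while True:
                        trial = list(x)
                        trial[i] += direction
                        if f(trial) < f(x):
                            x = trial
                        else:
                            break
        return x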
Random-restart hill climbing is a meta-algorithm built on top of the hill climbing algorithm. It is also known as Shotgun hill climbing. It iteratively does hill-climbing, each time with a random initial condition x_0. The best x_m is kept: if a new run of hill climbing produces a better x_m than the stored state, it replaces the stored state.
Random-restart hill climbing is a surprisingly effective algorithm in many cases. It turns out that it is often better to spend CPU time exploring the space than to carefully optimize from an initial condition.
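A sketch of the restart loop in Python, assuming a hill_climb routine that returns a (solution, score) pair and a random_start generator for initial conditions (both are placeholders supplied by the caller):

    def random_restart_hill_climb(hill_climb, random_start, restarts=20):
        # Run hill climbing from several random initial conditions and keep the best x_m found.
        best_x, best_score = None, float("-inf")
        for _ in range(restarts):
            x, score = hill_climb(random_start())
            if score > best_score:
                best_x, best_score = x, score
        return best_x, best_score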
Problems
Local maxima
Hill climbing will not necessarily find the global maximum, but may instead converge on a local maximum. This problem does not occur if the heuristic is convex. However, as many functions are not convex, hill climbing may often fail to reach a global maximum. Other local search algorithms try to overcome this problem, such as stochastic hill climbing, random walks and simulated annealing.
Ridges and alleys
Ridges are a challenging problem for hill climbers that optimize in continuous spaces. Because hill climbers only adjust one element in the vector at a time, each step will move in an axis-aligned direction. If the target function creates a narrow ridge that ascends in a non-axis-aligned direction (or if the goal is to minimize, a narrow alley that descends in a non-axis-aligned direction), then the hill climber can only ascend the ridge (or descend the alley) by zig-zagging. If the sides of the ridge (or alley) are very steep, then the hill climber may be forced to take very tiny steps as it zig-zags toward a better position. Thus, it may take an unreasonable length of time for it to ascend the ridge (or descend the alley).
By contrast, gradient descent methods can move in any direction that the ridge or alley may ascend or descend. Hence, gradient descent or the conjugate gradient method is generally preferred over hill climbing when the target function is differentiable. Hill climbers, however, have the advantage of not requiring the target function to be differentiable, so hill climbers may be preferred when the target function is complex.
Plateau
Another problem that sometimes occurs with hill climbing is that of a plateau. A plateau is encountered when the search space is flat, or sufficiently flat that the value returned by the target function is indistinguishable from the value returned for nearby regions due to the precision used by the machine to represent its value. In such cases, the hill climber may not be able to determine in which direction it should step, and may wander in a direction that never leads to improvement.
Pseudocode
algorithm Discrete Space Hill Climbing is
    currentNode := startNode
    loop do
        L := NEIGHBORS(currentNode)
        nextEval := −INF
        nextNode := NULL
        for all x in L do
            if EVAL(x) > nextEval then
                nextNode := x
                nextEval := EVAL(x)
        if nextEval ≤ EVAL(currentNode) then
            // Return current node since no better neighbors exist
            return currentNode
        currentNode := nextNode
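A Python rendering of the discrete-space pseudocode above is sketched below (the neighbors and evaluate callbacks are placeholders supplied by the caller); the continuous-space variant follows after it.

    def discrete_hill_climbing(start_node, neighbors, evaluate):
        # Move to the best-scoring neighbor until no neighbor beats the current node.
        current = start_node
        while True:
            best_neighbor, best_eval = None, float("-inf")
            for x in neighbors(current):
                if evaluate(x) > best_eval:
                    best_neighbor, best_eval = x, evaluate(x)
            if best_eval <= evaluate(current):       # no better neighbor exists
                return current
            current = best_neighbor

    # Example: maximize f(x, y) = -(x - 2)**2 - (y + 1)**2 over integer points.
    f = lambda p: -(p[0] - 2) ** 2 - (p[1] + 1) ** 2
    moves = lambda p: [(p[0] + dx, p[1] + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    print(discrete_hill_climbing((0, 0), moves, f))  # reaches (2, -1), the global maximum here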
algorithm Continuous Space Hill Climbing is
    currentPoint := initialPoint    // the zero-magnitude vector is common
    stepSize := initialStepSizes    // a vector of all 1's is common
    acceleration := someAcceleration // a value such as 1.2 is common
    candidate[0] := −acceleration
    candidate[1] := −1 / acceleration
    candidate[2] := 1 / acceleration
    candidate[3] := acceleration
    bestScore := EVAL(currentPoint)
    loop do
        beforeScore := bestScore
        for each element i in currentPoint do
            beforePoint := currentPoint[i]
            bestStep := 0
            for j from 0 to 3 do      // try each of 4 candidate locations
                step := stepSize[i] × candidate[j]
                currentPoint[i] := beforePoint + step
                score := EVAL(currentPoint)
                if score > bestScore then
                    bestScore := score
                    bestStep := step
            if bestStep is 0 then
                currentPoint[i] := beforePoint
                stepSize[i] := stepSize[i] / acceleration
            else
                currentPoint[i] := beforePoint + bestStep
                stepSize[i] := bestStep // acceleration
        if (bestScore − beforeScore) < epsilon then
            return currentPoint
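The continuous-space pseudocode can likewise be sketched in Python. Note one interpretive assumption: the line "stepSize[i] := bestStep // acceleration" is read here as keeping the accelerated step (bestStep already contains the acceleration factor); other readings are possible.

    def continuous_hill_climbing(evaluate, initial_point, step_sizes=None,
                                 acceleration=1.2, epsilon=1e-6):
        # Adjust one coordinate at a time, trying four candidate step lengths per coordinate.
        point = list(initial_point)
        steps = list(step_sizes) if step_sizes is not None else [1.0] * len(point)
        candidates = [-acceleration, -1.0 / acceleration, 1.0 / acceleration, acceleration]
        best_score = evaluate(point)
        while True:
            before_score = best_score
            for i in range(len(point)):
                before = point[i]
                best_step = 0.0
                for c in candidates:                  # try each of the 4 candidate locations
                    step = steps[i] * c
                    point[i] = before + step
                    score = evaluate(point)
                    if score > best_score:
                        best_score, best_step = score, step
                if best_step == 0.0:
                    point[i] = before
                    steps[i] /= acceleration          # nothing helped: shrink the step size
                else:
                    point[i] = before + best_step
                    steps[i] = best_step              # keep the step that worked (see assumption above)
            if best_score - before_score < epsilon:
                return point, best_score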
Contrast genetic algorithm; random optimization.
See also
* Gradient descent
* Greedy algorithm
* Tâtonnement
* Mean-shift
* A* search algorithm