# Hill climbing

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Hill_climbing
> Markdown URL: https://mediated.wiki/source/Hill_climbing.md
> Source: https://en.wikipedia.org/wiki/Hill_climbing
> Source revision: 1325873101
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Optimization algorithm

This article is about the mathematical algorithm. For other meanings see [Hillclimbing](/source/Hillclimbing) or [Hillclimbing (disambiguation)](/source/Hillclimbing_(disambiguation))

This article needs more citations. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. Find sources: "Hill climbing" – news · newspapers · books · scholar · JSTOR (April 2017) (Learn how and when to remove this message)

A surface with only one maximum. Hill-climbing techniques are well-suited for optimizing over such surfaces, and will converge to the global maximum.

In [numerical analysis](/source/Numerical_analysis), **hill climbing** is a [mathematical optimization](/source/Optimization_(mathematics)) technique which belongs to the family of [local search](/source/Local_search_(optimization)).

It is an [iterative algorithm](/source/Iterative_algorithm) that starts with an arbitrary solution to a problem, then attempts to find a better solution by making an [incremental](/source/Incremental_heuristic_search) change to the solution. If the change produces a better solution, another incremental change is made to the new solution, and so on until no further improvements can be found.

For example, hill climbing can be applied to the [travelling salesman problem](/source/Travelling_salesman_problem). It is easy to find an initial solution that visits all the cities but will likely be very poor compared to the optimal solution. The algorithm starts with such a solution and makes small improvements to it, such as switching the order in which two cities are visited. Eventually, a much shorter route is likely to be obtained.

Hill climbing finds optimal solutions for [convex](/source/Convex_optimization) problems – for other problems it will find only [local optima](/source/Local_optimum) (solutions that cannot be improved upon by any neighboring configurations), which are not necessarily the best possible solution (the [global optimum](/source/Global_optimum)) out of all possible solutions (the [search space](/source/Candidate_solution)).

Examples of algorithms that solve [convex problems](/source/Convex_optimization) by hill-climbing include the [simplex algorithm](/source/Simplex_algorithm) for [linear programming](/source/Linear_programming) and [binary search](/source/Binary_search).[1]: 253

To attempt to avoid getting stuck in local optima, one could use restarts (i.e. repeated local search), or more complex schemes based on iterations (like [iterated local search](/source/Iterated_local_search)), or on memory (like reactive search optimization and [tabu search](/source/Tabu_search)), or on memory-less stochastic modifications (like [simulated annealing](/source/Simulated_annealing)).

The relative simplicity of the algorithm makes it a popular first choice amongst optimizing algorithms. It is used widely in [artificial intelligence](/source/Artificial_intelligence), for reaching a goal state from a starting node. Different choices for next [nodes](/source/Vertex_(graph_theory)) and starting nodes are used in related algorithms. Although more advanced algorithms such as [simulated annealing](/source/Simulated_annealing) or [tabu search](/source/Tabu_search) may give better results, in some situations hill climbing works just as well. Hill climbing can often produce a better result than other algorithms when the amount of time available to perform a search is limited, such as with real-time systems, so long as a small number of increments typically converges on a good solution (the optimal solution or a close approximation). At the other extreme, [bubble sort](/source/Bubble_sort) can be viewed as a hill climbing algorithm (every adjacent element exchange decreases the number of disordered element pairs), yet this approach is far from efficient for even modest N, as the number of exchanges required grows quadratically.

Hill climbing is an [anytime algorithm](/source/Anytime_algorithm): it can return a valid solution even if it's interrupted at any time before it ends.

## Mathematical description

Hill climbing attempts to maximize (or minimize) a target [function](/source/Function_(mathematics)) f ( x ) {\displaystyle f(\mathbf {x} )} , where x {\displaystyle \mathbf {x} } is a vector of continuous and/or discrete values. At each iteration, hill climbing will adjust a single element in x {\displaystyle \mathbf {x} } and determine whether the change improves the value of f ( x ) {\displaystyle f(\mathbf {x} )} . (Note that this differs from [gradient descent](/source/Gradient_descent) methods, which adjust all of the values in x {\displaystyle \mathbf {x} } at each iteration according to the gradient of the hill.) With hill climbing, any change that improves f ( x ) {\displaystyle f(\mathbf {x} )} is accepted, and the process continues until no change can be found to improve the value of f ( x ) {\displaystyle f(\mathbf {x} )} . Then x {\displaystyle \mathbf {x} } is said to be "locally optimal".

In discrete vector spaces, each possible value for x {\displaystyle \mathbf {x} } may be visualized as a [vertex](/source/Vertex_(graph_theory)) in a [graph](/source/Graph_(discrete_mathematics)). Hill climbing will follow the graph from vertex to vertex, always locally increasing (or decreasing) the value of f ( x ) {\displaystyle f(\mathbf {x} )} , until a [local maximum](/source/Local_maximum) (or [local minimum](/source/Local_minimum)) x m {\displaystyle x_{m}} is reached.

## Variants

In **simple hill climbing**, the first closer node is chosen, whereas in **steepest ascent hill climbing** all successors are compared and the closest to the solution is chosen. Both forms fail if there is no closer node, which may happen if there are local maxima in the search space which are not solutions. Steepest ascent hill climbing is similar to [best-first search](/source/Best-first_search), which tries all possible extensions of the current path instead of only one.[2]

**[Stochastic hill climbing](/source/Stochastic_hill_climbing)** does not examine all neighbors before deciding how to move. Rather, it selects a neighbor at random, and decides (based on the amount of improvement in that neighbor) whether to move to that neighbor or to examine another. **First choice hill climbing** implements Stochastic hill climbing by randomly generating neighbours until a better neighbour is generated, in which this neighbour is then chosen. This method performs well when states have many possible successors (e.g. thousands).[3]

[Coordinate descent](/source/Coordinate_descent) does a [line search](/source/Line_search) along one coordinate direction at the current point in each iteration. Some versions of coordinate descent randomly pick a different coordinate direction each iteration.[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed)*]

**Random-restart hill climbing** is a [meta-algorithm](/source/Meta-algorithm) built on top of the hill climbing algorithm. It is also known as **Shotgun hill climbing**. It iteratively does hill-climbing, each time with a random initial condition x 0 {\displaystyle x_{0}} . The best x m {\displaystyle x_{m}} is kept: if a new run of hill climbing produces a better x m {\displaystyle x_{m}} than the stored state, it replaces the stored state.[*[citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed)*]

## Problems

### Local maxima

A surface with two local maxima. (Only one of them is the global maximum.) If a hill-climber begins in a poor location, it may converge to the lower maximum.

Hill climbing will not necessarily find the global maximum, but may instead converge on a [local maximum](/source/Maxima_and_minima). This problem does not occur if the heuristic is convex. However, as many functions are not convex hill climbing may often fail to reach a global maximum. Other local search algorithms try to overcome this problem such as [stochastic hill climbing](/source/Stochastic_hill_climbing), [random walks](/source/Random_walk) and [simulated annealing](/source/Simulated_annealing).

Despite the many local maxima in this graph, the global maximum can still be found using simulated annealing. Unfortunately, the applicability of simulated annealing is problem-specific because it relies on finding *lucky jumps* that improve the position. In such extreme examples, hill climbing will most probably produce a local maximum.

### Ridges and alleys

*A ridge*

[Ridges](/source/Ridge_(differential_geometry)) are a challenging problem for hill climbers that optimize in continuous spaces. Because hill climbers only adjust one element in the vector at a time, each step will move in an axis-aligned direction. If the target function creates a narrow ridge that ascends in a non-axis-aligned direction (or if the goal is to minimize, a narrow alley that descends in a non-axis-aligned direction), then the hill climber can only ascend the ridge (or descend the alley) by zig-zagging. If the sides of the ridge (or alley) are very steep, then the hill climber may be forced to take very tiny steps as it zig-zags toward a better position. Thus, it may take an unreasonable length of time for it to ascend the ridge (or descend the alley).

By contrast, gradient descent methods can move in any direction that the ridge or alley may ascend or descend. Hence, gradient descent or the [conjugate gradient method](/source/Conjugate_gradient_method) is generally preferred over hill climbing when the target function is differentiable. Hill climbers, however, have the advantage of not requiring the target function to be differentiable, so hill climbers may be preferred when the target function is complex.

### Plateau

Another problem that sometimes occurs with hill climbing is that of a plateau. A plateau is encountered when the search space is flat, or sufficiently flat that the value returned by the target function is indistinguishable from the value returned for nearby regions due to the precision used by the machine to represent its value. In such cases, the hill climber may not be able to determine in which direction it should step, and may wander in a direction that never leads to improvement.

Pseudocode
**algorithm** Discrete Space Hill Climbing **is**
    currentNode := startNode
    **loop do**
        L := NEIGHBORS(currentNode)
        nextEval := −INF
        nextNode := NULL
        **for** all x in L **do**
            **if** EVAL(x) > nextEval **then**
                nextNode := x
                nextEval := EVAL(x)
        **if** nextEval ≤ EVAL(currentNode) **then**
            // Return current node since no better neighbors exist
            **return** currentNode
        currentNode := nextNode

**algorithm** Continuous Space Hill Climbing **is**
    currentPoint := initialPoint    // the zero-magnitude vector is common
    stepSize := initialStepSizes    // a vector of all 1's is common
    acceleration := someAcceleration // a value such as 1.2 is common
    candidate[0] := −acceleration
    candidate[1] := −1 / acceleration
    candidate[2] := 1 / acceleration
    candidate[3] := acceleration
    bestScore := EVAL(currentPoint)
    **loop do**
        beforeScore := bestScore
        **for each** element i in currentPoint **do**
            beforePoint := currentPoint[i]
            bestStep := 0
            **for** j from 0 to 3 **do**      // try each of 4 candidate locations
                step := stepSize[i] × candidate[j]
                currentPoint[i] := beforePoint + step
                score := EVAL(currentPoint)
                **if** score > bestScore **then**
                    bestScore := score
                    bestStep := step
            **if** bestStep is 0 **then**
                currentPoint[i] := beforePoint
                stepSize[i] := stepSize[i] / acceleration
            **else**
                currentPoint[i] := beforePoint + bestStep
                stepSize[i] := bestStep // acceleration
        **if** (bestScore − beforeScore) < epsilon **then**
            **return** currentPoint

Contrast [genetic algorithm](/source/Genetic_algorithm); [random optimization](/source/Random_optimization).

## See also

- [Gradient descent](/source/Gradient_descent)

- [Greedy algorithm](/source/Greedy_algorithm)

- [Tâtonnement](/source/Walrasian_auction)

- [Mean-shift](/source/Mean-shift)

- [A* search algorithm](/source/A*_search_algorithm)

## References

- [Russell, Stuart J.](/source/Stuart_J._Russell); [Norvig, Peter](/source/Peter_Norvig) (2003), [*Artificial Intelligence: A Modern Approach*](http://aima.cs.berkeley.edu/) (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, pp. 111–114, [ISBN](/source/ISBN_(identifier)) [0-13-790395-2](https://en.wikipedia.org/wiki/Special:BookSources/0-13-790395-2)

1. **[^](#cite_ref-1)** [Skiena, Steven](/source/Steven_Skiena) (2010). *The Algorithm Design Manual* (2nd ed.). [Springer Science+Business Media](/source/Springer_Science%2BBusiness_Media). [ISBN](/source/ISBN_(identifier)) [978-1-849-96720-4](https://en.wikipedia.org/wiki/Special:BookSources/978-1-849-96720-4).

1. **[^](#cite_ref-2)** This article is based on material taken from [Hill+climbing](https://foldoc.org/Hill+climbing) at the *[Free On-line Dictionary of Computing](/source/Free_On-line_Dictionary_of_Computing)* prior to 1 November 2008 and incorporated under the "relicensing" terms of the [GFDL](/source/GNU_Free_Documentation_License), version 1.3 or later.

1. **[^](#cite_ref-3)** Russell, Stuart J.; Norvig, Peter (2022). *Artificial intelligence: a modern approach*. Prentice Hall series in artificial intelligence. Ming-wei Chang, Jacob Devlin, Anca Dragan, David Forsyth, Ian Goodfellow, Jitendra Malik, Vikash Mansinghka, Judea Pearl, Michael J. Wooldridge (4th, Global ed.). Boston: Pearson. p. 131. [ISBN](/source/ISBN_(identifier)) [978-1-292-40117-1](https://en.wikipedia.org/wiki/Special:BookSources/978-1-292-40117-1).

## Further reading

- Lasry, George (2018). [*A Methodology for the Cryptanalysis of Classical Ciphers with Search Metaheuristics*](https://www.uni-kassel.de/upress/online/OpenAccess/978-3-7376-0458-1.OpenAccess.pdf) (PDF). [Kassel University Press](/source/Kassel_University_Press). [ISBN](/source/ISBN_(identifier)) [978-3-7376-0459-8](https://en.wikipedia.org/wiki/Special:BookSources/978-3-7376-0459-8).

## External links

- [Hill climbing](https://en.wikibooks.org/wiki/Special:Search/Hill_climbing) at Wikibooks

v t e Optimization: Algorithms, methods, and heuristics Unconstrained nonlinear Functions Golden-section search Powell's method Line search Nelder–Mead method Successive parabolic interpolation Gradients Convergence Trust region Wolfe conditions Quasi–Newton Berndt–Hall–Hall–Hausman Broyden–Fletcher–Goldfarb–Shanno and L-BFGS Davidon–Fletcher–Powell Symmetric rank-one (SR1) Other methods Conjugate gradient Gauss–Newton Gradient Mirror Levenberg–Marquardt Powell's dog leg method Truncated Newton Hessians Newton's method Optimization computes maxima and minima. Constrained nonlinear General Barrier methods Penalty methods Differentiable Augmented Lagrangian methods Sequential quadratic programming Successive linear programming Convex optimization Convex minimization Cutting-plane method Reduced gradient (Frank–Wolfe) Subgradient method Linear and quadratic Interior point Affine scaling Ellipsoid algorithm of Khachiyan Projective algorithm of Karmarkar Basis-exchange Simplex algorithm of Dantzig Revised simplex algorithm Criss-cross algorithm Principal pivoting algorithm of Lemke Active-set method Combinatorial Paradigms Approximation algorithm Dynamic programming Greedy algorithm Integer programming Branch and bound/cut Graph algorithms Minimum spanning tree Borůvka Prim Kruskal Shortest path Bellman–Ford SPFA Dijkstra Floyd–Warshall Network flows Dinic Edmonds–Karp Ford–Fulkerson Push–relabel maximum flow Metaheuristics Evolutionary algorithm Hill climbing Local search Parallel metaheuristics Simulated annealing Spiral optimization algorithm Tabu search Software

---
Adapted from the Wikipedia article [Hill climbing](https://en.wikipedia.org/wiki/Hill_climbing) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Hill_climbing?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
