Optimization with Simulated Annealing - Part 1

#### I’m dividing this post into 2 parts: ## Definition

“Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimization in a large search space” (Wikipedia 2017).

In other words, Simulated Annealing can be seen as a search strategy to help you find a good answer when finding the optimum one is impractical. It is very handy at finding answers to combinatorial problems (especially np-complete ones since we still don’t know a Turing-deterministic polynomial time algorithm to solve them) when you don’t have enough time or resource to waste by letting an exact algorithm (eg, a brute-force or Branch and Bound) finish.

## The Technique

Ok then, so what about this technique? What’s the idea behind it? Let’s look at Wikipedia again:

“The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. Both are attributes of the material that depend on its thermodynamic free energy. Heating and cooling the material affects both the temperature and the thermodynamic free energy […] This notion of slow cooling implemented in the Simulated Annealing algorithm is interpreted as a slow decrease in the probability of accepting worse solutions as the solution space is explored (accepting worse solutions is a fundamental property of metaheuristics because it allows for a more extensive search for the optimal solution). The simulation can be performed either by a solution of kinetic equations for density functions […] or by using the stochastic sampling method” (ref2, ref3 and ref4)

In other words, the idea behind this technique is somewhat similar to that of a heated metal. When it’s hot enough (within a high temperature) the, before rigid/non-malleable/uneven, metal becomes malleable again until the heat is dissipated and the temperature drops making it rigid again. The analogy this makes into solving combinatorial problems is that we can think of the problem as a metal within a high temperature: we find a feasible solution and try to improve it as the temperature drops. Initially we will be able to try most of the possibilities, even if doesn’t actually improves the current answer, because the metal is hot and we can easily mold it, but as the temperature drops we become less tolerant and only possibilities that do improve the answer can be tested until the metal is rigid again and we can’t test any possibility anymore, returning as solution the best answer seen so far.

Let’s divide this process into 5 main steps then:

1. Find a feasible solution
2. Use the current solution to create a new one
3. Check if we should use the new answer
1. If new solution is better than current one use the new one as current
2. Else depending how bad the new one is and how hot we are let’s check if we should keep it (kinetic equations or stochastic sampling)
4. Repeat steps 2 and 3 until temperature is cold enough
5. Return the best answer seen so far

Alright, now why should we use this technique? It doesn’t necessarily finds the optimum answer, it’s not necessarily fast (depends on the initial temperature, drop rate and perturbation strategy) and it’s hard to predict how good the final answer may be. Why should we use it in first place?

Well, start by noticing that this solution belongs to the metaheuristic solutions class so, like every other metaheuristic, it’s not problem dependant, we can apply it to any problem we see fit. When we don’t know an optimum strategy to solve a problem or it’s impractical to use an exact one, this technique may be a good option for try to find a good answer in an acceptable time. Actually, its popularity is quite big in many research fields.

Furthermore, because it’s problem independant, it doesn’t suffers from what many heuristics tend to suffer: being stuck at local optimum. It has higher chances to converge to the global optimum (same as optimum answer) because it will, initially, accept answers that didn’t improve the current answer and, by being worst than the previous, this new answer may get it out from the local optimum leading to another local, which this time is global, optimum. The image below, from Wikipedia, illustrates this optimization concept: Back to our initial question another advantage of this technique is that its speed and quality relate to the initial params, so one can balance speed over quality by incrementing or decrementing parameter values. This is ideal for scenarios where one can’t prioritize both, making this technique one of the few viable solutions in this case. Using the Stochastic Sampling perturbations one can also use randomness in its favor by executing multiple instances in parallel: both will finish at the same time returning distinct answers, just keep then the best one to increase the chances of finding the optimum one.

#### Summarizing:

• Problem independent, applicable to any problem (“many” would be more precise here, but you got the point)
• Finds a good approximate solution when optimal is impractical
• Avoids local optimum dilemma, higher chances to converge to the global optimum
• Allows balancing speed over quality
• Stochastic, parallel solution improves approximation to optimum answer

## Applications

Jump to Part 2 for a hands-on algorithm explaining how to implement a Simulated Annealing to solve the Traveling Salesman Problem.