Simulated Annealing
Motivation
Hill climbing never moves to worse states → gets stuck at local maxima. Simulated annealing escapes local maxima by occasionally accepting downhill moves, with acceptance probability decreasing over time.
Physical analogy: Annealing in metallurgy — heat metal (high energy = random), then cool slowly (energy decreases toward ground state). High temperature = explore; low temperature = exploit.
Algorithm
function SIMULATED-ANNEALING(problem, schedule) returns a state
    current ← INITIAL-STATE(problem)
    for t = 1, 2, ... do
        T ← schedule(t)        -- temperature at time t
        if T = 0 then return current
        next ← a randomly selected successor of current
        ΔE ← VALUE(next) − VALUE(current)
        if ΔE > 0 then
            current ← next     -- always accept improvement
        else
            current ← next with probability e^(ΔE/T)
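A minimal Python sketch of the pseudocode above, for a maximization problem. The function names (`value`, `random_successor`, `schedule`) are hypothetical hooks the caller supplies, not part of any library:

```python
import math
import random

def simulated_annealing(initial, value, random_successor, schedule,
                        max_steps=100_000):
    """Maximize value(state), following the pseudocode above.

    value(state)            -> float, higher is better
    random_successor(state) -> a random neighboring state
    schedule(t)             -> temperature at step t; <= 0 terminates
    """
    current = initial
    for t in range(1, max_steps + 1):
        T = schedule(t)
        if T <= 0:
            return current
        nxt = random_successor(current)
        delta_e = value(nxt) - value(current)
        # Always accept an improvement; accept a downhill move with
        # probability e^(ΔE/T). The short-circuit `or` ensures exp()
        # is only evaluated for ΔE <= 0, where it cannot overflow.
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = nxt
    return current
```

For example, maximizing −(x − 3)² over the integers with ±1 neighbors and geometric cooling converges to x = 3 once T is small and the search turns greedy.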
Key Mechanics
Temperature Schedule
schedule(t) maps time step → temperature T:
- T starts high (lots of random moves accepted)
- T decreases (fewer bad moves accepted)
- T → 0: pure hill climbing

Common schedules: logarithmic cooling T(t) = T₀ / log(1 + t), or geometric cooling T(t) = αᵗ · T₀ where α ≈ 0.95.
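A quick numeric comparison of the two schedules (T₀ = 10 and α = 0.95 are illustrative constants, not prescribed values):

```python
import math

T0 = 10.0
ALPHA = 0.95

def log_schedule(t):
    # Logarithmic cooling: T(t) = T0 / log(1 + t) — decreases very slowly
    return T0 / math.log(1 + t)

def geometric_schedule(t):
    # Geometric cooling: T(t) = alpha^t * T0 — decreases quickly
    return ALPHA ** t * T0

for t in (1, 10, 100, 1000):
    print(t, round(log_schedule(t), 3), round(geometric_schedule(t), 3))
```

By t = 1000 the logarithmic schedule is still above 1 while the geometric one is effectively zero, which previews why the convergence guarantee below requires the slow schedule.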
Acceptance Probability
For a downhill move with ΔE < 0:
P(accept) = e^(ΔE/T)
- Large |ΔE| (steep downhill) → lower acceptance probability
- High T → acceptance probability near 1 (nearly random walk)
- Low T → acceptance probability near 0 (nearly greedy)
At fixed T, this acceptance rule yields a stationary distribution over states that is exactly the Boltzmann distribution from statistical mechanics.
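The three bullets above, sketched numerically (the ΔE and T values are chosen purely for illustration):

```python
import math

def p_accept(delta_e, T):
    # Metropolis acceptance rule: uphill moves (ΔE > 0) always accepted,
    # downhill moves accepted with probability e^(ΔE/T)
    return 1.0 if delta_e > 0 else math.exp(delta_e / T)

for T in (100.0, 1.0, 0.01):
    # the same downhill move ΔE = -1 at high, medium, and low temperature
    print(f"T={T:6}: P(accept) = {p_accept(-1.0, T):.3g}")
```

At T = 100 the move is accepted ~99% of the time (near random walk); at T = 0.01 the probability is astronomically small (near greedy).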
Convergence Guarantee
Theorem: If the temperature is decreased slowly enough (specifically, if T decreases no faster than T₀ / log(1 + t)), simulated annealing finds the global optimum with probability approaching 1.
Practical caveat: “slowly enough” often means astronomically long schedules. In practice, use faster cooling and accept near-optimal solutions.
Properties
| Property | Value |
|---|---|
| Complete | Yes (with slow enough schedule) |
| Optimal | Yes (with slow enough schedule) |
| Practical | Use fast schedule → near-optimal quickly |
| Space | O(1) |
Connection to DPLL / WalkSAT (Ch.7)
WalkSAT’s mixed greedy+random strategy is essentially simulated annealing with a fixed (not decreasing) temperature — the random walk component prevents local optima, the greedy component provides direction.
Applications
- VLSI chip design
- Airline crew scheduling
- Protein folding
- Neural hyperparameter optimization
- DynamICCL: fine-tuning NCCL parameters where the objective (throughput) landscape is non-convex and hardware interactions create unexpected local optima