Heuristic Evaluation Functions and Imperfect Real-Time Decisions
The Problem with Full Search
Even with alpha-beta, deep game trees are intractable. We must cut off search at a depth limit and evaluate the resulting non-terminal states.
function H-MINIMAX(state, depth) returns a utility value
    -- same as MINIMAX, but IS-TERMINAL → CUTOFF-TEST and UTILITY → EVAL
    if CUTOFF-TEST(state, depth) then return EVAL(state, player)
    if TO-MOVE(state) = MAX then
        return max over a in ACTIONS(state) of H-MINIMAX(RESULT(state, a), depth + 1)
    else return min over a in ACTIONS(state) of H-MINIMAX(RESULT(state, a), depth + 1)
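As a concrete illustration, a minimal Python sketch of depth-limited minimax; the `game` interface (`actions`, `result`, `is_terminal`, `utility`) and `eval_fn` are hypothetical stand-ins, not fixed by the text above.

```python
# Minimal depth-limited minimax sketch. The `game` interface (actions,
# result, is_terminal, utility) and `eval_fn` are hypothetical stand-ins.
def h_minimax(game, state, depth, limit, eval_fn, maximizing):
    if game.is_terminal(state):
        return game.utility(state)        # exact value at terminal states
    if depth >= limit:
        return eval_fn(state)             # heuristic estimate at the cutoff
    values = (h_minimax(game, game.result(state, a), depth + 1, limit,
                        eval_fn, not maximizing)
              for a in game.actions(state))
    return max(values) if maximizing else min(values)
```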
Evaluation Functions
An evaluation function maps a non-terminal game state to a numeric estimate of its utility.
Requirements:
1. Terminal states: must agree with UTILITY (exact values)
2. Non-terminal states: must approximate the true minimax value
3. Efficiency: must be computable quickly (it is called millions of times)
Linear Weighted Features (Classic Approach)
EVAL(s) = w₁·f₁(s) + w₂·f₂(s) + ... + wₙ·fₙ(s)
Where fᵢ(s) are features (numeric properties of the state) and wᵢ are weights.
Chess example (Shannon’s material evaluation):
EVAL(s) = 9·(♛MAX - ♛MIN) + 5·(♜MAX - ♜MIN) + 3·(♝MAX + ♞MAX - ♝MIN - ♞MIN) + 1·(♟MAX - ♟MIN) + ...
Piece values: Queen=9, Rook=5, Bishop=Knight=3, Pawn=1.
Other features: pawn structure (doubled/isolated pawns), king safety, bishop pairs, rook on open file, control of center squares.
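A minimal Python sketch of this material count; the per-side piece-count dictionaries are a hypothetical board representation.

```python
# Shannon-style linear material evaluation. The board representation
# (per-side piece counts) is a hypothetical stand-in.
PIECE_VALUES = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_eval(max_counts, min_counts):
    """EVAL(s) = sum_i w_i * (count_i(MAX) - count_i(MIN))."""
    return sum(w * (max_counts.get(p, 0) - min_counts.get(p, 0))
               for p, w in PIECE_VALUES.items())

# Example: MAX is up a knight, MIN is up a pawn -> 3 - 1 = +2
print(material_eval({"N": 2, "P": 8}, {"N": 1, "P": 9}))  # prints 2
```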
Non-Linear Evaluation
Modern programs use neural networks for evaluation:
- Input: board representation (piece positions, e.g. 12×8×8 feature planes)
- Output: win probability in [0, 1]
- Trained by: self-play (AlphaZero), tablebases, human games
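For flavor, a toy PyTorch sketch of such a value network over 12×8×8 planes; the layer sizes are illustrative and not any engine's actual architecture.

```python
import torch
import torch.nn as nn

# Illustrative value network over 12x8x8 piece planes (a toy sketch, not
# any real engine's architecture): conv trunk -> sigmoid win probability.
class ValueNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(12, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),       # win probability in [0, 1]
        )

    def forward(self, planes):                    # planes: (batch, 12, 8, 8)
        return self.net(planes)

print(ValueNet()(torch.zeros(1, 12, 8, 8)).shape)  # torch.Size([1, 1])
```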
The Horizon Effect
A major failure mode of fixed-depth cutoff search.
Problem: catastrophic events happen just beyond the search horizon.
Example: an enemy pawn is about to promote on move d+1, just beyond the depth-d horizon. The program cannot see the promotion, evaluates the position as good for MAX, plays on, and is facing a new enemy queen on the very next move.
Mitigation:
1. Quiescence search: extend the search until a “quiet” position is reached (no pending captures or checks), so the cutoff never lands in the middle of a tactical sequence.
2. Singular extension: if one move is clearly better than all others, extend that branch.
Quiescence Search
function QUIESCENCE(state, α, β)
    stand_pat ← eval(state)                         -- static eval, from the side to move
    if stand_pat ≥ β then return β                  -- fail-hard β cutoff
    α ← max(α, stand_pat)                           -- “standing pat” is always an option
    for each capture/check move a do                -- tactical moves only
        score ← -QUIESCENCE(RESULT(state, a), -β, -α)   -- negamax recursion
        if score ≥ β then return β
        α ← max(α, score)
    return α
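The same routine as a Python sketch; `evaluate` (scored from the side to move), `tactical_moves` (captures and checks only), and `result` are hypothetical helpers a real engine would supply.

```python
# Negamax quiescence sketch. `evaluate`, `tactical_moves`, and `result`
# are hypothetical helpers supplied by the surrounding engine.
def quiescence(state, alpha, beta, evaluate, tactical_moves, result):
    stand_pat = evaluate(state)
    if stand_pat >= beta:
        return beta                      # fail-hard cutoff
    alpha = max(alpha, stand_pat)        # standing pat is always an option
    for move in tactical_moves(state):
        score = -quiescence(result(state, move), -beta, -alpha,
                            evaluate, tactical_moves, result)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha
```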
Cutoff Criteria
When to stop the search and apply the eval function?
- Fixed depth: simple, with predictable time per move
- Time budget: keep searching until the clock runs out (iterative deepening; see the sketch after this list)
- Quiescent position: only stop when the position is “stable”
- Beam thresholding: only explore nodes whose values fall within a margin of the current best
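A minimal sketch of the time-budget strategy, assuming a hypothetical depth-limited search routine `search_to_depth`:

```python
import time

# Iterative deepening under a time budget: repeatedly run a depth-limited
# search, keeping the best move from the deepest completed iteration.
# `search_to_depth` is a hypothetical depth-limited alpha-beta routine.
def timed_search(state, budget_seconds, search_to_depth):
    deadline = time.monotonic() + budget_seconds
    best_move, depth = None, 1
    while time.monotonic() < deadline:
        best_move = search_to_depth(state, depth)  # completed depth-d result
        depth += 1
    return best_move
```

Real engines also abort the in-progress iteration when the deadline hits and fall back to the last completed depth’s move.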
Learning Evaluation Functions
Instead of hand-crafting features and weights, learn them:
- Self-play: play many games, use outcomes (win/loss/draw) as training signal
- Temporal difference learning: update weights based on prediction error (Samuel’s checkers, 1959 — one of the first ML programs)
  - Target: the backed-up minimax value of s (the search result is a better estimate than the static EVAL(s))
  - Error = EVAL(s) − (backed-up minimax value of s)
  - Update the weights wᵢ in the direction that reduces the error
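For the linear EVAL above, one gradient-style update looks roughly like this; the learning rate and helper names are illustrative:

```python
# One TD-style weight update for a linear EVAL(s) = w . f(s).
# `features` and the backed-up `target` value would come from a
# hypothetical search routine; the learning rate is illustrative.
def td_update(weights, features, target, lr=0.01):
    prediction = sum(w * f for w, f in zip(weights, features))
    error = prediction - target            # EVAL(s) - backed-up value
    # Gradient step on squared error: d(error^2)/dw_i = 2 * error * f_i
    return [w - lr * error * f for w, f in zip(weights, features)]
```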
TD-Gammon (backgammon, 1992): learned to world-class level purely from self-play. AlphaZero: neural EVAL trained purely from self-play, no human features.
Properties Summary
| Component | Role |
|---|---|
| Depth limit | Controls time budget |
| Eval function | Approximates utility at cutoff |
| Quiescence search | Extends past tactical sequences |
| Move ordering | Improves α-β pruning effectiveness |
| Transposition table | Avoids re-evaluating seen positions |
Connection to RL
Heuristic evaluation = value function in RL. Learning eval functions by self-play = temporal difference learning (TD(λ)).
AlphaZero: policy + value networks trained from self-play (value targets come from game outcomes) → directly unifies game-playing and RL value estimation.
For DynamICCL: the RL value function approximating expected future NCCL throughput plays the same role as a heuristic eval — estimates long-term reward from current state.