Stochastic Games and Expectiminimax
Stochastic Games
Some games include random elements — dice rolls, shuffled cards, coin flips.
The game tree must include chance nodes in addition to MAX and MIN nodes:
- MAX node: the agent chooses an action to maximize value
- MIN node: the opponent chooses an action to minimize value
- CHANCE node: nature randomly selects an outcome with known probabilities
Examples:
- Backgammon: dice determine which moves are available
- Monopoly: dice rolls plus Chance and Community Chest cards
- Card games: the initial deal is random
Expectiminimax
Extends minimax with expected value computation at chance nodes.
EXPECTIMINIMAX(s) =
    UTILITY(s)                                           if IS-TERMINAL(s)
    max_{a ∈ ACTIONS(s)} EXPECTIMINIMAX(RESULT(s, a))    if TO-MOVE(s) = MAX
    min_{a ∈ ACTIONS(s)} EXPECTIMINIMAX(RESULT(s, a))    if TO-MOVE(s) = MIN
    Σ_r P(r) · EXPECTIMINIMAX(RESULT(s, r))              if TO-MOVE(s) = CHANCE

where P(r) is the probability of chance outcome r. The value of a chance node is thus the expected value over all random outcomes.
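The recursive definition translates almost line for line into code. A minimal sketch in Python, where the nested-tuple encoding of the game tree is an assumption made for illustration:

```python
# Game trees as nested tuples (an illustrative encoding, not a library API):
#   ('max', [child, ...])             MAX node
#   ('min', [child, ...])             MIN node
#   ('chance', [(prob, child), ...])  CHANCE node
#   a plain number                    terminal utility

def expectiminimax(node):
    if isinstance(node, (int, float)):  # terminal state: return its utility
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    if kind == 'chance':                # expectation over random outcomes
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node type: {kind}")

# Example: a 90/10 chance node whose unlikely branch is a MAX choice,
# giving 0.9 * 1.0 + 0.1 * max(2, 5) = 1.4.
value = expectiminimax(('chance', [(0.9, 1.0), (0.1, ('max', [2, 5]))]))
```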
Worked Backgammon Example
              MAX
               |
         CHANCE (dice)
         /           \
    P=0.5             P=0.5
  (roll 1)           (roll 2)
      |                 |
     MIN               MIN
    /   \             /   \
   2     4           4     6
MIN nodes: min(2, 4) = 2 and min(4, 6) = 4
CHANCE node: 0.5·2 + 0.5·4 = 3
MAX node: max(3) = 3
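The same backup can be checked step by step with the leaf values from the tree:

```python
# Backing up the worked example's leaf values by hand:
left   = min(2, 4)                  # MIN node under roll 1
right  = min(4, 6)                  # MIN node under roll 2
chance = 0.5 * left + 0.5 * right   # expectation at the CHANCE node
root   = max([chance])              # MAX has only one option here
```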
Impact on Alpha-Beta Pruning
Alpha-beta pruning becomes less effective with chance nodes.
Problem: the value of a chance node is an expectation over all of its outcomes, so we cannot discard a branch after seeing a single refutation the way plain alpha-beta can.
Partial solution:
- Still prune when a value is provably outside the bounds that matter
- Requires knowing the range of possible leaf values: [v_min, v_max]
- Prune a chance node once even the best (or worst) remaining outcomes cannot bring its expectation back inside (alpha, beta)
Effectiveness: much weaker than deterministic alpha-beta.
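One way to sketch this bounded pruning at a chance node, in the spirit of Ballard's *-minimax; the function and its interface are illustrative assumptions, not a standard API:

```python
def chance_value(outcomes, alpha, beta, v_min, v_max, evaluate):
    """Expected value at a chance node, with early cutoffs.

    outcomes: list of (probability, state) pairs whose probabilities sum to 1.
    evaluate(state): exact child value, guaranteed to lie in [v_min, v_max].
    Returns the exact expectation, or a bound once the expectation is
    provably outside (alpha, beta).
    """
    total, remaining = 0.0, 1.0
    for p, state in outcomes:
        total += p * evaluate(state)
        remaining -= p
        upper = total + remaining * v_max   # best the expectation can still be
        lower = total + remaining * v_min   # worst it can still be
        if upper <= alpha:                  # can never beat alpha: prune
            return upper
        if lower >= beta:                   # guaranteed above beta: prune
            return lower
    return total

calls = []
def noisy_eval(v):
    calls.append(v)
    return v

# With alpha = 8: after the first child (value 2), even a v_max = 10 second
# child caps the expectation at 6, so the second child is never evaluated.
val = chance_value([(0.5, 2), (0.5, 9)], alpha=8, beta=float('inf'),
                   v_min=0, v_max=10, evaluate=noisy_eval)
```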
Sensitivity to Evaluation Function
With chance nodes, the evaluation function must not only preserve ordinal relationships between positions but also sit on a meaningful numeric scale (a positive linear transformation of the true expected utility).
Critical insight: for deterministic games, any monotonic transformation of EVAL preserves optimal play. For stochastic games, this is NOT true — the expected value depends on the absolute scale.
Example: If EVAL values are {1,2,3} vs. {1,2,9}, the optimal move at a chance node may differ because expected values differ.
→ Evaluation functions in stochastic games must be calibrated probabilities or correctly scaled utilities, not just ordinal rankings.
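A small numeric demo of this effect, with made-up leaf values: an order-preserving rescaling of EVAL flips which chance node MAX prefers.

```python
def expected(leaves, probs=(0.5, 0.5)):
    """Expected value of a 50/50 chance node over its leaf evaluations."""
    return sum(p * v for p, v in zip(probs, leaves))

move_a = (3, 3)   # EV = 3.0 -> MAX prefers move_a
move_b = (1, 4)   # EV = 2.5

# A monotonic (order-preserving) transform of the leaves: 1->1, 3->3, 4->9.
f = {1: 1, 3: 3, 4: 9}
ev_a = expected(tuple(f[v] for v in move_a))  # still 3.0
ev_b = expected(tuple(f[v] for v in move_b))  # now 5.0: the preference flips
```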
Backgammon: TD-Gammon
TD-Gammon (Tesauro, 1992) was a landmark:
- Neural-network evaluation function for backgammon
- Trained by temporal-difference (TD) learning from self-play
- No human expert knowledge: learned purely from games
- Reached world-class level
Key result: demonstrated that reinforcement learning (TD learning) can train world-class evaluation functions from self-play alone, predating AlphaGo by 24 years.
Simultaneous Move Games
Some games (Prisoner’s Dilemma, rock-paper-scissors) have simultaneous moves — both players act at once without knowing the other’s choice.
These require game-theoretic analysis (Nash equilibria, mixed strategies) rather than minimax.
For rock-paper-scissors: the Nash equilibrium is to play each option with probability 1/3. Any deterministic strategy is exploitable.
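This is easy to check numerically with the standard payoff matrix (a minimal sketch):

```python
# Row player's payoff; rows and columns are (rock, paper, scissors).
PAYOFF = [[ 0, -1,  1],
          [ 1,  0, -1],
          [-1,  1,  0]]

def expected_payoff(p, q):
    """Expected payoff of mixed strategy p (row) against mixed strategy q."""
    return sum(p[i] * q[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))

uniform = (1/3, 1/3, 1/3)
# Against every pure strategy the uniform mix earns 0 in expectation,
# so no opponent strategy can exploit it.
payoffs = [expected_payoff(uniform, pure)
           for pure in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]]
```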
Properties Summary
| Game Type | Algorithm | Optimal? |
|---|---|---|
| Deterministic, perfect info | Minimax + α-β | Yes |
| Stochastic, perfect info | Expectiminimax | Yes (given exact P) |
| Imperfect info | Belief-state / sampling methods | Approximate |
Connection to RL / MDPs
A stochastic game reduces to a Markov decision process (MDP, Ch. 17) when a single agent makes all the decisions and the only other "player" is chance.
Expectiminimax then plays the role of the Bellman optimality equation, generalized to two-player stochastic games.
The expectation over chance outcomes in expectiminimax is the same operation as the expectation over next states in the Bellman optimality equation:
V*(s) = max_a Σ_s' P(s'|s,a) · [R(s,a,s') + γ·V*(s')]
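To make the correspondence concrete, here is a minimal value-iteration sketch that solves this Bellman equation for a tiny two-state MDP; the states, rewards, and transition probabilities are made up for illustration:

```python
GAMMA = 0.9
STATES = ['s0', 's1']
ACTIONS = ['a', 'b']
# P[(state, action)] = list of (probability, next_state, reward)
P = {
    ('s0', 'a'): [(1.0, 's0', 0.0)],
    ('s0', 'b'): [(0.5, 's0', 0.0), (0.5, 's1', 1.0)],
    ('s1', 'a'): [(1.0, 's1', 1.0)],
    ('s1', 'b'): [(1.0, 's0', 0.0)],
}

def value_iteration(tol=1e-10):
    """Iterate the Bellman optimality backup until values stop changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            # max over actions of the expectation over next states,
            # exactly mirroring the chance-node expectation above
            v = max(sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[(s, a)])
                    for a in ACTIONS)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

V = value_iteration()   # V['s1'] converges to 1 / (1 - 0.9) = 10
```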