Stochastic Games and Expectiminimax

Stochastic Games

Some games include random elements — dice rolls, shuffled cards, coin flips.

The game tree must include chance nodes in addition to MAX and MIN nodes:
- MAX node: the agent chooses an action to maximize value
- MIN node: the opponent chooses an action to minimize value
- CHANCE node: nature randomly selects an outcome with known probabilities

Examples:
- Backgammon: dice determine which moves are available
- Monopoly: dice rolls plus Chance/Community Chest cards
- Card games: random initial deal


Expectiminimax

Expectiminimax extends minimax with an expected-value computation at chance nodes.

EXPECTIMINIMAX(s) =
    UTILITY(s)                                          if IS-TERMINAL(s)
    max over a in ACTIONS(s) of EXPECTIMINIMAX(RESULT(s,a))   if TO-MOVE(s) = MAX
    min over a in ACTIONS(s) of EXPECTIMINIMAX(RESULT(s,a))   if TO-MOVE(s) = MIN
    Σ over r of P(r) · EXPECTIMINIMAX(RESULT(s,r))           if TO-MOVE(s) = CHANCE

Where P(r) is the probability of chance outcome r.

The chance node value = expected value over all random outcomes.
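
A minimal Python sketch of this recursion, assuming a toy tree representation (the Node class and its fields are hypothetical, chosen purely for illustration):

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        # Hypothetical tree node: kind is "TERMINAL", "MAX", "MIN", or "CHANCE".
        kind: str
        utility: float = 0.0                            # used only by terminal nodes
        children: list = field(default_factory=list)    # MAX/MIN: [Node]; CHANCE: [(prob, Node)]

    def expectiminimax(node):
        if node.kind == "TERMINAL":
            return node.utility
        if node.kind == "MAX":
            return max(expectiminimax(c) for c in node.children)
        if node.kind == "MIN":
            return min(expectiminimax(c) for c in node.children)
        # CHANCE: probability-weighted average of successor values
        return sum(p * expectiminimax(c) for p, c in node.children)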


Worked Backgammon Example

       MAX
        |
     CHANCE (dice)
    /         \
P=0.5         P=0.5
(roll 1)     (roll 2)
   |              |
  MIN            MIN
 /   \          /   \
2     4        4     6

MIN nodes: min(2,4) = 2 and min(4,6) = 4
CHANCE: 0.5·2 + 0.5·4 = 3
MAX: max(3) = 3
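
Reusing the expectiminimax sketch from above (same hypothetical Node class), the hand computation can be checked directly:

    leaf = lambda u: Node("TERMINAL", utility=u)
    tree = Node("MAX", children=[
        Node("CHANCE", children=[
            (0.5, Node("MIN", children=[leaf(2), leaf(4)])),   # roll 1
            (0.5, Node("MIN", children=[leaf(4), leaf(6)])),   # roll 2
        ]),
    ])
    print(expectiminimax(tree))   # 3.0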


Impact on Alpha-Beta Pruning

Alpha-beta pruning becomes less effective with chance nodes.

Problem: a chance node's value is a probability-weighted average of all its children, so no single child can by itself establish a bound; naively, every branch must be evaluated before the node's value is known.

Partial solution:
- Pruning is still possible when a value is provably outside the current bounds
- This requires knowing the range of possible leaf values, [v_min, v_max]
- After evaluating some children, bound the chance node's value by assuming the remaining probability mass takes v_min or v_max, and prune when even the extreme case cannot change the decision (see the sketch below)

Effectiveness: much weaker than deterministic alpha-beta.
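
A sketch of this bounding idea at a single chance node, assuming all values lie in [v_min, v_max] (the function and its parameters are illustrative, in the spirit of Ballard's *-minimax pruning, not a complete implementation):

    def chance_value(children, value_fn, alpha, beta, v_min, v_max):
        # children: list of (probability, node); value_fn evaluates a node.
        total = 0.0        # probability-weighted sum of evaluated children
        remaining = 1.0    # probability mass not yet evaluated
        for p, child in children:
            total += p * value_fn(child)
            remaining -= p
            upper = total + remaining * v_max   # optimistic bound on final value
            lower = total + remaining * v_min   # pessimistic bound on final value
            if upper <= alpha:                  # even the best case fails low: prune
                return upper
            if lower >= beta:                   # even the worst case fails high: prune
                return lower
        return total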


Sensitivity to Evaluation Function

With chance nodes, the evaluation function must preserve ordinal relationships AND be linearly scaled correctly.

Critical insight: for deterministic games, any order-preserving (monotone increasing) transformation of EVAL leaves the chosen move unchanged. For stochastic games this is NOT true: the expected value depends on the absolute scale of EVAL, not just the ordering.

Example: the leaf values {1, 2, 3} and {1, 2, 9} are related by an order-preserving transformation (3 → 9), yet the move preferred above a chance node can differ, because the expected values differ.

→ Evaluation functions in stochastic games must be positive linear transformations of the expected utility of the position (e.g., calibrated win probabilities), not just ordinal rankings.
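
A concrete instance under assumed probabilities (moves A and B are hypothetical): move A leads deterministically to a state with EVAL 2; move B leads to a chance node whose children carry the low and high EVAL values with probability 0.5 each. Under {1, 2, 3} the agent is indifferent; under the rescaled {1, 2, 9} it strictly prefers B.

    # Move A: certain value 2.  Move B: 50/50 between the low and high leaf.
    def ev_B(low, high):
        return 0.5 * low + 0.5 * high

    print(ev_B(1, 3))   # 2.0 -> with leaves {1, 2, 3}, A and B are tied
    print(ev_B(1, 9))   # 5.0 -> with leaves {1, 2, 9}, B is strictly preferred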


Backgammon: TD-Gammon

TD-Gammon (Tesauro, 1992) was a landmark:
- Neural-network evaluation function for backgammon
- Trained by temporal-difference (TD) learning from self-play
- No human expert knowledge: it learned purely from games
- Reached world-class level

Key result: it demonstrated that RL (TD learning) can train world-class evaluation functions from self-play alone, predating AlphaGo by 24 years.


Simultaneous Move Games

Some games (Prisoner’s Dilemma, rock-paper-scissors) have simultaneous moves — both players act at once without knowing the other’s choice.

These require game-theoretic analysis (Nash equilibria, mixed strategies) rather than minimax.

For rock-paper-scissors: the Nash equilibrium is to play each option with probability 1/3. Any deterministic strategy is exploitable.
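
A quick check of both claims, using the standard rock-paper-scissors payoff matrix (this snippet is just an illustrative sketch):

    # Row player's payoff: +1 win, 0 tie, -1 loss.
    # Rows and columns are ordered (rock, paper, scissors).
    PAYOFF = [[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]]

    def expected_payoff(p, q):
        # Row player's expected payoff for mixed strategies p (row) and q (column).
        return sum(p[i] * q[j] * PAYOFF[i][j] for i in range(3) for j in range(3))

    uniform = [1/3, 1/3, 1/3]
    always_rock = [1, 0, 0]

    print(expected_payoff(uniform, always_rock))    # 0.0: the uniform mix is safe
    print(expected_payoff([0, 1, 0], always_rock))  # 1.0: "always rock" loses to pure paper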


Properties Summary

Game type                     Algorithm         Optimal?
Deterministic, perfect info   Minimax + α-β     Yes
Stochastic, perfect info      Expectiminimax    Yes (given exact probabilities)
Imperfect info                Special methods   Approximate

Connection to RL / MDPs

A stochastic game reduces to a Markov decision process (MDP, Ch. 17) when a single agent makes all the decisions and chance is the only "opponent".

Expectiminimax plays the role of the Bellman equation for two-player stochastic games.

The expectation over chance outcomes in expectiminimax is the same operation as the expectation over next states in the Bellman optimality equation:

V*(s) = max_a Σ_s' P(s'|s,a) · [R(s,a,s') + γ·V*(s')]
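
To make the parallel explicit, here is one Bellman backup at a state s (P, R, V, and gamma are assumed data structures for a small MDP; the names are illustrative). The inner sum is the same probability-weighted average as the CHANCE case of expectiminimax, and the outer max is the MAX case:

    def bellman_backup(s, actions, P, R, V, gamma):
        # P[s][a]: list of (prob, s2) transitions; R[s][a][s2]: reward;
        # V: dict of current value estimates; gamma: discount factor.
        return max(
            sum(p * (R[s][a][s2] + gamma * V[s2]) for p, s2 in P[s][a])
            for a in actions
        )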