Games and Adversarial Search
Chapter 5 — Adversarial Search and Games
Book: Artificial Intelligence: A Modern Approach (Russell & Norvig, 4th ed.), pp. 146–179
Why Games?
Games have driven AI research since Shannon (1950) and Turing (1950) because they are:

- Well-defined: clear rules, win/loss/draw criteria
- Challenging: exponential state spaces (chess: ~10^47 states)
- Adversarial: another agent actively works against you
- Benchmarkable: easily measurable progress
Key milestones: IBM Deep Blue (chess, 1997), DeepMind AlphaGo (Go, 2016), OpenAI Five (Dota 2, 2019).
Formal Game Definition
A two-player, deterministic, zero-sum game of perfect information is defined by:
| Component | Meaning |
|---|---|
| S₀ | Initial state |
| TO-MOVE(s) | Which player moves in state s |
| ACTIONS(s) | Legal moves in state s |
| RESULT(s, a) | Successor state after action a |
| IS-TERMINAL(s) | True if game is over |
| UTILITY(s, p) | Final numeric value for player p in terminal state s |
For zero-sum games: UTILITY(s, MAX) + UTILITY(s, MIN) = constant.
Convention: MAX player tries to maximize utility; MIN player tries to minimize it.
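The six components above can be sketched as a Python class. This is an illustrative interface, not AIMA's code; the trivial two-move game used as the state space is invented here purely to make the sketch runnable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    to_move: str          # "MAX" or "MIN"
    history: tuple = ()   # moves played so far

class TrivialGame:
    """Toy game: MAX picks 'a' or 'b', then MIN picks 'a' or 'b', then it ends."""
    initial = State(to_move="MAX")                      # S0

    def to_move(self, s):                               # TO-MOVE(s)
        return s.to_move

    def actions(self, s):                               # ACTIONS(s)
        return [] if self.is_terminal(s) else ["a", "b"]

    def result(self, s, a):                             # RESULT(s, a)
        nxt = "MIN" if s.to_move == "MAX" else "MAX"
        return State(to_move=nxt, history=s.history + (a,))

    def is_terminal(self, s):                           # IS-TERMINAL(s)
        return len(s.history) == 2

    def utility(self, s, player):                       # UTILITY(s, p)
        # Zero-sum: MAX scores +1 iff both moves match, else -1.
        u = 1 if s.history[0] == s.history[1] else -1
        return u if player == "MAX" else -u

g = TrivialGame()
s = g.result(g.result(g.initial, "a"), "a")
assert g.is_terminal(s)
assert g.utility(s, "MAX") + g.utility(s, "MIN") == 0   # zero-sum check
```

The zero-sum condition falls out directly: MIN's utility is defined as the negation of MAX's, so the two always sum to the same constant (here, 0).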
Types of Games
| | Deterministic | Stochastic |
|---|---|---|
| Perfect info | Chess, Go, Checkers | Backgammon, Monopoly |
| Imperfect info | Battleship | Poker, Bridge |
- Perfect information: both players see full state
- Imperfect information: hidden state (cards, etc.)
Game Tree
The game tree unfolds all possible plays from the initial state. Properties:
- Branching factor b: average legal moves per position
- Maximum depth d: number of plies (half-moves) in the longest possible game
- Leaf nodes: terminal states with known utilities
- Interior nodes: non-terminal states where one player moves
For chess: b ≈ 35, d ≈ 80 → 35^80 ≈ 10^123 nodes — impossible to search exhaustively.
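The arithmetic behind that estimate is easy to verify: the tree size is b^d, and taking log base 10 gives its order of magnitude.

```python
import math

b, d = 35, 80                       # rough chess branching factor and game length (plies)
log10_nodes = d * math.log10(b)     # log10(b^d) = d * log10(b)
print(f"35^80 \u2248 10^{log10_nodes:.1f}")
```

Even at a billion nodes per second, searching a tree of that size would take vastly longer than the age of the universe, which is why the rest of the chapter is about avoiding exhaustive search.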
The Core Challenge
Unlike single-agent search, where a path to the goal is sufficient, in adversarial search:

- MAX cannot control MIN's moves
- An optimal plan for MAX must handle all possible MIN responses
- The solution is a strategy (policy): a mapping from states to actions
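The strategy idea can be previewed with a tiny backed-up-value computation on a hand-coded game tree (minimax proper is covered next). The tree, its labels, and the terminal values below are all invented for illustration: MAX's best action at a state is the one whose subtree value survives MIN's best counterplay.

```python
# Hand-coded game tree: state -> list of (action, child state).
# "A" is a MAX node; "B" and "C" are MIN nodes; D..G are terminals.
TREE = {
    "A": [("a1", "B"), ("a2", "C")],
    "B": [("b1", "D"), ("b2", "E")],
    "C": [("c1", "F"), ("c2", "G")],
}
UTILITY = {"D": 3, "E": 12, "F": 2, "G": 8}   # terminal utilities for MAX

def value(state, maximizing):
    """Backed-up value of a state, assuming both players play optimally."""
    if state in UTILITY:
        return UTILITY[state]
    vals = [value(child, not maximizing) for _, child in TREE[state]]
    return max(vals) if maximizing else min(vals)

def strategy(state):
    """MAX's policy at a MAX state: pick the action robust to every MIN reply."""
    return max(TREE[state], key=lambda ac: value(ac[1], False))[0]

print(strategy("A"))   # prints "a1": MIN would hold "C" down to 2, but "B" only to 3
```

Note that the policy is not a path: choosing "a1" commits MAX to a good outcome against *both* of MIN's replies ("b1" and "b2"), which is exactly the strategy-vs-path distinction above.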
Chapter Roadmap
- Minimax: exact optimal strategy for two-player zero-sum games
- Alpha-Beta Pruning: prune irrelevant subtrees without losing optimality
- Heuristic Evaluation: approximate deep search with eval functions
- Monte Carlo Tree Search (MCTS): simulation-based search — no eval function
- Stochastic Games: expectiminimax for chance nodes
- Partially Observable Games: imperfect information
Connection to RL
Game playing = two-agent RL (multi-agent RL / MARL). AlphaGo / AlphaZero = MCTS + deep RL (policy + value networks trained by self-play).
For DynamICCL: adversarial framing applies to distributed training where some nodes may be slow/faulty — the “optimizer” adapts against an “adversarial” environment.