Games and Adversarial Search

Chapter 5: Adversarial Search and Games
Book: Artificial Intelligence: A Modern Approach (Russell & Norvig, 4th ed.)
Pages: 146–179


Why Games?

Games have driven AI research since Shannon (1950) and Turing (1950) because they are:

- Well-defined: clear rules and win/loss/draw criteria
- Challenging: exponential state spaces (chess: ~10^47 states)
- Adversarial: another agent actively works against you
- Benchmarkable: progress is easily measurable

Key milestones: IBM Deep Blue (chess, 1997), DeepMind AlphaGo (Go, 2016), OpenAI Five (Dota 2, 2019).


Formal Game Definition

A two-player, deterministic, zero-sum game of perfect information is defined by:

| Component      | Meaning                                              |
|----------------|------------------------------------------------------|
| S₀             | Initial state                                        |
| TO-MOVE(s)     | Which player moves in state s                        |
| ACTIONS(s)     | Legal moves in state s                               |
| RESULT(s, a)   | Successor state after action a                       |
| IS-TERMINAL(s) | True if the game is over                             |
| UTILITY(s, p)  | Final numeric value for player p in terminal state s |

For zero-sum games: UTILITY(s, MAX) + UTILITY(s, MIN) = constant.

Convention: MAX player tries to maximize utility; MIN player tries to minimize it.
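
As a concrete (hypothetical) illustration, here is a minimal Python sketch of this six-function interface for a tiny Nim-style game: the state is the number of stones left plus whose turn it is, players alternately remove 1 or 2 stones, and whoever takes the last stone wins. The game and all names are illustrative, not code from the book.

```python
MAX, MIN = "MAX", "MIN"

S0 = (5, MAX)  # initial state: 5 stones, MAX to move

def to_move(s):
    """TO-MOVE(s): which player moves in state s."""
    return s[1]

def actions(s):
    """ACTIONS(s): legal moves -- take 1 or 2 stones, if available."""
    return [n for n in (1, 2) if n <= s[0]]

def result(s, a):
    """RESULT(s, a): successor state after removing a stones."""
    return (s[0] - a, MIN if s[1] == MAX else MAX)

def is_terminal(s):
    """IS-TERMINAL(s): the game ends when no stones remain."""
    return s[0] == 0

def utility(s, p):
    """UTILITY(s, p): +1 if p took the last stone, -1 otherwise.
    UTILITY(s, MAX) + UTILITY(s, MIN) == 0, so the game is zero-sum."""
    return -1 if p == to_move(s) else +1  # the player to move at the end lost

s = result(result(result(S0, 2), 2), 1)  # MAX takes 2, MIN takes 2, MAX takes 1
print(is_terminal(s), utility(s, MAX))   # True 1 -- MAX took the last stone
```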


Types of Games

|                | Deterministic        | Stochastic           |
|----------------|----------------------|----------------------|
| Perfect info   | Chess, Go, Checkers  | Backgammon, Monopoly |
| Imperfect info | Battleship           | Poker, Bridge        |

Game Tree

The game tree unfolds all possible plays from the initial state. Properties:

- b: branching factor (number of legal moves per state)
- d: depth of the tree in plies (one ply = one move by one player)

For chess: b ≈ 35, d ≈ 80 → 35^80 ≈ 10^123 nodes, far too many to search exhaustively.
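
A quick back-of-envelope check of that estimate (a throwaway Python snippet, not from the book):

```python
import math

b, d = 35, 80                                  # branching factor, depth in plies
print(f"35^80 ≈ 10^{int(d * math.log10(b))}")  # 35^80 ≈ 10^123
```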


The Core Challenge

Unlike single-agent search, where a path to the goal is sufficient, in adversarial search:

- MAX cannot control MIN's moves
- An optimal plan for MAX must handle all possible MIN responses
- The solution is a strategy (policy): a mapping from states to actions (see the sketch after this list)
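
To make "strategy, not path" concrete, here is a minimal sketch that extracts the full minimax policy for the hypothetical Nim game used above (redefined compactly so the snippet runs on its own; minimax itself is item 1 of the roadmap below):

```python
# State = (stones_left, player_to_move); take 1 or 2 stones, last take wins.
def actions(s):     return [n for n in (1, 2) if n <= s[0]]
def result(s, a):   return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")
def is_terminal(s): return s[0] == 0
def utility(s, p):  return -1 if p == s[1] else +1  # player to move lost

def minimax_value(s):
    if is_terminal(s):
        return utility(s, "MAX")
    vals = [minimax_value(result(s, a)) for a in actions(s)]
    return max(vals) if s[1] == "MAX" else min(vals)

def strategy(s, policy=None):
    """Map every reachable MAX-to-move state to its minimax-optimal action."""
    policy = {} if policy is None else policy
    if not is_terminal(s):
        if s[1] == "MAX":
            policy[s] = max(actions(s), key=lambda a: minimax_value(result(s, a)))
        for a in actions(s):
            strategy(result(s, a), policy)
    return policy

print(strategy((5, "MAX")))  # an answer for every MIN reply: {(5, 'MAX'): 2, ...}
```

The output covers every decision point MIN can steer the game into, which is exactly what distinguishes a strategy from a single path.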


Chapter Roadmap

  1. Minimax: exact optimal strategy for two-player zero-sum games
  2. Alpha-Beta Pruning: prune irrelevant subtrees without losing optimality (sketch after this list)
  3. Heuristic Evaluation: approximate deep search with eval functions
  4. Monte Carlo Tree Search (MCTS): simulation-based search — no eval function
  5. Stochastic Games: expectiminimax for chance nodes
  6. Partially Observable Games: imperfect information
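
A minimal alpha-beta sketch (same hypothetical Nim game, inlined so the snippet is self-contained): alpha is the best value MAX can already guarantee on the current path, beta the best MIN can guarantee; a branch is abandoned as soon as its value can no longer affect the decision above, so the returned value matches plain minimax.

```python
import math

# Same tiny Nim game as in the earlier sketches.
def actions(s):     return [n for n in (1, 2) if n <= s[0]]
def result(s, a):   return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")
def is_terminal(s): return s[0] == 0
def utility(s, p):  return -1 if p == s[1] else +1

def alphabeta(s, alpha=-math.inf, beta=math.inf):
    if is_terminal(s):
        return utility(s, "MAX")
    if s[1] == "MAX":
        v = -math.inf
        for a in actions(s):
            v = max(v, alphabeta(result(s, a), alpha, beta))
            alpha = max(alpha, v)
            if v >= beta:   # MIN above would never let play reach this branch
                break
        return v
    else:
        v = math.inf
        for a in actions(s):
            v = min(v, alphabeta(result(s, a), alpha, beta))
            beta = min(beta, v)
            if v <= alpha:  # MAX above already has a better alternative
                break
        return v

print(alphabeta((5, "MAX")))  # 1, the same value plain minimax returns
```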

Connection to RL

Game playing is two-agent reinforcement learning (a case of multi-agent RL, or MARL). AlphaGo and AlphaZero combine MCTS with deep RL: policy and value networks trained by self-play (see the sketch below).
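
As a taste of how the search and the learned networks interact, here is a hedged sketch of the PUCT child-selection rule commonly described for AlphaZero-style MCTS; the names q (value estimate), p (policy-network prior), visit counts, and c_puct follow the usual write-ups, not code from the book or from DeepMind.

```python
import math

def puct_score(q, p, n_parent, n_child, c_puct=1.0):
    """Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    return q + c_puct * p * math.sqrt(n_parent) / (1 + n_child)

def select_action(children, c_puct=1.0):
    """children: {action: (q, prior, visits)}; pick the highest-PUCT action."""
    n_parent = sum(visits for (_, _, visits) in children.values())
    return max(children, key=lambda a: puct_score(children[a][0], children[a][1],
                                                  n_parent, children[a][2], c_puct))

kids = {"a": (0.5, 0.6, 10), "b": (0.4, 0.4, 2)}
print(select_action(kids))  # 'b': slightly worse q, but far less explored
```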

For DynamICCL: the adversarial framing applies to distributed training where some nodes may be slow or faulty; the “optimizer” adapts against an “adversarial” environment.