Games and Adversarial Search

Chapter 5: Adversarial Search and Games
Book: Artificial Intelligence: A Modern Approach (Russell & Norvig, 4th ed.)
Pages: 146–179


Why Games?

Games have driven AI research since Shannon (1950) and Turing (1950) because they are:

- Well-defined: clear rules and win/loss/draw criteria
- Challenging: exponential state spaces (chess: ~10^47 states)
- Adversarial: another agent actively works against you
- Benchmarkable: progress is easily measurable

Key milestones: IBM Deep Blue (chess, 1997), DeepMind AlphaGo (Go, 2016), OpenAI Five (Dota 2, 2019).


Formal Game Definition

A two-player, deterministic, zero-sum game of perfect information is defined by:

| Component      | Meaning                                              |
|----------------|------------------------------------------------------|
| S₀             | Initial state                                        |
| TO-MOVE(s)     | Which player moves in state s                        |
| ACTIONS(s)     | Legal moves in state s                               |
| RESULT(s, a)   | Successor state after action a                       |
| IS-TERMINAL(s) | True if the game is over                             |
| UTILITY(s, p)  | Final numeric value for player p in terminal state s |

For zero-sum games: UTILITY(s, MAX) + UTILITY(s, MIN) = constant.

Convention: MAX player tries to maximize utility; MIN player tries to minimize it.
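
As a concrete (hypothetical) illustration, here is a minimal Python sketch of this six-function interface for a tiny Nim-style game: the state is the number of stones left plus whose turn it is, players alternately remove 1 or 2 stones, and whoever takes the last stone wins. The game and all names are illustrative, not code from the book.

```python
MAX, MIN = "MAX", "MIN"

S0 = (5, MAX)  # initial state: 5 stones, MAX to move

def to_move(s):
    """TO-MOVE(s): which player moves in state s."""
    return s[1]

def actions(s):
    """ACTIONS(s): legal moves -- take 1 or 2 stones, if available."""
    return [n for n in (1, 2) if n <= s[0]]

def result(s, a):
    """RESULT(s, a): successor state after removing a stones."""
    return (s[0] - a, MIN if s[1] == MAX else MAX)

def is_terminal(s):
    """IS-TERMINAL(s): the game ends when no stones remain."""
    return s[0] == 0

def utility(s, p):
    """UTILITY(s, p): +1 if p took the last stone, -1 otherwise.
    UTILITY(s, MAX) + UTILITY(s, MIN) == 0, so the game is zero-sum."""
    return -1 if p == to_move(s) else +1  # the player to move at the end lost

s = result(result(result(S0, 2), 2), 1)  # MAX takes 2, MIN takes 2, MAX takes 1
print(is_terminal(s), utility(s, MAX))   # True 1 -- MAX took the last stone
```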


Types of Games

|                | Deterministic        | Stochastic           |
|----------------|----------------------|----------------------|
| Perfect info   | Chess, Go, Checkers  | Backgammon, Monopoly |
| Imperfect info | Battleship           | Poker, Bridge        |

Game Tree

The game tree unfolds all possible plays from the initial state. Properties:

- b: branching factor (number of legal moves per state)
- d: depth of the tree in plies (one ply = one move by one player)

For chess: b ≈ 35, d ≈ 80 → 35^80 ≈ 10^123 nodes, far too many to search exhaustively.
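
A quick back-of-envelope check of that estimate (a throwaway Python snippet, not from the book):

```python
import math

b, d = 35, 80                                  # branching factor, depth in plies
print(f"35^80 ≈ 10^{int(d * math.log10(b))}")  # 35^80 ≈ 10^123
```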


The Core Challenge

Unlike single-agent search, where a path to the goal is sufficient, in adversarial search:

- MAX cannot control MIN's moves
- An optimal plan for MAX must handle all possible MIN responses
- The solution is a strategy (policy): a mapping from states to actions (see the sketch after this list)
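
To make "strategy, not path" concrete, here is a minimal sketch that extracts the full minimax policy for the hypothetical Nim game used above (redefined compactly so the snippet runs on its own; minimax itself is item 1 of the roadmap below):

```python
# State = (stones_left, player_to_move); take 1 or 2 stones, last take wins.
def actions(s):     return [n for n in (1, 2) if n <= s[0]]
def result(s, a):   return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")
def is_terminal(s): return s[0] == 0
def utility(s, p):  return -1 if p == s[1] else +1  # player to move lost

def minimax_value(s):
    if is_terminal(s):
        return utility(s, "MAX")
    vals = [minimax_value(result(s, a)) for a in actions(s)]
    return max(vals) if s[1] == "MAX" else min(vals)

def strategy(s, policy=None):
    """Map every reachable MAX-to-move state to its minimax-optimal action."""
    policy = {} if policy is None else policy
    if not is_terminal(s):
        if s[1] == "MAX":
            policy[s] = max(actions(s), key=lambda a: minimax_value(result(s, a)))
        for a in actions(s):
            strategy(result(s, a), policy)
    return policy

print(strategy((5, "MAX")))  # an answer for every MIN reply: {(5, 'MAX'): 2, ...}
```

The output covers every decision point MIN can steer the game into, which is exactly what distinguishes a strategy from a single path.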


Chapter Roadmap

  1. Minimax: exact optimal strategy for two-player zero-sum games
  2. Alpha-Beta Pruning: prune irrelevant subtrees without losing optimality (sketch after this list)
  3. Heuristic Evaluation: approximate deep search with eval functions
  4. Monte Carlo Tree Search (MCTS): simulation-based search — no eval function
  5. Stochastic Games: expectiminimax for chance nodes
  6. Partially Observable Games: imperfect information
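
A minimal alpha-beta sketch (same hypothetical Nim game, inlined so the snippet is self-contained): alpha is the best value MAX can already guarantee on the current path, beta the best MIN can guarantee; a branch is abandoned as soon as its value can no longer affect the decision above, so the returned value matches plain minimax.

```python
import math

# Same tiny Nim game as in the earlier sketches.
def actions(s):     return [n for n in (1, 2) if n <= s[0]]
def result(s, a):   return (s[0] - a, "MIN" if s[1] == "MAX" else "MAX")
def is_terminal(s): return s[0] == 0
def utility(s, p):  return -1 if p == s[1] else +1

def alphabeta(s, alpha=-math.inf, beta=math.inf):
    if is_terminal(s):
        return utility(s, "MAX")
    if s[1] == "MAX":
        v = -math.inf
        for a in actions(s):
            v = max(v, alphabeta(result(s, a), alpha, beta))
            alpha = max(alpha, v)
            if v >= beta:   # MIN above would never let play reach this branch
                break
        return v
    else:
        v = math.inf
        for a in actions(s):
            v = min(v, alphabeta(result(s, a), alpha, beta))
            beta = min(beta, v)
            if v <= alpha:  # MAX above already has a better alternative
                break
        return v

print(alphabeta((5, "MAX")))  # 1, the same value plain minimax returns
```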

Connection to RL

Game playing is two-agent reinforcement learning (a case of multi-agent RL, or MARL). AlphaGo and AlphaZero combine MCTS with deep RL: policy and value networks trained by self-play (see the sketch below).
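
As a taste of how the search and the learned networks interact, here is a hedged sketch of the PUCT child-selection rule commonly described for AlphaZero-style MCTS; the names q (value estimate), p (policy-network prior), visit counts, and c_puct follow the usual write-ups, not code from the book or from DeepMind.

```python
import math

def puct_score(q, p, n_parent, n_child, c_puct=1.0):
    """Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    return q + c_puct * p * math.sqrt(n_parent) / (1 + n_child)

def select_action(children, c_puct=1.0):
    """children: {action: (q, prior, visits)}; pick the highest-PUCT action."""
    n_parent = sum(visits for (_, _, visits) in children.values())
    return max(children, key=lambda a: puct_score(children[a][0], children[a][1],
                                                  n_parent, children[a][2], c_puct))

kids = {"a": (0.5, 0.6, 10), "b": (0.4, 0.4, 2)}
print(select_action(kids))  # 'b': slightly worse q, but far less explored
```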

For DynamICCL: the adversarial framing applies to distributed training where some nodes may be slow or faulty; the “optimizer” adapts against an “adversarial” environment.