Partially Observable Games and State of the Art

Partially Observable Games

In games with imperfect information, players cannot observe the full game state.

Examples:
- Poker: hidden hole cards
- Bridge: hidden hands of partner and opponents
- Battleship: unknown ship positions
- Kriegspiel: a chess variant in which you see only your own pieces


Belief States in Games

Like partial observability in search (Ch.4), we maintain a belief state — a probability distribution over possible world states consistent with the agent’s observation history.

In imperfect-information games:
- Information set: the set of game states consistent with what the current player has observed
- The player cannot distinguish between states in the same information set

Strategy: a mapping from information sets to actions (not from states to actions, as in perfect-information games).
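This distinction can be made concrete with a toy sketch (the cards, histories, and probabilities below are invented for illustration, loosely in the style of Kuhn poker; not a full implementation):

```python
# A strategy in an imperfect-information game maps *information sets*
# (what the player has observed) to action distributions, not full states.
# Information-set key: (player's own card, public betting history).
strategy = {
    ("K", "bet"): {"call": 1.0, "fold": 0.0},   # strong card: always call
    ("J", "bet"): {"call": 0.3, "fold": 0.7},   # weak card: mostly fold
    ("K", ""):    {"bet": 0.7, "check": 0.3},
    ("J", ""):    {"bet": 0.1, "check": 0.9},
}

def act(own_card, history):
    """Look up the mixed strategy for the current information set."""
    return strategy[(own_card, history)]

# Holding J and facing a bet, the player cannot tell which card the
# opponent holds: all such states share the information set ("J", "bet").
print(act("J", "bet"))  # {'call': 0.3, 'fold': 0.7}
```

The key point: world states that differ only in hidden information map to the same key, so the strategy is forced to treat them identically.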


Approaches to Imperfect Information

1. Determinization (Sampling)

Sample a “determinization” — a specific assignment to hidden variables — and run standard minimax/MCTS on that determinization. Repeat for many samples and average.

Problem: “strategy fusion” — the computed strategy may exploit information it doesn’t actually have, because each determinization is solved as if the hidden state were known, so the plan can take different actions in states that belong to the same information set.

Advantage: simple, widely used for card games (e.g., Bridge solvers).

2. Perfect Information Monte Carlo (PIMC)

  1. Sample N random deals (assignments of hidden cards)
  2. For each deal, run alpha-beta to find the best move
  3. Choose the move that was best in the most deals

Used by: some Bridge programs, Skat solvers.
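The three PIMC steps above can be sketched as follows (the toy payoff table is invented for illustration, and a simple argmax stands in for the alpha-beta search over each determinized game):

```python
import random
from collections import Counter

# Hypothetical payoff table: payoff[deal][action] is the value of taking
# `action` if the hidden deal turns out to be `deal`.
payoff = {
    "opponent_has_ace":   {"safe": 1, "aggressive": -2},
    "opponent_has_king":  {"safe": 1, "aggressive":  3},
    "opponent_has_queen": {"safe": 1, "aggressive":  4},
}

def best_move(deal):
    """With the deal revealed, the game is perfect-information;
    this argmax is a stand-in for running alpha-beta on the deal."""
    return max(payoff[deal], key=payoff[deal].get)

def pimc(n_samples=1000, seed=0):
    """Sample deals, solve each, and vote for the most frequent best move."""
    rng = random.Random(seed)
    deals = list(payoff)
    votes = Counter(best_move(rng.choice(deals)) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(pimc())  # 'aggressive': it is the best move in 2 of the 3 deals
```

Note that the vote ignores which deals the player can actually distinguish, which is exactly where strategy fusion creeps in.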

3. Counterfactual Regret Minimization (CFR)

The dominant approach for poker and similar games. CFR minimizes counterfactual regret: the difference between the payoff actually achieved at an information set and the best payoff that could have been obtained by deviating to another action there.

At each information set, track:
- Regret for not having played each action
- Strategy ∝ positive regrets (regret-matching)
- Average strategy over time → Nash equilibrium

Used by: Libratus, Pluribus (superhuman poker AI).

Key result: In zero-sum two-player games, iterative CFR converges to a Nash equilibrium — a strategy pair where neither player can benefit from unilateral deviation.
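A minimal regret-matching sketch, assuming self-play on matching pennies rather than a full game tree (so this shows the core update used inside CFR, not CFR over information sets):

```python
import random

A = [[1, -1], [-1, 1]]  # row player's payoff; the column player gets -A

def strategy(regrets):
    """Regret matching: play actions in proportion to positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [0.5, 0.5]

def sample(probs, rng):
    return 0 if rng.random() < probs[0] else 1

rng = random.Random(1)
regret = {"row": [0.0, 0.0], "col": [0.0, 0.0]}
avg = {"row": [0.0, 0.0], "col": [0.0, 0.0]}
T = 20000

for _ in range(T):
    s_row, s_col = strategy(regret["row"]), strategy(regret["col"])
    a, b = sample(s_row, rng), sample(s_col, rng)
    # Regret for each alternative = its payoff minus the realized payoff.
    for alt in range(2):
        regret["row"][alt] += A[alt][b] - A[a][b]
        regret["col"][alt] += A[a][b] - A[a][alt]
    # The *average* strategy, not the current one, converges to Nash.
    for i in range(2):
        avg["row"][i] += s_row[i]
        avg["col"][i] += s_col[i]

row_avg = [x / T for x in avg["row"]]
print(row_avg)  # drifts toward [0.5, 0.5], the Nash equilibrium
```

The current strategies oscillate sharply, but matching pennies has a unique mixed equilibrium at (0.5, 0.5), and the time-averaged strategy approaching uniform illustrates the key result above.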

4. Full Extensive-Form Game Tree

Treat imperfect information formally using the extensive form with information sets. Solve for Nash equilibrium directly using linear programming (for small games).
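For a 2x2 zero-sum game the linear program collapses to a closed-form indifference condition, which can be sketched directly (`nash_2x2` is a hypothetical helper; it assumes the game has no pure-strategy saddle point):

```python
# Each player mixes so that the opponent's two actions have equal
# expected payoff; solving those two linear equations gives the
# equilibrium in closed form.

def nash_2x2(A):
    """Mixed Nash equilibrium of a 2x2 zero-sum game with row payoffs A.
    Returns (P(row plays action 0), P(col plays action 0), game value)."""
    (a, b), (c, d) = A
    denom = a - b - c + d
    p = (d - c) / denom          # row's mix makes column indifferent
    q = (d - b) / denom          # column's mix makes row indifferent
    value = (a * d - b * c) / denom
    return p, q, value

# Matching pennies: each player mixes 50/50 and the game value is 0.
print(nash_2x2([[1, -1], [-1, 1]]))  # (0.5, 0.5, 0.0)
```

Larger games need a genuine LP solver, and beyond two-player zero-sum the LP formulation no longer applies directly.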


Poker: The AI Benchmark

Poker (Texas Hold’em) is the main benchmark for imperfect-information game AI:

System     Year   Achievement
Claudico   2015   Lost to pro heads-up players
Libratus   2017   Beat pro heads-up players (2-player)
Pluribus   2019   Beat pros at 6-player no-limit Hold’em

Pluribus key insights:
- Uses CFR for self-play training
- Real-time search during play (not just pre-computed strategy)
- Handles 6-player (not just 2-player) → non-zero-sum multi-party game


State-of-the-Art Game Programs

Game          System               Key Technology
Chess         Stockfish / Leela    Classical alpha-beta / MCTS + neural eval
Go            AlphaGo/AlphaZero    MCTS + policy+value networks (self-play)
Poker         Pluribus             CFR + real-time search
Atari         DQN                  Deep Q-learning (model-free RL)
StarCraft II  AlphaStar            Multi-agent RL + imitation learning
Dota 2        OpenAI Five          Proximal Policy Optimization (PPO)

Lessons for AI

  1. Self-play + RL = incredibly powerful for well-defined games (AlphaZero)
  2. MCTS + neural nets = handles large state spaces without hand-engineered evaluation functions
  3. CFR = principled solution to imperfect information
  4. Transfer: game-playing techniques (MCTS, TD learning, self-play) transfer to real-world sequential decision problems

Connection to RL and DynamICCL