Partial Observability and Belief States

Partially Observable Environments

The agent cannot directly observe the full state. It receives only percepts — partial, possibly noisy observations.

Two key cases:
- Sensorless (conformant): the agent has NO sensors and must act with zero observability
- Contingency: the agent has partial sensors, and the plan depends on what is observed


Belief States

In a partially observable environment, the agent maintains a belief state — a set (or distribution) over possible world states consistent with observations so far.

Belief state B ⊆ S = set of all states the agent considers possible

- Fully observable: |B| = 1 (one known state)
- Sensorless: B = S initially (no information)
- Partially observable: 1 ≤ |B| ≤ |S|


Even with no sensors, the agent can plan by searching in belief-state space.

- State: a belief state (a set of world states)
- Action: applied to all states in the belief set simultaneously
- Goal: reach a belief state in which every member is a goal state
- Successor: B' = {RESULT(s, a) : s ∈ B}
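A minimal sketch of these primitives in Python, assuming the problem supplies a deterministic result(state, action) transition function and a per-state goal test (both names are illustrative):

```python
def belief_successor(belief, action, result):
    """B' = {result(s, a) : s in B}: apply the action to every
    state the agent considers possible."""
    return frozenset(result(s, action) for s in belief)

def is_belief_goal(belief, is_goal):
    """A belief state is a goal iff every member state is a goal."""
    return all(is_goal(s) for s in belief)
```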

Key Property: Coercion

Actions can reduce the size of the belief state by forcing outcomes that constrain possibilities.

Example: sensorless vacuum world
- Initial belief state: all 8 states (2 locations × 4 dirt configurations)
- Action “Suck”: cleans the current square in every candidate state → removes dirty configurations from the belief state
- Actions “Right” then “Left”: coerce the agent to a known location (after “Right”, it is certainly in the right square)

A sensorless plan succeeds if it drives the belief state to a subset of goal states regardless of which world state was actually true.
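A runnable toy version of this coercion argument, with the state encoded as (location, dirt_left, dirt_right); the encoding and action names are illustrative assumptions, not from any library:

```python
from itertools import product

ALL_STATES = frozenset(product(('L', 'R'), (0, 1), (0, 1)))  # 8 states

def result(state, action):
    """Deterministic sensorless vacuum-world transitions (toy model)."""
    loc, dl, dr = state
    if action == 'Suck':  # cleans whichever square the agent is in
        return (loc, 0, dr) if loc == 'L' else (loc, dl, 0)
    return ('L' if action == 'Left' else 'R', dl, dr)

belief = ALL_STATES
for a in ('Right', 'Suck', 'Left', 'Suck'):
    belief = frozenset(result(s, a) for s in belief)
    print(a, '->', len(belief), 'states')
# Prints 4, 2, 2, 1: the plan coerces the world into the single
# goal state ('L', 0, 0) without ever sensing anything.
```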


Contingency Problem: Percept-Based Conditional Plans

When the agent HAS sensors, it can observe percepts and branch the plan:

[action_1,
  if percept_A: [action_2a, ...],
  if percept_B: [action_2b, ...]
]

At each AND node (percept branch), the plan must succeed for all possible percepts.
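One possible encoding of such a conditional plan, with percept branches represented as a dict; the executor and its callback parameters (do_action, get_percept) are hypothetical helpers, not a standard API:

```python
def execute(plan, do_action, get_percept):
    """Run a conditional plan: lists are action sequences,
    dicts are AND nodes that branch on the observed percept."""
    for step in plan:
        if isinstance(step, dict):          # AND node: branch on percept
            execute(step[get_percept()], do_action, get_percept)
        else:                               # leaf: an ordinary action
            do_action(step)

plan = ['Suck', 'Right', {'Dirty': ['Suck'], 'Clean': []}]
execute(plan, do_action=print, get_percept=lambda: 'Dirty')
```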

Incremental Belief State Update

When the agent takes action a and receives percept o:

B' = UPDATE(PREDICT(B, a), o)

This is the filtering operation. In probabilistic form (Ch.4.4, Ch.13), this becomes the Bayes filter.
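A set-based sketch of this step, assuming problem-supplied functions results(s, a) (the set of possible successors, so nondeterministic actions are allowed) and percept(s) (the percept received in state s); both signatures are assumptions:

```python
def predict(belief, action, results):
    """PREDICT: states reachable from any member of B via the action."""
    return frozenset(s2 for s in belief for s2 in results(s, action))

def update(belief, observation, percept):
    """UPDATE: keep only states consistent with the observed percept."""
    return frozenset(s for s in belief if percept(s) == observation)

def filter_step(belief, action, observation, results, percept):
    """B' = UPDATE(PREDICT(B, a), o)"""
    return update(predict(belief, action, results), observation, percept)
```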


Partially Observable Vacuum World Example

The agent knows its location (Left/Right) and the dirt under it, but NOT the dirt status of the other square.

Initial percept: (Left, Dirty) → the agent knows it is in the Left square and that it is dirty, but knows nothing about the Right square.

Belief state: {[L, dirt_L=1, dirt_R=0], [L, dirt_L=1, dirt_R=1]}

After Suck: {[L, dirt_L=0, dirt_R=0], [L, dirt_L=0, dirt_R=1]} (still 2 states: the local percept says nothing about the right square)

After “Right”: {[R, dirt_L=0, dirt_R=0], [R, dirt_L=0, dirt_R=1]}

After the percept at R: if “Dirty” is observed → belief = {[R, dirt_L=0, dirt_R=1]}, a single fully known state.
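The same trace as a self-contained Python sketch (the state encoding and percept model are assumptions matching the toy vacuum world above):

```python
def result(state, action):
    loc, dl, dr = state
    if action == 'Suck':
        return (loc, 0, dr) if loc == 'L' else (loc, dl, 0)
    return ('L' if action == 'Left' else 'R', dl, dr)

def percept(state):            # the agent senses its location + local dirt
    loc, dl, dr = state
    return (loc, 'Dirty' if (dl if loc == 'L' else dr) else 'Clean')

def filter_step(belief, action, obs):
    predicted = {result(s, action) for s in belief}     # PREDICT
    return {s for s in predicted if percept(s) == obs}  # UPDATE

belief = {('L', 1, 0), ('L', 1, 1)}                   # percept (Left, Dirty)
belief = filter_step(belief, 'Suck', ('L', 'Clean'))  # still 2 states
belief = filter_step(belief, 'Right', ('R', 'Dirty'))
print(belief)                                         # {('R', 0, 1)}
```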


The belief-state space is exponentially larger than the world-state space:
- n world states → up to 2^n possible belief states
- but most of them are unreachable from natural initial beliefs

In practice, belief states remain small if:
- percepts are informative (they rapidly reduce uncertainty)
- actions have predictable effects


Connection to POMDP

Partial observability + stochastic outcomes + probabilities over states and observations = Partially Observable MDP (POMDP).

POMDP policy: maps belief state (probability distribution over states) to actions.
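A minimal discrete Bayes filter, the probabilistic counterpart of UPDATE(PREDICT(B, a), o); the transition model T(s, a, s') and observation model O(s', o) are assumed inputs, not a fixed API:

```python
def bayes_filter(belief, action, observation, states, T, O):
    """belief: dict mapping state -> probability; returns the posterior."""
    # PREDICT: push the distribution through the transition model
    predicted = {s2: sum(T(s, action, s2) * belief[s] for s in states)
                 for s2 in states}
    # UPDATE: weight by observation likelihood, then normalize
    unnorm = {s2: O(s2, observation) * predicted[s2] for s2 in states}
    z = sum(unnorm.values())  # assumes the observation has nonzero probability
    return {s2: p / z for s2, p in unnorm.items()}
```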

DynamICCL: the RL agent observes local metrics (bandwidth, latency) but cannot observe the global network state → naturally a POMDP setting. Belief-state search formalizes the intuition for why the agent must maintain a history of states and observations, i.e., an implicit belief state.