Partial Observability and Belief States
Partially Observable Environments
The agent cannot directly observe the full state. It receives only percepts — partial, possibly noisy observations.
Two key cases:
- Sensorless (conformant): the agent has NO sensors — it must act with zero observability
- Contingency: the agent has partial sensors — the plan depends on what is observed
Belief States
In a partially observable environment, the agent maintains a belief state — a set (or distribution) over possible world states consistent with observations so far.
Belief state B ⊆ S = set of all states the agent considers possible
- Fully observable: |B| = 1 (one known state)
- Sensorless: B = S initially (no information)
- Partially observable: |B| between 1 and |S|
Sensorless Problem: Belief-State Search
Even with no sensors, the agent can plan by searching in belief-state space.
- State: a belief state (a set of world states)
- Action: applied to all states in the belief set simultaneously
- Goal: reach a belief state in which every member is a goal state
- Successor: B' = {s' : s' = RESULT(s, a), s ∈ B}
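The successor and goal-test operations above can be sketched directly, assuming a deterministic RESULT function supplied by the problem (all names here are illustrative, not from a library):

```python
def belief_successor(belief, action, result):
    """B' = {s' : s' = RESULT(s, a), s in B}: apply the action
    to every state the agent considers possible."""
    return frozenset(result(s, action) for s in belief)

def belief_is_goal(belief, goal_test):
    """A belief state is a goal only if EVERY member is a goal state."""
    return len(belief) > 0 and all(goal_test(s) for s in belief)
```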
Key Property: Coercion
Actions can reduce the size of the belief state by forcing outcomes that constrain possibilities.
Example: sensorless vacuum world
- Initial belief state: all 8 states (2 locations × 4 dirt configurations)
- Action “Suck”: cleans the current square in every possible state → shrinks the belief state
- Sequence “Right, Left”: coerces the agent into the left square regardless of where it started
A sensorless plan succeeds if it drives the belief state to a subset of goal states regardless of which world state was actually true.
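A minimal sketch of this coercion effect, using a hypothetical encoding of the vacuum world in which a state is (location, dirt_left, dirt_right):

```python
def result(state, action):
    """Deterministic transition model for the two-square vacuum world."""
    loc, dirt_l, dirt_r = state
    if action == "Left":
        loc = "L"
    elif action == "Right":
        loc = "R"
    elif action == "Suck":       # clean the square the agent occupies
        if loc == "L":
            dirt_l = False
        else:
            dirt_r = False
    return (loc, dirt_l, dirt_r)

# Initial belief: all 8 states (no sensors, no information).
belief = {(loc, dl, dr)
          for loc in "LR" for dl in (True, False) for dr in (True, False)}

# The plan [Right, Suck, Left, Suck] coerces the world into a known goal state.
for action in ["Right", "Suck", "Left", "Suck"]:
    belief = {result(s, action) for s in belief}

# belief is now the single state ("L", False, False): both squares clean.
```

Note that no percept is ever consulted: the plan succeeds for every member of the initial belief state, which is exactly the condition stated above.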
Contingency Problem: Percept-Based Conditional Plans
When the agent HAS sensors, it can observe percepts and branch the plan:
[action_1,
  if percept_A: [action_2a, ...],
  if percept_B: [action_2b, ...]]
At each AND node (percept branch), the plan must succeed for all possible percepts.
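One way to make the AND-node requirement concrete: represent a percept branch point as a dict from percept to sub-plan, and check that every possible percept has a branch. This representation is an illustrative assumption, not a standard one:

```python
def handles_all_percepts(plan, percepts):
    """True iff every percept branch point in `plan` covers all possible percepts."""
    if isinstance(plan, dict):       # AND node: needs one sub-plan per percept
        return all(p in plan and handles_all_percepts(plan[p], percepts)
                   for p in percepts)
    if isinstance(plan, list):       # action sequence: check nested branch points
        return all(handles_all_percepts(step, percepts) for step in plan)
    return True                      # a single primitive action

# A conditional plan for the vacuum world: act, then branch on what is sensed.
conditional_plan = ["Suck", {"Dirty": ["Right", "Suck"], "Clean": ["Right"]}]
```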
Incremental Belief State Update
When agent takes action a and receives percept o:
B' = UPDATE(PREDICT(B, a), o)
- PREDICT: apply action to belief state → predicted belief state
- UPDATE: filter to states consistent with observation o
This is the filtering operation. In probabilistic form (Ch.4.4, Ch.13), this becomes the Bayes filter.
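For a deterministic world with a percept function, the PREDICT/UPDATE cycle reduces to plain set operations. The function names follow the text; `result` and `percept` are assumed to be supplied by the problem:

```python
def predict(belief, action, result):
    """PREDICT: push every possible state through the transition model."""
    return {result(s, action) for s in belief}

def update(belief, observation, percept):
    """UPDATE: keep only the states consistent with the observation."""
    return {s for s in belief if percept(s) == observation}

def filter_step(belief, action, observation, result, percept):
    """One filtering step: B' = UPDATE(PREDICT(B, a), o)."""
    return update(predict(belief, action, result), observation, percept)
```

For example, in a toy cyclic world of states 0..3 where the only action increments the state and the percept reveals only its parity, `filter_step({0, 1}, "step", 0, ...)` predicts {1, 2} and then filters to {2}.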
Partially Observable Vacuum World Example
Agent knows its location (Left/Right) but NOT the dirt status of the other square.
Initial percept: (Left, Dirty) → the agent knows it is in the left square and that this square is dirty, but does not know the status of the right square.
Belief state: {[L, dirty_L=1, dirty_R=0], [L, dirty_L=1, dirty_R=1]}
After Suck: {[L, dirty_L=0, dirty_R=0], [L, dirty_L=0, dirty_R=1]} — still 2 states, but the left square is now known to be clean
After “Right”: {[R, dirty_L=0, dirty_R=0], [R, dirty_L=0, dirty_R=1]}
After the percept at R: if “Dirty” is observed → belief = {[R, dirty_L=0, dirty_R=1]} — a single state; the world is fully known.
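The trace above can be replayed with set-based filtering, under a hypothetical encoding in which a state is (location, dirt_left, dirt_right) and the percept reveals only the current square:

```python
def result(state, action):
    """Deterministic transition model for the two-square vacuum world."""
    loc, dirt_l, dirt_r = state
    if action == "Left":
        loc = "L"
    elif action == "Right":
        loc = "R"
    elif action == "Suck":
        if loc == "L":
            dirt_l = False
        else:
            dirt_r = False
    return (loc, dirt_l, dirt_r)

def percept(state):
    """The agent senses its location and the dirt in its own square only."""
    loc, dirt_l, dirt_r = state
    local_dirt = dirt_l if loc == "L" else dirt_r
    return (loc, "Dirty" if local_dirt else "Clean")

# Initial percept (Left, Dirty): the right square's status is unknown.
belief = {("L", True, False), ("L", True, True)}
belief = {result(s, "Suck") for s in belief}                   # PREDICT after Suck
belief = {result(s, "Right") for s in belief}                  # PREDICT after Right
belief = {s for s in belief if percept(s) == ("R", "Dirty")}   # UPDATE on percept
# belief is now {("R", False, True)}: a single fully known state.
```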
Complexity of Belief-State Search
The belief-state space is exponentially larger than the world-state space:
- n world states → 2^n possible belief states
- But most belief states are unreachable from natural initial beliefs
In practice, belief states remain small if:
- Percepts are informative (they rapidly reduce uncertainty)
- Actions have predictable effects
Connection to POMDP
Partial observability + stochastic outcomes + probabilities = Partially Observable MDP (POMDP).
POMDP policy: maps belief state (probability distribution over states) to actions.
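In the probabilistic setting, the belief update becomes the discrete Bayes filter mentioned above: b'(s') ∝ P(o | s') · Σ_s P(s' | s, a) · b(s). A hedged sketch with illustrative dictionary-based models (not a real POMDP library):

```python
def bayes_filter(belief, action, observation, transition, sensor):
    """belief: {s: prob}. transition[(s, a)]: {s2: prob}. sensor[s2]: {o: prob}."""
    # Prediction step: propagate probability mass through the transition model.
    predicted = {}
    for s, p in belief.items():
        for s2, t in transition[(s, action)].items():
            predicted[s2] = predicted.get(s2, 0.0) + p * t
    # Update step: weight by observation likelihood, then normalize.
    weighted = {s2: sensor[s2].get(observation, 0.0) * p
                for s2, p in predicted.items()}
    total = sum(weighted.values())
    return {s2: p / total for s2, p in weighted.items()}
```

This is the probabilistic analogue of the set-based UPDATE(PREDICT(B, a), o): the prediction loop plays the role of PREDICT, and the likelihood weighting plays the role of filtering out inconsistent states (they receive zero probability).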
DynamICCL: the RL agent observes local metrics (bandwidth, latency) but cannot observe the global network state → naturally a POMDP setting. Belief-state search formalizes why such an agent must maintain an internal state summarizing its observation history.