Nondeterministic Actions and AND-OR Search Trees
Nondeterministic Environments
In classical search, each action leads to exactly one state.
In nondeterministic environments, an action can lead to any of several possible outcomes. The agent cannot control which outcome occurs.
Example: Erratic vacuum world — “Suck” on a dirty square cleans it and sometimes also cleans the other square; “Suck” on a clean square sometimes deposits dirt on it.
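The erratic transition model can be written as a RESULTS function that returns a *set* of successor states instead of a single state. A minimal sketch, assuming an illustrative state encoding of (location, dirt_left, dirt_right):

```python
# Sketch of the erratic vacuum world's transition model (assumed state
# encoding: (location, dirt_left, dirt_right); names are illustrative).

def results(state, action):
    """Return the SET of possible successor states (nondeterministic)."""
    loc, dl, dr = state
    if action == "Left":
        return {("L", dl, dr)}
    if action == "Right":
        return {("R", dl, dr)}
    if action == "Suck":
        here_dirty = dl if loc == "L" else dr
        if here_dirty:
            # Cleans the current square; may also clean the other one.
            if loc == "L":
                return {("L", 0, dr), ("L", 0, 0)}
            return {("R", dl, 0), ("R", 0, 0)}
        # Sucking a clean square may deposit dirt on it.
        if loc == "L":
            return {state, ("L", 1, dr)}
        return {state, ("R", dl, 1)}
    raise ValueError(f"unknown action: {action}")
```

Returning a set (rather than one state) is exactly what forces the AND nodes below: the plan must work for every element of the returned set.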
AND-OR Trees
Standard search = OR tree: at each state, the agent chooses which action to take → OR node.
Nondeterministic search = AND-OR tree:
- OR nodes: agent’s choice — need to find one good action
- AND nodes: environment’s choice — need a plan that handles all possible outcomes
State (agent chooses action) → OR node
Action (environment picks result) → AND node
A solution to an AND-OR tree is a subtree such that:
1. Every OR node has exactly one selected child (the chosen action)
2. Every AND node has all of its children in the subtree (all outcomes handled)
3. Every leaf is a goal state
This solution is a conditional plan (policy), not a simple sequence.
AND-OR-GRAPH-SEARCH Algorithm
function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan or failure
    return OR-SEARCH(problem, INITIAL-STATE(problem), [])

function OR-SEARCH(problem, state, path) returns a conditional plan or failure
    if GOAL-TEST(state) then return the empty plan
    if state is on path then return failure    -- loop detection
    for each action a in ACTIONS(state) do
        plan ← AND-SEARCH(problem, RESULTS(state, a), [state | path])
        if plan ≠ failure then return [a | plan]
    return failure

function AND-SEARCH(problem, states, path) returns a conditional plan or failure
    for each s_i in states do
        plan_i ← OR-SEARCH(problem, s_i, path)
        if plan_i = failure then return failure
    return [if s_1 then plan_1 else if s_2 then plan_2 ... else plan_n]
The path argument prevents cycles (important: AND-OR search can loop in cyclic problems).
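The pseudocode above translates almost line-for-line into Python. This is an illustrative sketch run against the erratic vacuum world, with two assumptions not in the original: Suck on a clean square is simplified to a no-op, and the AND node’s “if s_i then plan_i” conditionals are represented as a dict from outcome state to sub-plan.

```python
# Runnable sketch of AND-OR-GRAPH-SEARCH for the erratic vacuum world.
# State encoding (location, dirt_left, dirt_right) is illustrative.

GOAL = lambda s: s[1] == 0 and s[2] == 0   # both squares clean

def results(state, action):
    loc, dl, dr = state
    if action == "Left":
        return {("L", dl, dr)}
    if action == "Right":
        return {("R", dl, dr)}
    # Suck on a dirty square: cleans it, and may also clean the other one.
    if loc == "L" and dl:
        return {("L", 0, dr), ("L", 0, 0)}
    if loc == "R" and dr:
        return {("R", dl, 0), ("R", 0, 0)}
    return {state}   # simplification: Suck on a clean square is a no-op

def or_search(state, path):
    if GOAL(state):
        return []                    # empty plan: already at a goal
    if state in path:
        return None                  # loop detected -> failure
    for action in ("Suck", "Right", "Left"):
        plan = and_search(results(state, action), [state] + path)
        if plan is not None:
            return [action, plan]    # [a | plan]
    return None

def and_search(states, path):
    # Every outcome must be handled; any failing branch fails the node.
    branches = {}
    for s in states:
        plan = or_search(s, path)
        if plan is None:
            return None
        branches[s] = plan           # "if s_i then plan_i" as a dict entry
    return branches

plan = or_search(("L", 1, 1), [])
```

Starting from (L, dirty, dirty), the returned plan begins with Suck and carries one branch per nondeterministic outcome, matching the conditional plan derived later in these notes.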
Cyclic Plans
In some nondeterministic problems, a finite acyclic plan may not exist. The agent may need to retry an action until the environment cooperates.
Example: Slippery floor robot — “move forward” sometimes fails (stays in place). The plan is: “keep trying until you get there.”
A cyclic plan is represented as a policy (state → action mapping) rather than a tree. It can be thought of as a finite state machine:
if state = A then do action_1
if result = B then done
else retry (loop back to state A)
Cyclic plans are valid solutions if they guarantee eventual goal achievement (every execution eventually terminates at a goal).
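Executing a cyclic plan amounts to repeatedly applying the policy until the goal is reached. In the sketch below (all names are illustrative), the slip outcomes are scripted so the run is repeatable; a real environment would produce them stochastically:

```python
# Executing a cyclic plan as a policy (state -> action) for the
# slippery-floor robot: "forward" sometimes fails in place.

policy = {"A": "forward"}      # in state A, keep trying "forward"

def step(state, action, slips):
    """Move A -> B on "forward", unless the floor slips (stay in place)."""
    if state == "A" and action == "forward":
        return "A" if slips.pop(0) else "B"
    return state

def run(policy, state, slips, goal="B", limit=10):
    trace = [state]
    while state != goal and len(trace) <= limit:
        state = step(state, policy[state], slips)
        trace.append(state)
    return trace

# Two slips, then success: the policy retries until the move works.
trace = run(policy, "A", [True, True, False])
```

The `limit` guard is only a safety net for the demo; the correctness condition is the one stated above — every execution must eventually terminate at the goal.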
Erratic Vacuum World Example
States: (location, dirt_left, dirt_right) ∈ {L,R} × {0,1} × {0,1}
“Suck” in a dirty square: outcome 1 = cleans the current square and leaves the other unchanged; outcome 2 = cleans both squares.
The AND-OR plan:
[Suck, AND:
if [dirty_L=0, dirty_R=0]: done
if [dirty_L=0, dirty_R=1]: [Right, Suck]
]
This is a conditional plan that handles both nondeterministic outcomes of the initial Suck.
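One way to check that the plan covers every outcome is to enumerate the continuations directly. The helper below is hypothetical (not part of the algorithm) and raises on any outcome the plan does not handle:

```python
# Hypothetical checker: map each outcome of the initial Suck to the
# rest of the plan, raising if an outcome is not covered.

def continuation(outcome):
    loc, dl, dr = outcome
    if (dl, dr) == (0, 0):
        return []                  # both squares clean: done
    if (dl, dr) == (0, 1):
        return ["Right", "Suck"]   # go clean the right square
    raise ValueError("outcome not covered by the plan")
```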
Properties
| Property | Value |
|---|---|
| Complete (acyclic) | Yes — finds an acyclic plan if one exists |
| Handles nondeterminism | Yes — plans for all outcomes |
| Solution form | Conditional plan (policy) |
| Complexity | Exponential: O(n·b^n) worst case, where n = number of states and b = branching factor |
Connection to RL and MDP
AND-OR search = policy search in a nondeterministic MDP without probabilities.
When outcomes have probabilities, AND nodes become expectation nodes → full MDP framework (Ch.17).
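The shift from AND nodes to expectation nodes can be shown in two small functions: an AND node demands that every outcome succeed, while an expectation node averages outcome values by probability. The values and probabilities below are invented for illustration:

```python
# Illustrative contrast between an AND node (all outcomes must succeed)
# and an expectation node (outcomes weighted by probability).

def and_node(outcome_values, success=lambda v: v > 0):
    # Worst-case view: the plan works only if ALL outcomes work.
    return all(success(v) for v in outcome_values)

def expectation_node(outcomes):
    # outcomes: iterable of (probability, value) pairs summing to 1.
    return sum(p * v for p, v in outcomes)

ok = and_node([1.0, 0.5])                         # every outcome succeeds
ev = expectation_node([(0.8, 10.0), (0.2, 0.0)])  # expected value 8.0
```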
DynamICCL faces nondeterminism: network conditions during NCCL operations vary stochastically. The RL policy functions as a conditional plan that handles all observed network states.