Nondeterministic Actions and AND-OR Search Trees

Nondeterministic Environments

In classical search, each action leads to exactly one state.

In nondeterministic environments, an action can lead to any of several possible outcomes. The agent cannot control which outcome occurs.

Example: Erratic vacuum world — when applied to a dirty square, “Suck” cleans it and sometimes also cleans the adjacent square; when applied to a clean square, it sometimes deposits dirt.
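The key modeling change is that the transition function returns a *set* of possible successors rather than a single state. A minimal sketch for the erratic vacuum world, with an assumed state encoding of (location, dirt_left, dirt_right) — function and variable names here are illustrative, not from the source:

```python
def results(state, action):
    """Possible successor states: a SET, since the outcome is uncertain."""
    loc, dl, dr = state
    if action == "Suck":
        if loc == "L":
            # Dirty: cleans L, sometimes also R. Clean: may deposit dirt.
            return {("L", 0, dr), ("L", 0, 0)} if dl else {("L", 0, dr), ("L", 1, dr)}
        return {("R", dl, 0), ("R", 0, 0)} if dr else {("R", dl, 0), ("R", dl, 1)}
    if action == "Right":
        return {("R", dl, dr)}   # movement is deterministic here
    if action == "Left":
        return {("L", dl, dr)}
    return {state}

print(sorted(results(("L", 1, 1), "Suck")))  # two possible outcomes
```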


AND-OR Trees

Standard search = OR tree: at each state, the agent chooses which action to take → OR node.

Nondeterministic search = AND-OR tree:
- OR nodes: agent’s choice — need to find one good action
- AND nodes: environment’s choice — need a plan that handles all possible outcomes

State (agent chooses action)  → OR node
Action (environment picks result) → AND node

A solution to an AND-OR tree is a subtree such that:
1. Every OR node has exactly one selected child (the chosen action)
2. Every AND node has all children in the subtree (all outcomes handled)
3. Every leaf is a goal state

This solution is a conditional plan (policy), not a simple sequence.
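One way to make the “conditional plan, not a sequence” point concrete is to write the plan down as a nested structure. A sketch with a made-up encoding (the outcome labels are invented for illustration): `[]` means “done”, otherwise a plan is `[action, {outcome: subplan, ...}]`.

```python
# Illustrative nested-plan encoding (not from the source):
plan = ["Suck",
        {"both_clean": [],                      # one outcome: already done
         "right_dirty": ["Right",
                         {"at_R_right_dirty": ["Suck",
                                               {"both_clean": []}]}]}]

def depth(p):
    """Number of actions along the longest branch of the plan."""
    if not p:
        return 0
    _, branches = p
    return 1 + max(depth(sub) for sub in branches.values())

print(depth(plan))  # longest branch: Suck, Right, Suck
```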


AND-OR-GRAPH-SEARCH Algorithm

function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan or failure
    return OR-SEARCH(problem, INITIAL-STATE(problem), [])

function OR-SEARCH(problem, state, path) returns a conditional plan or failure
    if GOAL-TEST(state) then return the empty plan
    if state is on path then return failure   -- loop detection
    for each action a in ACTIONS(state) do
        plan ← AND-SEARCH(problem, RESULTS(state, a), [state | path])
        if plan ≠ failure then return [a | plan]
    return failure

function AND-SEARCH(problem, states, path) returns a conditional plan or failure
    for each s_i in states do
        plan_i ← OR-SEARCH(problem, s_i, path)
        if plan_i = failure then return failure
    return [if s_1 then plan_1 else if s_2 then plan_2 ... else plan_n]

The path argument records the states on the current branch; returning failure when a state reappears prevents infinite recursion on cyclic problems. Note that this also rejects cyclic plans, which the next section addresses.
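The pseudocode above translates almost line for line into Python. A minimal sketch, assuming the problem is given as three callbacks (`actions`, `results`, `goal_test`) and plans are encoded as `[]` for “done” or `[action, {outcome_state: subplan}]` — all names and the plan encoding are illustrative:

```python
def and_or_search(actions, results, goal_test, initial):
    return or_search(actions, results, goal_test, initial, path=())

def or_search(actions, results, goal_test, state, path):
    if goal_test(state):
        return []                        # empty plan: already at a goal
    if state in path:
        return None                      # loop detected -> failure
    for a in actions(state):
        plan = and_search(actions, results, goal_test,
                          results(state, a), (state,) + path)
        if plan is not None:
            return [a, plan]             # prepend the chosen action
    return None

def and_search(actions, results, goal_test, states, path):
    branches = {}
    for s in states:
        sub = or_search(actions, results, goal_test, s, path)
        if sub is None:
            return None                  # one unhandled outcome -> failure
        branches[s] = sub                # "if outcome is s then sub"
    return branches

# Demo on a compact erratic-vacuum-style problem (illustrative encoding):
def vac_actions(state):
    return ["Suck", "Right", "Left"]

def vac_results(state, action):
    loc, dl, dr = state
    if action == "Suck":
        if loc == "L" and dl:
            return {("L", 0, dr), ("L", 0, 0)}   # may also clean R
        if loc == "R" and dr:
            return {("R", dl, 0), ("R", 0, 0)}   # may also clean L
        return {state}                           # clean square: simplified no-op
    if action == "Right":
        return {("R", dl, dr)}
    return {("L", dl, dr)}                       # "Left"

goal = lambda s: s[1] == 0 and s[2] == 0
plan = and_or_search(vac_actions, vac_results, goal, ("L", 1, 1))
print(plan)
```

From (L, dirty, dirty) the search returns a plan that Sucks, then branches on whether the Suck also cleaned the right square.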


Cyclic Plans

In some nondeterministic problems, a finite acyclic plan may not exist. The agent may need to retry an action until the environment cooperates.

Example: Slippery floor robot — “move forward” sometimes fails (stays in place). The plan is: “keep trying until you get there.”

A cyclic plan is represented as a policy (state → action mapping) rather than a tree. It can be thought of as a finite state machine:

if state = A then do action_1
if result = B then done
else retry (loop back to state A)

Cyclic plans are valid solutions if they guarantee eventual goal achievement (every execution eventually terminates at a goal).
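Executing a cyclic plan amounts to running a policy in a loop until the goal is observed. A sketch of the slippery-floor example with simulated nondeterminism — the 0.5 slip rate, the step bound, and all names are made-up illustration, not from the source:

```python
import random

def slippery_move(state, rng):
    """Move forward, but sometimes slip and stay in place (assumed 50/50)."""
    return state + 1 if rng.random() < 0.5 else state

def execute_policy(policy, state, goal, rng, max_steps=1000):
    """Retry the policy's action until the goal is reached."""
    steps = 0
    while state != goal:
        action = policy(state)        # here the policy always says "move"
        state = action(state, rng)
        steps += 1
        if steps >= max_steps:
            raise RuntimeError("did not terminate")
    return state, steps

rng = random.Random(0)
final, steps = execute_policy(lambda s: slippery_move, 0, 3, rng)
print(final, steps)  # reaches the goal after some number of retries
```

The termination guarantee is probabilistic here: each retry succeeds with nonzero probability, so every execution eventually reaches the goal with probability 1.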


Erratic Vacuum World Example

States: (location, dirt_left, dirt_right) ∈ {L,R} × {0,1} × {0,1}

“Suck” in a dirty square: outcome 1 cleans the current square and leaves the other unchanged; outcome 2 cleans both squares.

The AND-OR plan:

[Suck, AND:
  if (dirt_left=0, dirt_right=0): done
  if (dirt_left=0, dirt_right=1): [Right, Suck]
]

This is a conditional plan that handles both nondeterministic outcomes of the initial Suck.
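That claim — every nondeterministic outcome is handled — can be checked mechanically by walking the plan tree. A hypothetical checker, using an assumed nested encoding (`[]` for “done”, otherwise `[action, {outcome_state: subplan}]`); the state tuples and helper names are illustrative:

```python
def covers(plan, state, results, goal_test):
    """True iff the plan handles every possible outcome from state."""
    if plan == []:
        return goal_test(state)
    action, branches = plan
    return all(s in branches and covers(branches[s], s, results, goal_test)
               for s in results(state, action))

def results(state, action):
    loc, dl, dr = state
    if action == "Suck" and loc == "L" and dl:
        return {("L", 0, dr), ("L", 0, 0)}       # may also clean R
    if action == "Suck" and loc == "R" and dr:
        return {("R", dl, 0), ("R", 0, 0)}       # may also clean L
    if action == "Right":
        return {("R", dl, dr)}
    if action == "Left":
        return {("L", dl, dr)}
    return {state}                               # clean-square Suck: no-op

goal = lambda s: s[1] == 0 and s[2] == 0

# The conditional plan from the text, in the nested encoding:
plan = ["Suck", {("L", 0, 0): [],
                 ("L", 0, 1): ["Right", {("R", 0, 1):
                               ["Suck", {("R", 0, 0): []}]}]}]

print(covers(plan, ("L", 1, 1), results, goal))  # True
```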


Properties

Property                 Value
Complete (acyclic)       Yes — finds a plan if one exists
Handles nondeterminism   Yes — plans for all outcomes
Solution form            Conditional plan (policy)
Complexity               O(n·b^n) worst case (n = number of states, b = branching factor)

Connection to RL and MDP

AND-OR search = policy search in a nondeterministic MDP without probabilities.

When outcomes have probabilities, AND nodes become expectation nodes → full MDP framework (Ch.17).
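The shift from AND nodes to expectation nodes is just a change in how outcome values are aggregated. A toy illustration with made-up costs-to-go for the two Suck outcomes:

```python
# Made-up numbers: (probability, cost-to-go) for each outcome of an action.
outcomes = {"cleans_both": (0.5, 1.0), "cleans_left_only": (0.5, 3.0)}

# AND node (nondeterministic): must handle ALL outcomes -> worst case
worst_case = max(cost for _, cost in outcomes.values())

# Expectation node (MDP): probability-weighted average over outcomes
expected = sum(p * cost for p, cost in outcomes.values())

print(worst_case, expected)  # 3.0 and 2.0
```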

DynamICCL faces nondeterminism: network conditions during NCCL operations vary stochastically. The RL policy functions as a conditional plan that handles all observed network states.