3. Search Framework and Data Structures

Source: AIMA 4th Ed., Chapter 3 (Section 3.3) Status: Complete


Overview

All search algorithms share a common framework: maintain a frontier of nodes to explore, expand the most promising node according to some evaluation function, and track which states have been reached to avoid redundant work. Understanding this framework makes every specific algorithm a special case.


The BEST-FIRST-SEARCH Framework

All algorithms in this chapter are instances of best-first search, differing only in the evaluation function f(n).

function BEST-FIRST-SEARCH(problem, f) returns solution node or failure
    node <- NODE(STATE=problem.INITIAL)
    frontier <- priority queue ordered by f, with node as element
    reached <- lookup table, key=problem.INITIAL, value=node
    while not IS-EMPTY(frontier) do
        node <- POP(frontier)
        if problem.IS-GOAL(node.STATE) then return node
        for each child in EXPAND(problem, node) do
            s <- child.STATE
            if s not in reached or child.PATH-COST < reached[s].PATH-COST then
                reached[s] <- child
                add child to frontier
    return failure

The EXPAND function generates all children:

function EXPAND(problem, node) yields nodes
    s <- node.STATE
    for each action in problem.ACTIONS(s) do
        s' <- problem.RESULT(s, action)
        cost <- node.PATH-COST + problem.ACTION-COST(s, action, s')
        yield NODE(STATE=s', PARENT=node, ACTION=action, PATH-COST=cost)

Node Data Structure

node.STATE       -- world state this node represents
node.PARENT      -- parent node (null for root)
node.ACTION      -- action taken from parent to reach this node
node.PATH-COST   -- g(n): cumulative cost from root to this node

Depth is not stored explicitly but can be computed: depth(node) = depth(node.PARENT) + 1.

The path from root to a node is reconstructed by following PARENT pointers backward:

function PATH-ACTIONS(node) returns list of actions
    if node.PARENT is null then return []
    return PATH-ACTIONS(node.PARENT) + [node.ACTION]

The Frontier

The frontier (also called “open list”) holds nodes that have been generated but not yet expanded. Its data structure determines which node is popped next:

Frontier Type Pop Order Used By
Priority queue (min-heap) Minimum f(n) Best-first, UCS, A*
FIFO queue Oldest first (FIFO) BFS
LIFO queue (stack) Most recently added DFS

The Reached Table

The reached table (also called “closed list” or “explored set”) maps each visited state to the best node found for it so far. Its purpose:

  1. Avoid re-expansion: if we reach a state we have already fully explored, skip it.
  2. Detect better paths: if we find a cheaper path to an already-reached state, update and re-add to frontier.
reached <- hash table: state -> node

The condition for adding to frontier:

if s not in reached OR child.PATH-COST < reached[s].PATH-COST:
    reached[s] <- child
    add child to frontier

Redundant Paths and Cycles

Redundant path: two different paths reaching the same state. The more expensive one can be discarded.

Cycle: a path that visits a node already on its own path. Cycles make tree-like search loop forever.

Three strategies for handling redundant paths:

  1. Remember all reached states (graph search): uses O(b^d) memory but prevents all redundant expansion. Optimal for most settings.

  2. Check only for cycles (partial): check if the new state appears anywhere on the current path (i.e., is an ancestor). Uses O(m) memory per path. Used by DFS.

  3. Ignore redundant paths (tree-like search): cheapest in memory but dangerous in graphs with cycles. Acceptable only when the state space is a tree (no cycles possible).


Property Graph Search Tree-Like Search
Tracks all reached states Yes No
Memory O(b^d) for frontier + reached O(b*m) for frontier only
Handles cycles Yes No (may loop infinitely)
May re-expand states No Yes
When to prefer State space has many redundant paths State space is a tree; memory is constrained

IS-CYCLE Check

An efficient cycle check for depth-first style search: walk up the ancestor chain.

function IS-CYCLE(node) returns boolean
    ancestor <- node.PARENT
    while ancestor is not null do
        if ancestor.STATE = node.STATE then return true
        ancestor <- ancestor.PARENT
    return false

Cost: O(depth) per check. For DFS, this is O(m) per node, which is acceptable given DFS’s O(m) depth limit.


Key Design Decisions in Any Search Algorithm

  1. What goes in f(n)? — This determines which node is expanded next.
  2. How is the frontier structured? — Priority queue, FIFO, LIFO.
  3. Do we track reached states? — Graph search vs. tree-like search.
  4. When do we check for goal? — At generation time (“early goal test”, used in BFS) or at expansion time (“late goal test”, used in UCS/A* for optimality guarantee).

The late goal test is critical for cost optimality: with early goal test, the first path to a goal state is returned immediately, but it may not be the cheapest path. Cost-optimal algorithms (UCS, A*) must check for goal only when the node is popped from the frontier (i.e., when it is guaranteed to be the cheapest path to that state).