5. The Four Agent Architectures
Source: AIMA 4th Ed, Chapter 2 (Sections 2.4.1–2.4.5), physical PDF pp. 118–133
Introduction
Having established what agents are, what makes them rational, and what environments they face, Chapter 2 now answers the practical question: how do you actually build one?
Section 2.4 presents four basic agent architectures that embody the principles underlying almost all intelligent systems. They form a progression from simplest to most capable, each adding an additional component that allows the agent to handle more complex environments.
The key identity:
agent = architecture + program
The architecture is the physical computing substrate. The agent program implements the agent function. The goal: produce rational behavior from a compact program rather than from a vast lookup table.
2.4.1 The Table-Driven Agent (Why Not This)
Before the four useful architectures, the book first shows why the naive approach fails.
function TABLE-DRIVEN-AGENT(percept) returns an action
  persistent: percepts, a sequence, initially empty
              table, a table of actions, fully specified, indexed by percept sequences
  append percept to the end of percepts
  action <- LOOKUP(percepts, table)
  return action
Why it fails: the table needs sum_{t=1}^{T} |P|^t entries, where P is the set of possible percepts and T is the number of percepts the agent will receive over its lifetime. For an automated taxi whose camera input arrives at roughly 70 MB/s (and eight cameras is typical), one hour of driving requires a table with over 10^{600,000,000,000} entries. The number of atoms in the observable universe is less than 10^{80}. No physical agent can store or populate this table.
The key challenge for AI: write programs that produce rational behavior from a small amount of code, the way a five-line program for Newton’s method replaced the huge printed tables of square roots engineers used before the 1970s.
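As a concrete illustration, here is a minimal Python version of that five-line program (the tolerance and starting guess are our own choices, not from the book):

def newton_sqrt(x: float, tol: float = 1e-12) -> float:
    # Newton's method for f(g) = g^2 - x: repeatedly average g and x/g
    g = max(x, 1.0)                      # any positive starting guess converges
    while abs(g * g - x) > tol:
        g = (g + x / g) / 2
    return g

print(newton_sqrt(2.0))                  # ~1.414213562373095

A few lines of algorithm replace an unbounded table of precomputed answers, which is exactly the point of the agent-program architectures that follow.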
2.4.2 Simple Reflex Agents
Core Idea
The simplest useful agent: select actions based only on the current percept, ignoring all history. Works only when the environment is fully observable — when the right action can be determined from the current percept alone.
Mechanism: Condition-Action Rules
The agent uses a set of condition-action rules (also called situation-action rules, productions, or if-then rules):
if car-in-front-is-braking then initiate-braking
The agent interprets the current percept into a state description, finds the first rule whose condition matches, and executes the corresponding action.
Pseudocode (Figure 2.10)
function SIMPLE-REFLEX-AGENT(percept) returns an action
  persistent: rules, a set of condition-action rules
  state <- INTERPRET-INPUT(percept)
  rule <- RULE-MATCH(state, rules)
  action <- rule.ACTION
  return action
- INTERPRET-INPUT: translates the raw percept into an abstracted state description (e.g., “car in front is braking”)
- RULE-MATCH: finds the first rule whose condition matches the current state
Agent Diagram (Figure 2.9, described in text)
Sensors --> [What the world is like now] --> [What action I should do now] --> Actuators
                                                            ^
                                                 (Condition-action rules)
Vacuum World Example (Figure 2.8)
function REFLEX-VACUUM-AGENT([location, status]) returns an action
  if status = Dirty then return Suck
  else if location = A then return Right
  else if location = B then return Left

This tiny program replaces a table of 4^T entries by ignoring percept history entirely.
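A direct Python translation, with a tiny two-square test harness of our own (the world dictionary and loop are illustrative, not from the book):

def reflex_vacuum_agent(percept):
    # Direct translation of REFLEX-VACUUM-AGENT (Figure 2.8)
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

world = {"A": "Dirty", "B": "Dirty"}     # illustrative starting state
loc = "A"
for _ in range(4):
    action = reflex_vacuum_agent((loc, world[loc]))
    if action == "Suck":
        world[loc] = "Clean"
    else:
        loc = "B" if action == "Right" else "A"
    print(action)                        # Suck, Right, Suck, Left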
Strengths
- Extremely simple to implement
- Very fast (no planning, no memory)
- Works well in fully observable, deterministic environments
Limitations
- Works only if the correct decision can be made on the basis of just the current percept — i.e., only if the environment is fully observable
- In partially observable environments, can get stuck in infinite loops
Example: a vacuum agent with only a dirt sensor and no location sensor, so its percepts are just [Dirty] or [Clean]. A rule that always moves Left when the square is clean loops forever if the agent starts in A; a rule that always moves Right loops forever if it starts in B.
Partial fix: randomization. When the percept is [Clean], flip a coin to choose Left or Right. The expected number of steps to reach the other square is then 2. This is effective here but not optimal in general; randomization is a stopgap, and model-based agents handle partial observability properly.
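A quick Monte Carlo check of the 2-step expectation (the world geometry and helper name are our own):

import random

def steps_to_other_square(start="A"):
    # Location-blind agent: coin-flip a move; walking into the wall does nothing
    loc, steps = start, 0
    while loc == start:
        steps += 1
        move = random.choice(["Left", "Right"])
        if move == "Right" and loc == "A":
            loc = "B"
        elif move == "Left" and loc == "B":
            loc = "A"
    return steps

trials = [steps_to_other_square() for _ in range(100_000)]
print(sum(trials) / len(trials))         # ~2.0 (geometric distribution, p = 1/2)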
2.4.3 Model-Based Reflex Agents
Core Idea
To handle partial observability: maintain an internal state — a representation of the parts of the world the agent cannot currently see.
The agent’s internal state is updated over time using knowledge of:
1. Transition model: how the world changes on its own and as a result of the agent’s actions
2. Sensor model: how the current world state is reflected in the agent’s percepts
Together these allow the agent to maintain a “best guess” of the current world state even when it cannot be directly observed.
Two Required Models
| Model | What it represents | Example |
|---|---|---|
| Transition model | How the world evolves: effects of agent actions + autonomous world changes | “Turning the steering wheel clockwise turns the car right; rain can wet the cameras” |
| Sensor model | How world states map to percepts | “When the car in front brakes, red regions appear in the forward camera” |
An agent that uses such models is called a model-based agent.
Pseudocode (Figure 2.12)
function MODEL-BASED-REFLEX-AGENT(percept) returns an action
  persistent: state, the agent's current estimate of the world state
              transition_model, how the next state depends on the current state and action
              sensor_model, how the current world state maps to percepts
              rules, a set of condition-action rules
              action, the most recent action, initially none
  state <- UPDATE-STATE(state, action, percept, transition_model, sensor_model)
  rule <- RULE-MATCH(state, rules)
  action <- rule.ACTION
  return action
The key new function is UPDATE-STATE: it creates a new internal state estimate by combining the old state, the last action taken, the new percept, and the two models.
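A minimal sketch in Python, assuming the location-blind vacuum world from Section 2.4.2; the closure structure and state encoding are our own choices:

def model_based_reflex_agent():
    # Believed location plus last-known status of each square (not ground truth)
    state = {"loc": "A", "A": "Unknown", "B": "Unknown"}
    action = None

    def step(percept):                   # percept is just "Dirty" or "Clean"
        nonlocal action
        # UPDATE-STATE, transition-model part: how the last action moved us
        if action == "Right":
            state["loc"] = "B"
        elif action == "Left":
            state["loc"] = "A"
        # UPDATE-STATE, sensor-model part: the percept describes the current square
        state[state["loc"]] = percept
        # Condition-action rules applied to the state estimate
        if percept == "Dirty":
            action = "Suck"
        elif state["loc"] == "A":
            action = "Right"
        else:
            action = "Left"
        return action
    return step

agent = model_based_reflex_agent()
print(agent("Dirty"), agent("Clean"), agent("Dirty"))   # Suck Right Suck

Unlike the dirt-only simple reflex agent, this one never loops: its internal state supplies the location the sensor no longer provides.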
Agent Diagram (Figure 2.11, described in text)
Sensors --> [What the world is like now] --> [What action I should do now] --> Actuators
                     ^                                      ^
       (How the world evolves)                   (Condition-action rules)
       (What my actions do)
                     ^
       (State) <------- (dashed feedback from the previous state)
Important Note on Certainty
Even with a model, it is seldom possible to determine the exact current state of a partially observable environment. The internal state box represents the agent’s best guess — sometimes multiple simultaneous guesses (a belief state). The agent must act under this uncertainty.
Example: An automated taxi cannot see around a large truck stopped in front of it — it can only guess what is causing the hold-up.
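When the state cannot be pinned down, the internal state can be a set of candidates. A toy illustration for the location-blind vacuum agent (the representation is our own):

# Before sensing anything, any (location, status) pair is possible
belief = {("A", "Dirty"), ("A", "Clean"), ("B", "Dirty"), ("B", "Clean")}

def filter_by_percept(belief, percept):
    # Keep only candidate states consistent with what was just sensed
    return {(loc, status) for (loc, status) in belief if status == percept}

print(filter_by_percept(belief, "Dirty"))   # A/Dirty and B/Dirty: location still unknown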
Strengths over Simple Reflex
- Can handle partial observability
- Can reason about effects of actions that aren’t immediately visible
- More robust to sensor noise
Limitation
- Still reactive: selects actions purely from condition-action rules applied to the state estimate
- Has no notion of goals — it does not consider where it is trying to get
2.4.4 Goal-Based Agents
Core Idea
Knowing the current state is not enough if there are multiple possible actions, none of which is obviously correct without knowing where the agent is trying to go. Goal-based agents add goal information — a description of desirable states — and choose actions that achieve the goal.
What Goals Add
At a road junction, the taxi can turn left, turn right, or go straight. The correct decision depends on the destination. The goal-based agent combines:
- Current state estimate (from the model)
- Goals (desired states)
- Predictions of future states (“What will the world be like if I do action A?”)

to choose actions that will eventually achieve the goal.
Agent Diagram (Figure 2.13, described in text)
Sensors --> [What the world is like now]
                      |
                      v
      [What it will be like if I do action A]
                      |
                      v
      [What action I should do now] <--- (Goals)
                      |
                      v
                  Actuators
Goal-Based vs. Condition-Action Rules
A reflex agent brakes when it sees brake lights because a rule says so — it has no idea why braking is good. A goal-based agent brakes because braking is the action most likely to achieve its goal of not hitting other cars.
Key advantage of goal-based design: the knowledge supporting decisions is explicit and modifiable. Changing the goal changes the behavior. A reflex agent must have all its rules replaced to go to a new destination; a goal-based agent simply updates the goal.
Algorithms Used
- Search (Chapters 3–5): finding action sequences that achieve the goal
- Planning (Chapter 11): constructing plans for goal achievement in more complex domains
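As a minimal sketch of the search idea, breadth-first search over a toy road map (the map and names are invented for illustration):

from collections import deque

def bfs_plan(start, goal, neighbors):
    # Breadth-first search: returns the shortest sequence of states to the goal
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None                          # goal unreachable

roads = {"Junction": ["LeftSt", "RightSt", "StraightAve"],
         "LeftSt": ["Airport"], "RightSt": ["Mall"], "StraightAve": ["Home"],
         "Airport": [], "Mall": [], "Home": []}
print(bfs_plan("Junction", "Airport", roads))   # ['Junction', 'LeftSt', 'Airport']

Changing the goal argument changes the behavior with no change to the rest of the program, which is exactly the modifiability advantage noted above.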
Limitation
- Goals are binary: either achieved or not. There is no way to express that some ways of achieving the goal are better than others.
- Cannot handle conflicting goals or goals achievable only with some probability
2.4.5 Utility-Based Agents
Core Idea
Goals are a crude binary measure (“happy” vs. “not happy”). A utility function maps states (or sequences of states) to a real number — a measure of how desirable the state is. This allows the agent to choose among multiple goal-achieving paths by picking the one with highest utility.
Utility Function
The utility function is essentially an internalization of the performance measure. If the internal utility function and the external performance measure are in agreement, an agent maximizing its utility will be rational with respect to the external performance measure.
When Utility Beats Goals
- Conflicting goals: speed vs. safety in taxi driving. A utility function can encode the appropriate tradeoff; binary goals cannot.
- Uncertain goal achievement: multiple action sequences all lead toward the goal, but some are more reliable. Utility provides a way to weigh likelihood of success against importance of the goal.
Expected Utility Maximization
Rational utility-based agents choose the action that maximizes expected utility:
a* = argmax_a sum_s P(outcome = s | a, current_state) * U(s)

where:
- P(outcome = s | a, current_state) = probability of reaching state s by taking action a in the current state
- U(s) = utility of state s
This is the expected utility principle. Chapter 16 proves that any rational agent must behave as if it possesses a utility function whose expected value it tries to maximize — whether or not the utility function is explicit.
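A direct implementation of this argmax over a toy outcome table (all probabilities and utilities below are invented for illustration):

def best_action(actions, outcomes, utility):
    # a* = argmax_a sum_s P(s | a) * U(s)
    def eu(a):
        return sum(p * utility[s] for s, p in outcomes[a].items())
    return max(actions, key=eu)

# Fast route risks a crash; slow route is safe but late
outcomes = {"fast": {"arrive_early": 0.7, "crash": 0.3},
            "slow": {"arrive_late": 1.0}}
utility = {"arrive_early": 10, "arrive_late": 5, "crash": -100}
print(best_action(["fast", "slow"], outcomes, utility))   # slow (EU 5 vs. -23)

A goal-based agent with the binary goal “arrive” cannot make this tradeoff; the utility numbers are what encode it.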
Agent Diagram (Figure 2.14, described in text)
Sensors --> [What the world is like now]
                      |
                      v
      [What it will be like if I do action A]
                      |
                      v
      [How happy I will be in such a state] <--- (Utility)
                      |
                      v
      [What action I should do now] --> Actuators
Model-Free Agents
Not all utility-based agents are model-based. A model-free agent can learn what action is best in a given situation without ever explicitly learning how actions change the environment. This is the province of model-free RL (Chapter 22, e.g., Q-learning, PPO).
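The tabular Q-learning update illustrates the model-free idea: it learns an action-utility estimate directly from observed transitions, without ever representing transition probabilities (the hyperparameter values here are arbitrary):

from collections import defaultdict

Q = defaultdict(float)                   # Q[(state, action)] -> estimated utility
alpha, gamma = 0.1, 0.9                  # learning rate and discount, illustrative

def q_update(s, a, reward, s_next, actions):
    # One temporal-difference step; no transition model appears anywhere
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])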
Limitation
Perfect rationality via utility maximization is usually computationally intractable in practice. Real-world agents must settle for bounded rationality — making good decisions with limited computation.
Comparison of the Four Architectures
| Architecture | Internal state | Models world | Uses goals | Uses utility | Handles partial obs |
|---|---|---|---|---|---|
| Simple reflex | No | No | No | No | No (loops) |
| Model-based reflex | Yes (belief state) | Yes | No | No | Yes |
| Goal-based | Yes | Yes | Yes | No | Yes |
| Utility-based | Yes | Yes | Yes (implicitly) | Yes | Yes |
Each adds one capability over the previous. The progression is not always linear in practice — you can have utility-based agents that are model-free, or goal-based agents with no internal state (if the environment is fully observable).
Key Pseudocode Summary
Simple Reflex Agent
state <- INTERPRET-INPUT(percept)
rule <- RULE-MATCH(state, rules)
return rule.ACTION
Model-Based Reflex Agent
state <- UPDATE-STATE(state, action, percept, transition_model, sensor_model)
rule <- RULE-MATCH(state, rules)
return rule.ACTION
Goal-Based Agent (conceptual)
state <- UPDATE-STATE(...)
options <- GENERATE-ACTIONS(state)
best <- SEARCH(options, goal, model)
return best.next_action
Utility-Based Agent (conceptual)
state <- UPDATE-STATE(...)
action <- argmax_a EU(a, state, utility_function, transition_model)
return action
Cross-References
- Section 2.4.6 → Learning agents (how any of the above architectures can be augmented with learning)
- Section 2.4.7 → Representation of agent components (atomic, factored, structured)
- Chapters 3–5 → Search algorithms for goal-based agents
- Chapter 11 → Planning for goal-based agents
- Chapters 16–17 → Decision theory and MDPs for utility-based agents
- Chapter 22 → RL: learning the utility function (value function) from experience
- DynamICCL → RL-based NCCL tuner is a utility-based, partially model-free agent: it selects communication parameters (actions) to maximize throughput/latency reward (utility) in a partially observable, stochastic, sequential environment