5. The Four Agent Architectures
Source: AIMA 4th Ed, Chapter 2 (Sections 2.4.1–2.4.5), physical PDF pp. 118–133
Introduction
Having established what agents are, what makes them rational, and what environments they face, Chapter 2 now answers the practical question: how do you actually build one?
Section 2.4 presents four basic agent architectures that embody the principles underlying almost all intelligent systems. They form a progression from simplest to most capable, each adding an additional component that allows the agent to handle more complex environments.
The key identity:
agent = architecture + program
The architecture is the physical computing substrate. The agent program implements the agent function. The goal: produce rational behavior from a compact program rather than from a vast lookup table.
2.4.1 The Table-Driven Agent (Why Not This)
Before the four useful architectures, the book first shows why the naive approach fails.
function TABLE-DRIVEN-AGENT(percept) returns an action
  persistent: percepts, a sequence, initially empty
              table, a table of actions, fully specified, indexed by percept sequences
  append percept to the end of percepts
  action <- LOOKUP(percepts, table)
  return action
Why it fails: the table needs sum_{t=1}^{T} |P|^t entries, where P is the set of possible percepts and T is the number of percepts the agent will receive over its lifetime. For an automated taxi whose camera input arrives at roughly 70 MB/s (and eight cameras is typical), one hour of driving requires a table with over 10^{600,000,000,000} entries. The number of atoms in the observable universe is less than 10^{80}. No physical agent can store or populate this table.
The key challenge for AI: write programs that produce rational behavior from a small amount of code, the way a five-line program for Newton’s method replaced the huge printed tables of square roots engineers used before the 1970s.
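As a concrete illustration, here is a minimal Python version of that five-line program (the tolerance and starting guess are our own choices, not from the book):

def newton_sqrt(x: float, tol: float = 1e-12) -> float:
    # Newton's method for f(g) = g^2 - x: repeatedly average g and x/g
    g = max(x, 1.0)                      # any positive starting guess converges
    while abs(g * g - x) > tol:
        g = (g + x / g) / 2
    return g

print(newton_sqrt(2.0))                  # ~1.414213562373095

A few lines of algorithm replace an unbounded table of precomputed answers, which is exactly the point of the agent-program architectures that follow.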
2.4.2 Simple Reflex Agents
Core Idea
The simplest useful agent: select actions based only on the current percept, ignoring all history. Works only when the environment is fully observable — when the right action can be determined from the current percept alone.
Mechanism: Condition-Action Rules
The agent uses a set of condition-action rules (also called situation-action rules, productions, or if-then rules):
if car-in-front-is-braking then initiate-braking
The agent interprets the current percept into a state description, finds the first rule whose condition matches, and executes the corresponding action.
Pseudocode (Figure 2.10)
function SIMPLE-REFLEX-AGENT(percept) returns an action
  persistent: rules, a set of condition-action rules
  state <- INTERPRET-INPUT(percept)
  rule <- RULE-MATCH(state, rules)
  action <- rule.ACTION
  return action
- INTERPRET-INPUT: translates the raw percept into an abstracted state description (e.g., “car in front is braking”)
- RULE-MATCH: finds the first rule whose condition matches the current state
Agent Diagram (Figure 2.9, described in text)
Sensors --> [What the world is like now] --> [What action I should do now] --> Actuators
                                                            ^
                                                 (Condition-action rules)
Vacuum World Example (Figure 2.8)
function REFLEX-VACUUM-AGENT([location, status]) returns an action
  if status = Dirty then return Suck
  else if location = A then return Right
  else if location = B then return Left

This tiny program replaces a table of 4^T entries by ignoring percept history entirely.
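A direct Python translation, with a tiny two-square test harness of our own (the world dictionary and loop are illustrative, not from the book):

def reflex_vacuum_agent(percept):
    # Direct translation of REFLEX-VACUUM-AGENT (Figure 2.8)
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

world = {"A": "Dirty", "B": "Dirty"}     # illustrative starting state
loc = "A"
for _ in range(4):
    action = reflex_vacuum_agent((loc, world[loc]))
    if action == "Suck":
        world[loc] = "Clean"
    else:
        loc = "B" if action == "Right" else "A"
    print(action)                        # Suck, Right, Suck, Left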
Strengths
- Extremely simple to implement
- Very fast (no planning, no memory)
- Works well in fully observable, deterministic environments
Limitations
- Works only if the correct decision can be made on the basis of just the current percept — i.e., only if the environment is fully observable
- In partially observable environments, can get stuck in infinite loops
Example: a vacuum agent with only a dirt sensor and no location sensor, so its percepts are just [Dirty] or [Clean]. A rule that always moves Left when the square is clean loops forever if the agent starts in A; a rule that always moves Right loops forever if it starts in B.
Partial fix: randomization. When the percept is [Clean], flip a coin to choose Left or Right. The expected number of steps to reach the other square is then 2. This is effective here but not optimal in general; randomization is a stopgap, and model-based agents handle partial observability properly.
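A quick Monte Carlo check of the 2-step expectation (the world geometry and helper name are our own):

import random

def steps_to_other_square(start="A"):
    # Location-blind agent: coin-flip a move; walking into the wall does nothing
    loc, steps = start, 0
    while loc == start:
        steps += 1
        move = random.choice(["Left", "Right"])
        if move == "Right" and loc == "A":
            loc = "B"
        elif move == "Left" and loc == "B":
            loc = "A"
    return steps

trials = [steps_to_other_square() for _ in range(100_000)]
print(sum(trials) / len(trials))         # ~2.0 (geometric distribution, p = 1/2)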
2.4.3 Model-Based Reflex Agents
Core Idea
To handle partial observability: maintain an internal state — a representation of the parts of the world the agent cannot currently see.
The agent’s internal state is updated over time using knowledge of:
1. Transition model: how the world changes on its own and as a result of the agent’s actions
2. Sensor model: how the current world state is reflected in the agent’s percepts
Together these allow the agent to maintain a “best guess” of the current world state even when it cannot be directly observed.
Two Required Models
| Model | What it represents | Example |
|---|---|---|
| Transition model | How the world evolves: effects of agent actions + autonomous world changes | “Turning the steering wheel clockwise turns the car right; rain can wet the cameras” |
| Sensor model | How world states map to percepts | “When the car in front brakes, red regions appear in the forward camera” |
An agent that uses such models is called a model-based agent.
Pseudocode (Figure 2.12)
function MODEL-BASED-REFLEX-AGENT(percept) returns an action
  persistent: state, the agent's current estimate of the world state
              transition_model, how the next state depends on the current state and action
              sensor_model, how the current world state maps to percepts
              rules, a set of condition-action rules
              action, the most recent action, initially none
  state <- UPDATE-STATE(state, action, percept, transition_model, sensor_model)
  rule <- RULE-MATCH(state, rules)
  action <- rule.ACTION
  return action
The key new function is UPDATE-STATE: it creates a new internal state estimate by combining the old state, the last action taken, the new percept, and the two models.
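A minimal sketch in Python, assuming the location-blind vacuum world from Section 2.4.2; the closure structure and state encoding are our own choices:

def model_based_reflex_agent():
    # Believed location plus last-known status of each square (not ground truth)
    state = {"loc": "A", "A": "Unknown", "B": "Unknown"}
    action = None

    def step(percept):                   # percept is just "Dirty" or "Clean"
        nonlocal action
        # UPDATE-STATE, transition-model part: how the last action moved us
        if action == "Right":
            state["loc"] = "B"
        elif action == "Left":
            state["loc"] = "A"
        # UPDATE-STATE, sensor-model part: the percept describes the current square
        state[state["loc"]] = percept
        # Condition-action rules applied to the state estimate
        if percept == "Dirty":
            action = "Suck"
        elif state["loc"] == "A":
            action = "Right"
        else:
            action = "Left"
        return action
    return step

agent = model_based_reflex_agent()
print(agent("Dirty"), agent("Clean"), agent("Dirty"))   # Suck Right Suck

Unlike the dirt-only simple reflex agent, this one never loops: its internal state supplies the location the sensor no longer provides.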
Agent Diagram (Figure 2.11, described in text)
Sensors --> [What the world is like now] --> [What action I should do now] --> Actuators
                     ^                                      ^
       (How the world evolves)                   (Condition-action rules)
       (What my actions do)
                     ^
       (State) <------- (dashed feedback from the previous state)
Important Note on Certainty
Even with a model, it is seldom possible to determine the exact current state of a partially observable environment. The internal state box represents the agent’s best guess — sometimes multiple simultaneous guesses (a belief state). The agent must act under this uncertainty.
Example: An automated taxi cannot see around a large truck stopped in front of it — it can only guess what is causing the hold-up.
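When the state cannot be pinned down, the internal state can be a set of candidates. A toy illustration for the location-blind vacuum agent (the representation is our own):

# Before sensing anything, any (location, status) pair is possible
belief = {("A", "Dirty"), ("A", "Clean"), ("B", "Dirty"), ("B", "Clean")}

def filter_by_percept(belief, percept):
    # Keep only candidate states consistent with what was just sensed
    return {(loc, status) for (loc, status) in belief if status == percept}

print(filter_by_percept(belief, "Dirty"))   # A/Dirty and B/Dirty: location still unknown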
Strengths over Simple Reflex
- Can handle partial observability
- Can reason about effects of actions that aren’t immediately visible
- More robust to sensor noise
Limitation
- Still reactive: selects actions purely from condition-action rules applied to the state estimate
- Has no notion of goals — it does not consider where it is trying to get
2.4.4 Goal-Based Agents
Core Idea
Knowing the current state is not enough if there are multiple possible actions, none of which is obviously correct without knowing where the agent is trying to go. Goal-based agents add goal information — a description of desirable states — and choose actions that achieve the goal.
What Goals Add
At a road junction, the taxi can turn left, turn right, or go straight. The correct decision depends on the destination. The goal-based agent combines:
- Current state estimate (from the model)
- Goals (desired states)
- Predictions of future states (“What will the world be like if I do action A?”)

to choose actions that will eventually achieve the goal.
Agent Diagram (Figure 2.13, described in text)
Sensors --> [What the world is like now]
                      |
                      v
      [What it will be like if I do action A]
                      |
                      v
      [What action I should do now] <--- (Goals)
                      |
                      v
                  Actuators
Goal-Based vs. Condition-Action Rules
A reflex agent brakes when it sees brake lights because a rule says so — it has no idea why braking is good. A goal-based agent brakes because braking is the action most likely to achieve its goal of not hitting other cars.
Key advantage of goal-based design: the knowledge supporting decisions is explicit and modifiable. Changing the goal changes the behavior. A reflex agent must have all its rules replaced to go to a new destination; a goal-based agent simply updates the goal.
Algorithms Used
- Search (Chapters 3–5): finding action sequences that achieve the goal
- Planning (Chapter 11): constructing plans for goal achievement in more complex domains
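As a minimal sketch of the search idea, breadth-first search over a toy road map (the map and names are invented for illustration):

from collections import deque

def bfs_plan(start, goal, neighbors):
    # Breadth-first search: returns the shortest sequence of states to the goal
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None                          # goal unreachable

roads = {"Junction": ["LeftSt", "RightSt", "StraightAve"],
         "LeftSt": ["Airport"], "RightSt": ["Mall"], "StraightAve": ["Home"],
         "Airport": [], "Mall": [], "Home": []}
print(bfs_plan("Junction", "Airport", roads))   # ['Junction', 'LeftSt', 'Airport']

Changing the goal argument changes the behavior with no change to the rest of the program, which is exactly the modifiability advantage noted above.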
Limitation
- Goals are binary: either achieved or not. There is no way to express that some ways of achieving the goal are better than others.
- Cannot handle conflicting goals or goals achievable only with some probability
2.4.5 Utility-Based Agents
Core Idea
Goals are a crude binary measure (“happy” vs. “not happy”). A utility function maps states (or sequences of states) to a real number — a measure of how desirable the state is. This allows the agent to choose among multiple goal-achieving paths by picking the one with highest utility.
Utility Function
The utility function is essentially an internalization of the performance measure. If the internal utility function and the external performance measure are in agreement, an agent maximizing its utility will be rational with respect to the external performance measure.
When Utility Beats Goals
- Conflicting goals: speed vs. safety in taxi driving. A utility function can encode the appropriate tradeoff; binary goals cannot.
- Uncertain goal achievement: multiple action sequences all lead toward the goal, but some are more reliable. Utility provides a way to weigh likelihood of success against importance of the goal.
Expected Utility Maximization
Rational utility-based agents choose the action that maximizes expected utility:
a* = argmax_a sum_s P(outcome = s | a, current_state) * U(s)

where:
- P(outcome = s | a, current_state) = probability of reaching state s by taking action a in the current state
- U(s) = utility of state s
This is the expected utility principle. Chapter 16 proves that any rational agent must behave as if it possesses a utility function whose expected value it tries to maximize — whether or not the utility function is explicit.
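A direct implementation of this argmax over a toy outcome table (all probabilities and utilities below are invented for illustration):

def best_action(actions, outcomes, utility):
    # a* = argmax_a sum_s P(s | a) * U(s)
    def eu(a):
        return sum(p * utility[s] for s, p in outcomes[a].items())
    return max(actions, key=eu)

# Fast route risks a crash; slow route is safe but late
outcomes = {"fast": {"arrive_early": 0.7, "crash": 0.3},
            "slow": {"arrive_late": 1.0}}
utility = {"arrive_early": 10, "arrive_late": 5, "crash": -100}
print(best_action(["fast", "slow"], outcomes, utility))   # slow (EU 5 vs. -23)

A goal-based agent with the binary goal “arrive” cannot make this tradeoff; the utility numbers are what encode it.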
Agent Diagram (Figure 2.14, described in text)
Sensors --> [What the world is like now]
                      |
                      v
      [What it will be like if I do action A]
                      |
                      v
      [How happy I will be in such a state] <--- (Utility)
                      |
                      v
      [What action I should do now] --> Actuators
Model-Free Agents
Not all utility-based agents are model-based. A model-free agent can learn what action is best in a given situation without ever explicitly learning how actions change the environment. This is the province of model-free RL (Chapter 22, e.g., Q-learning, PPO).
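The tabular Q-learning update illustrates the model-free idea: it learns an action-utility estimate directly from observed transitions, without ever representing transition probabilities (the hyperparameter values here are arbitrary):

from collections import defaultdict

Q = defaultdict(float)                   # Q[(state, action)] -> estimated utility
alpha, gamma = 0.1, 0.9                  # learning rate and discount, illustrative

def q_update(s, a, reward, s_next, actions):
    # One temporal-difference step; no transition model appears anywhere
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])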
Limitation
Perfect rationality via utility maximization is usually computationally intractable in practice. Real-world agents must settle for bounded rationality — making good decisions with limited computation.
Comparison of the Four Architectures
| Architecture | Internal state | Models world | Uses goals | Uses utility | Handles partial obs |
|---|---|---|---|---|---|
| Simple reflex | No | No | No | No | No (loops) |
| Model-based reflex | Yes (belief state) | Yes | No | No | Yes |
| Goal-based | Yes | Yes | Yes | No | Yes |
| Utility-based | Yes | Yes | Yes (implicitly) | Yes | Yes |
Each adds one capability over the previous. The progression is not always linear in practice — you can have utility-based agents that are model-free, or goal-based agents with no internal state (if the environment is fully observable).
Key Pseudocode Summary
Simple Reflex Agent
state <- INTERPRET-INPUT(percept)
rule <- RULE-MATCH(state, rules)
return rule.ACTION
Model-Based Reflex Agent
state <- UPDATE-STATE(state, action, percept, transition_model, sensor_model)
rule <- RULE-MATCH(state, rules)
return rule.ACTION
Goal-Based Agent (conceptual)
state <- UPDATE-STATE(...)
options <- GENERATE-ACTIONS(state)
best <- SEARCH(options, goal, model)
return best.next_action
Utility-Based Agent (conceptual)
state <- UPDATE-STATE(...)
action <- argmax_a EU(a, state, utility_function, transition_model)
return action
Cross-References
- Section 2.4.6 → Learning agents (how any of the above architectures can be augmented with learning)
- Section 2.4.7 → Representation of agent components (atomic, factored, structured)
- Chapters 3–5 → Search algorithms for goal-based agents
- Chapter 11 → Planning for goal-based agents
- Chapters 16–17 → Decision theory and MDPs for utility-based agents
- Chapter 22 → RL: learning the utility function (value function) from experience
- DynamICCL → RL-based NCCL tuner is a utility-based, partially model-free agent: it selects communication parameters (actions) to maximize throughput/latency reward (utility) in a partially observable, stochastic, sequential environment