4. Properties of Task Environments

Source: AIMA 4th Ed, Chapter 2 (Section 2.3.2), physical PDF pp. 111–118


Introduction

Different task environments impose fundamentally different challenges on agent design. Section 2.3.2 introduces a taxonomy of environment properties — six binary (or near-binary) dimensions along which any environment can be characterized. These dimensions determine which agent architectures and algorithms are appropriate.

This taxonomy is one of the most practically important frameworks in introductory AI — it is the bridge between problem description (PEAS) and algorithm selection.


The Six Dimensions

1. Fully Observable vs. Partially Observable

Fully observable: The agent’s sensors give it access to the complete state of the environment at each point in time — or at least all aspects of the state that are relevant to the choice of action (relevance depends on the performance measure).

Partially observable: The agent cannot see the complete relevant state, either because:

  - Sensors are noisy or inaccurate, or
  - Parts of the state are simply missing from the sensor data

Example: a vacuum agent with only a local dirt sensor cannot tell whether other squares are dirty.

Unobservable: The agent has no sensors at all. Even then, goals may still be achievable — see Chapter 4.

Design implication: Fully observable environments allow the agent to make decisions based solely on the current percept, with no need to maintain internal memory about the world. Partially observable environments require the agent to maintain an internal belief state about unobserved parts of the world.
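The belief-state idea can be sketched concretely. This is my own minimal illustration (not from AIMA): a two-square vacuum world where the agent senses only dirt in its current square, so it tracks the set of world states consistent with its percepts.

```python
# Sketch (illustrative, not from the text): a belief state for a
# two-square vacuum world where the agent senses only local dirt.
from itertools import product

SQUARES = ("A", "B")

def initial_belief():
    """Total ignorance: every assignment of dirty/clean to each square
    is a possible world. A world is the frozenset of its dirty squares."""
    return {frozenset(s for s, dirty in zip(SQUARES, dirt) if dirty)
            for dirt in product([True, False], repeat=2)}

def update_belief(belief, location, local_dirt_percept):
    """Discard worlds inconsistent with the percept from the current square."""
    return {world for world in belief
            if (location in world) == local_dirt_percept}

belief = initial_belief()                  # 4 possible worlds
belief = update_belief(belief, "A", True)  # "A is dirty" leaves 2 worlds
```

The agent still does not know whether B is dirty; that residual uncertainty is exactly what the belief state represents.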

RL/DynamICCL connection: NCCL parameter tuning is partially observable — the agent cannot directly observe all the factors affecting collective communication performance (network contention, PCIe bottlenecks, OS scheduling jitter).


2. Single-Agent vs. Multiagent

Single-agent: The agent operates alone. Example: a crossword puzzle solver — the grid is not “trying” to obstruct the agent.

Multiagent: Multiple agents whose actions affect each other’s performance measures.

Key criterion for multiagent: Entity B should be modeled as an agent (not just a physical object) if B’s behavior is best described as maximizing a performance measure that depends on A’s behavior.

Competitive multiagent: One agent’s gain is another’s loss. Example: chess — the opponent is trying to minimize your performance measure. In competitive environments, randomized behavior can be rational because it avoids being predictable.

Cooperative multiagent: The agents’ performance measures are aligned, so actions that help one agent tend to help the others. Example: multi-lane traffic, where avoiding collisions maximizes every agent’s performance measure.

Mixed: Many real environments are partially cooperative and partially competitive. Example: taxi driving, where avoiding accidents is cooperative but competing for a parking space is competitive.

Design implication: Multiagent problems require reasoning about the behavior of other agents, which introduces game theory, communication protocols, and emergent behavior not present in single-agent design.
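The claim that randomized behavior can be rational in competitive settings can be made concrete with a toy game. This sketch (my own example) uses matching pennies: any deterministic policy is exploitable by a best-responding opponent, while the uniform mixed strategy guarantees expected payoff 0.

```python
# Sketch: why randomization can be rational in a competitive environment.
# Matching pennies: the row player wins +1 if the choices match, -1 otherwise.
def payoff(row_choice, col_choice):
    return 1 if row_choice == col_choice else -1

def expected_payoff(p_heads, opponent_choice):
    """Row player's expected payoff when playing heads with prob p_heads."""
    return (p_heads * payoff("H", opponent_choice)
            + (1 - p_heads) * payoff("T", opponent_choice))

# A deterministic policy (always heads) loses to a best-responding opponent,
# but the 50/50 mix yields expected payoff 0 against either opponent action:
exploited  = expected_payoff(1.0, "T")   # opponent best-responds with tails
mixed_vs_h = expected_payoff(0.5, "H")
mixed_vs_t = expected_payoff(0.5, "T")
```

Unpredictability itself is the asset: no opponent strategy can push the mixed player below the game’s value.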


3. Deterministic vs. Nondeterministic (and Stochastic)

Deterministic: The next state of the environment is completely determined by the current state and the action executed by the agent(s). No uncertainty about outcomes.

Nondeterministic: The next state is not fully determined — multiple outcomes are possible. The possibilities are listed without probabilities.

Stochastic: A refinement of nondeterministic — outcome probabilities are explicitly quantified (e.g., “there’s a 25% chance of rain tomorrow” vs. “there might be rain tomorrow”).

Distinction:

  - Nondeterministic: “the action might fail” (no probability assigned)
  - Stochastic: “the action fails with probability 0.1” (probability assigned)

Design implication: Deterministic environments require no uncertainty handling. Nondeterministic environments require reasoning about contingencies. Stochastic environments require probabilistic reasoning and expected-value optimization.
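Expected-value optimization, which stochastic environments enable, can be shown in a few lines. The utilities below are hypothetical, chosen only to illustrate the computation; note that this comparison is impossible in a merely nondeterministic model, where no probabilities are given.

```python
# Sketch: action selection by expected value in a stochastic environment.
# Utilities and probabilities are hypothetical.
def expected_value(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

risky = [(0.9, 10), (0.1, -50)]   # succeeds with prob 0.9, fails with prob 0.1
safe  = [(1.0, 5)]                # guaranteed modest payoff

best = max(("risky", risky), ("safe", safe),
           key=lambda action: expected_value(action[1]))
# expected_value(risky) = 4.0, expected_value(safe) = 5.0
```

Even though the risky action usually pays more, its expected value is lower once the failure case is weighted in.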

Real-world note: Even if an environment is technically deterministic, if it is partially observable it will appear nondeterministic to the agent (because unobserved state differences lead to apparently unpredictable outcomes). Taxi driving is effectively stochastic.


4. Episodic vs. Sequential

Episodic: The agent’s experience is divided into atomic episodes. In each episode, the agent receives a percept and performs a single action. Crucially, the current episode does not depend on actions taken in previous episodes, and the current decision does not affect future episodes.

Examples of episodic tasks:

  - Defective-part detection on an assembly line: each part is judged independently
  - Image classification: each image is classified independently

Sequential: The current decision could affect all future decisions. Short-term actions have long-term consequences.

Examples of sequential tasks:

  - Chess: each move affects future legal moves and the game outcome
  - Taxi driving: every steering decision affects subsequent safety and efficiency

Design implication: Episodic environments are simpler — the agent does not need to plan ahead or reason about future consequences. Sequential environments require planning, search, or value function estimation over time.

RL connection: Virtually all RL formulations are sequential. The MDP framework (Ch. 17) is the canonical model for sequential decision-making.
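The difference planning makes in a sequential task can be shown with the standard discounted return from the MDP framework. The reward sequences below are hypothetical.

```python
# Sketch: in a sequential task a decision is scored by the discounted sum
# of future rewards (the MDP return G = r_0 + g*r_1 + g^2*r_2 + ...),
# not by its immediate reward alone. Reward values are hypothetical.
def discounted_return(rewards, gamma=0.9):
    return sum(gamma**t * r for t, r in enumerate(rewards))

# A plan with no immediate payoff can dominate a greedy one once its
# long-term consequences are counted:
greedy_now   = discounted_return([10, 0, 0, 0])   # 10.0
patient_plan = discounted_return([0, 0, 0, 20])   # 20 * 0.9**3 ≈ 14.58
```

In an episodic task the comparison collapses to the first reward only, which is why episodic agents need no lookahead.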


5. Static vs. Dynamic

Static: The environment does not change while the agent is deliberating (deciding what action to take). The agent can take as long as it needs to think without the world moving on.

Examples: crossword puzzles, static planning problems.

Dynamic: The environment changes while the agent deliberates. The world keeps moving; if the agent has not acted by a deadline, that inaction itself counts as a decision (doing nothing).

Semidynamic: The environment itself does not change over time, but the agent’s performance score does (typically due to time penalties). Example: chess with a clock — the board doesn’t change while you think, but your time runs out.

Environment                       Static/Dynamic
Crossword puzzle                  Static
Chess with a clock                Semidynamic
Taxi driving                      Dynamic
Medical diagnosis (single visit)  Static (arguably)

Design implication: Dynamic environments require real-time or anytime algorithms — the agent cannot afford unbounded deliberation time. Static environments allow deep offline search.
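An anytime algorithm, of the kind dynamic environments demand, can be sketched as a refinement loop with a hard deadline. The `improve` function and the 10 ms budget below are illustrative placeholders.

```python
# Sketch: an anytime decision loop for a dynamic environment. The agent
# refines its candidate answer until a real-time deadline, then acts with
# the best result so far rather than deliberating indefinitely.
import time

def anytime_decide(improve, initial, budget_s=0.01):
    """Repeatedly improve a candidate decision until time runs out."""
    best = initial
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        best = improve(best)
    return best

# Toy refinement step: each call halves the distance to the optimum (100).
result = anytime_decide(lambda x: x + (100 - x) / 2, initial=0.0)
```

The key property is that interrupting the loop at any point still yields a usable (if suboptimal) decision, which is exactly what a moving world requires.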


6. Discrete vs. Continuous

This distinction applies to three separate aspects of the environment:

  1. State: Is the environment state drawn from a finite set, or does it vary continuously?
  2. Time: Does the problem proceed in discrete time steps, or continuously?
  3. Percepts and actions: Are the percepts and actions drawn from finite sets, or are they continuous-valued?

Environment     State                            Time                   Actions
Chess           Discrete (finite positions)      Discrete               Discrete (finite legal moves)
Taxi driving    Continuous (position, velocity)  Continuous             Continuous (steering angle, brake pressure)
Image analysis  Continuous                       Discrete (frame rate)  Discrete (output category)

Design implication: Discrete environments allow exact enumeration and classical search/planning methods. Continuous environments require approximation methods, function approximators, or discretization.
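Discretization, the last option mentioned above, is simple to illustrate. This sketch (my own, with arbitrary bounds and bin counts) maps a continuous taxi-like state onto a finite grid so that classical discrete methods can be applied.

```python
# Sketch: uniform-grid discretization of a continuous state variable.
# Bounds and bin counts are arbitrary illustration values.
def discretize(value, low, high, bins):
    """Map a continuous value in [low, high] to a bin index in 0..bins-1."""
    clipped = min(max(value, low), high)
    idx = int((clipped - low) / (high - low) * bins)
    return min(idx, bins - 1)   # the exact upper bound falls in the last bin

# Continuous (position, velocity) -> discrete (cell, speed-band) pair:
position_bin = discretize(3.7, low=0.0, high=10.0, bins=5)
velocity_bin = discretize(-1.2, low=-5.0, high=5.0, bins=4)
```

The trade-off is resolution versus state-space size: finer grids approximate the continuous dynamics better but blow up the number of discrete states.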


Bonus: Known vs. Unknown

This is listed separately because it is not strictly a property of the environment but of the agent’s knowledge about the environment:

Known: The agent (or designer) knows the “laws of physics” of the environment — the outcomes (or outcome probabilities) of all actions are given.

Unknown: The agent does not know how the environment works and must discover this through interaction.

Critical nuance: Known/unknown is orthogonal to fully/partially observable:

  - A known environment can still be partially observable (e.g., solitaire: you know the rules but can’t see the undealt cards)
  - An unknown environment can still be fully observable (e.g., a new video game: you can see the whole screen, but you don’t know what the buttons do)

Design implication: Unknown environments require the agent to learn or explore before it can act optimally. This is the essence of model-based RL.
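Learning the “laws of physics” from interaction can be sketched as maximum-likelihood transition counting, the simplest form of model learning in model-based RL. The states, actions, and outcome counts below are synthetic.

```python
# Sketch: estimating outcome probabilities P(s' | s, a) from experience
# by counting observed transitions. All data here is synthetic.
from collections import Counter, defaultdict

class TransitionModel:
    """Maximum-likelihood transition model learned from interaction."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def prob(self, state, action, next_state):
        outcomes = self.counts[(state, action)]
        total = sum(outcomes.values())
        return outcomes[next_state] / total if total else 0.0

model = TransitionModel()
for outcome in ["moved", "moved", "moved", "stuck"]:
    model.observe("s0", "forward", outcome)
# After 4 trials: P(moved | s0, forward) is estimated as 0.75
```

Once such a model is learned, the unknown environment can be treated as a known (stochastic) one and planned over.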


Environment Classification Table (Figure 2.6)

Task Environment     Observable  Agents  Deterministic  Episodic    Static   Discrete
Crossword puzzle     Fully       Single  Deterministic  Sequential  Static   Discrete
Chess with a clock   Fully       Multi   Deterministic  Sequential  Semi     Discrete
Poker                Partially   Multi   Stochastic     Sequential  Static   Discrete
Backgammon           Fully       Multi   Stochastic     Sequential  Static   Discrete
Taxi driving         Partially   Multi   Stochastic     Sequential  Dynamic  Continuous
Medical diagnosis    Partially   Single  Stochastic     Sequential  Dynamic  Continuous
Image analysis       Fully       Single  Deterministic  Episodic    Semi     Continuous
Part-picking robot   Partially   Single  Stochastic     Episodic    Dynamic  Continuous
Refinery controller  Partially   Single  Stochastic     Sequential  Dynamic  Continuous
English tutor        Partially   Multi   Stochastic     Sequential  Dynamic  Discrete

The hardest case: partially observable, multiagent, nondeterministic, sequential, dynamic, continuous, and unknown. Taxi driving is hard in all these senses, except that the driver’s environment is mostly known.

Important caveat: These classifications are not always cut and dried. Medical diagnosis could be episodic (diagnose given symptoms) or sequential (run tests over time, manage multiple patients). Context determines the appropriate classification.
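For programmatic use, a few rows of the Figure 2.6 classification can be encoded as a lookup table that drives design decisions. The data simply restates the table above; the `needs_belief_state` helper is my own illustrative name.

```python
# The Figure 2.6 classifications (a subset), encoded as a lookup table.
# Tuple order matches the table columns: observable, agents, deterministic,
# episodic, static, discrete.
ENVIRONMENTS = {
    "Crossword puzzle": ("Fully", "Single", "Deterministic", "Sequential", "Static", "Discrete"),
    "Poker":            ("Partially", "Multi", "Stochastic", "Sequential", "Static", "Discrete"),
    "Taxi driving":     ("Partially", "Multi", "Stochastic", "Sequential", "Dynamic", "Continuous"),
}

def needs_belief_state(env):
    """Partially observable environments demand internal state estimation."""
    return ENVIRONMENTS[env][0] == "Partially"
```

This is the bridge the introduction describes: read off the environment’s properties, then select the matching agent machinery.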


Design Implications Summary

Property       Simpler end       Harder end            What it demands
Observability  Fully observable  Partially observable  Internal belief state; state estimation
Agents         Single            Multi                 Game theory; communication; adversarial reasoning
Determinism    Deterministic     Stochastic            Probabilistic reasoning; expected-value optimization
Temporality    Episodic          Sequential            Planning; lookahead; value functions
Dynamics       Static            Dynamic               Real-time algorithms; anytime computation
State space    Discrete          Continuous            Function approximation; continuous optimization
Knowledge      Known             Unknown               Learning; exploration; model building

Cross-References