4. Properties of Task Environments
Source: AIMA 4th Ed, Chapter 2 (Section 2.3.2), physical PDF pp. 111–118
Introduction
Different task environments impose fundamentally different challenges on agent design. Section 2.3.2 introduces a taxonomy of environment properties — six binary (or near-binary) dimensions along which any environment can be characterized. These dimensions determine which agent architectures and algorithms are appropriate.
This taxonomy is one of the most practically important frameworks in introductory AI — it is the bridge between problem description (PEAS) and algorithm selection.
The Six Dimensions
1. Fully Observable vs. Partially Observable
Fully observable: The agent’s sensors give it access to the complete state of the environment at each point in time — or at least all aspects of the state that are relevant to the choice of action (relevance depends on the performance measure).
Partially observable: The agent cannot see the complete relevant state, either because:
- Sensors are noisy or inaccurate, or
- Parts of the state are simply missing from sensor data
Example: a vacuum agent with only a local dirt sensor cannot tell whether other squares are dirty.
Unobservable: The agent has no sensors at all. Even then, goals may still be achievable — see Chapter 4.
Design implication: Fully observable environments allow the agent to make decisions based solely on the current percept, with no need to maintain internal memory about the world. Partially observable environments require the agent to maintain an internal belief state about unobserved parts of the world.
RL/DynamICCL connection: NCCL parameter tuning is partially observable — the agent cannot directly observe all the factors affecting collective communication performance (network contention, PCIe bottlenecks, OS scheduling jitter).
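To make the design implication concrete, here is a minimal Python sketch (illustrative names, not the book's code) using the chapter's two-square vacuum world: under full observability the agent acts on the current percept alone, while under partial observability it must carry a belief state — the set of world states consistent with everything sensed so far.

```python
# Fully observable two-square vacuum world: decide from the percept alone.
def reflex_vacuum_agent(percept):
    location, dirty = percept
    return "Suck" if dirty else ("Right" if location == "A" else "Left")

# Partially observable: only a local dirt sensor. The agent tracks the
# set of world states still consistent with everything sensed so far.
class BeliefStateAgent:
    def __init__(self):
        # World state = (dirt in A, dirt in B); initially anything is possible.
        self.belief = {(a, b) for a in (True, False) for b in (True, False)}

    def observe(self, location, dirty):
        idx = 0 if location == "A" else 1
        # Keep only world states that agree with the local observation.
        self.belief = {s for s in self.belief if s[idx] == dirty}

agent = BeliefStateAgent()
agent.observe("A", True)     # dirt sensed locally at A...
print(agent.belief)          # {(True, True), (True, False)} -- B still unknown
```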
2. Single-Agent vs. Multiagent
Single-agent: The agent operates alone. Example: a crossword puzzle solver — the grid is not “trying” to obstruct the agent.
Multiagent: Multiple agents whose actions affect each other’s performance measures.
Key criterion for multiagent: Entity B should be modeled as an agent (not just a physical object) if B’s behavior is best described as maximizing a performance measure that depends on A’s behavior.
Competitive multiagent: One agent’s gain is another’s loss. Example: chess — the opponent is trying to minimize your performance measure. In competitive environments, randomized behavior can be rational because it avoids being predictable.
Cooperative multiagent: The agents’ performance measures are aligned, so actions that benefit one benefit all. Example: multi-lane traffic — avoiding collisions maximizes all agents’ performance measures; no single agent benefits from a collision.
Mixed: Many real environments are partially cooperative and partially competitive. Example: taxi driving — avoiding accidents is cooperative, but competing for a parking space is competitive.
Design implication: Multiagent problems require reasoning about the behavior of other agents, which introduces game theory, communication protocols, and emergent behavior not present in single-agent design.
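A toy illustration of why randomized behavior is rational against an adversary (a sketch, not from the book): in matching pennies, every deterministic policy is fully exploitable, while the uniform mixed strategy guarantees an expected payoff of 0 no matter what the opponent plays.

```python
import itertools

# Matching pennies: payoff to the row player is +1 on a match, -1 otherwise.
def payoff(row, col):
    return 1 if row == col else -1

def expected_payoff(p_heads, q_heads):
    """Row plays Heads with probability p_heads, column with q_heads."""
    return sum(pr * qc * payoff(r, c)
               for (r, pr), (c, qc) in itertools.product(
                   [("H", p_heads), ("T", 1 - p_heads)],
                   [("H", q_heads), ("T", 1 - q_heads)]))

print(expected_payoff(1.0, 0.0))   # -1.0: a predictable policy is fully exploited
print(expected_payoff(0.5, 0.0))   #  0.0: the uniform mix cannot be exploited
print(expected_payoff(0.5, 1.0))   #  0.0: ...by any opponent strategy
```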
3. Deterministic vs. Nondeterministic (and Stochastic)
Deterministic: The next state of the environment is completely determined by the current state and the action executed by the agent(s). No uncertainty about outcomes.
Nondeterministic: The next state is not fully determined — multiple outcomes are possible. The possibilities are listed without probabilities.
Stochastic: A refinement of nondeterministic — outcome probabilities are explicitly quantified (e.g., “there’s a 25% chance of rain tomorrow” vs. “there might be rain tomorrow”).
Distinction:
- Nondeterministic: “the action might fail” (no probability assigned)
- Stochastic: “the action fails with probability 0.1” (probability assigned)
Design implication: Deterministic environments require no uncertainty handling. Nondeterministic environments require reasoning about contingencies. Stochastic environments require probabilistic reasoning and expected-value optimization.
Real-world note: An environment that is technically deterministic but partially observable will appear nondeterministic to the agent, because unobserved state differences lead to apparently unpredictable outcomes. Taxi driving is effectively stochastic.
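The three outcome models translate directly into three shapes of transition function. A sketch (illustrative, not the book's code):

```python
import random

def deterministic_step(state, action):
    return state + action                  # exactly one successor state

def nondeterministic_step(state, action):
    # "The action might fail" -- a set of possibilities, no probabilities.
    return {state + action, state}

def stochastic_step(state, action, p_fail=0.1):
    # "The action fails with probability 0.1" -- quantified uncertainty.
    return state if random.random() < p_fail else state + action

# Stochastic models support expected-value reasoning; e.g., from state 5
# taking action +1 with p_fail = 0.1:
print(0.1 * 5 + 0.9 * (5 + 1))             # E[next state] = 5.9
```

A contingency plan for the nondeterministic model must cover every element of the returned set; the stochastic model instead lets the agent weight outcomes by probability.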
4. Episodic vs. Sequential
Episodic: The agent’s experience is divided into atomic episodes. In each episode, the agent receives a percept and performs a single action. Crucially, the current episode does not depend on actions taken in previous episodes, and the current decision does not affect future episodes.
Examples of episodic tasks:
- Defective part detection on an assembly line: each part is judged independently
- Image classification: each image is classified independently
Sequential: The current decision could affect all future decisions. Short-term actions have long-term consequences.
Examples of sequential tasks:
- Chess: each move affects future legal moves and the game outcome
- Taxi driving: every steering decision affects subsequent safety and efficiency
Design implication: Episodic environments are simpler — the agent does not need to plan ahead or reason about future consequences. Sequential environments require planning, search, or value function estimation over time.
RL connection: Virtually all RL formulations are sequential. The MDP framework (Ch. 17) is the canonical model for sequential decision-making.
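A sketch of the structural difference (illustrative): an episodic score is a sum over independent decisions, while a sequential value requires lookahead because each action changes the state that the next decision faces.

```python
# Episodic: judge each part independently; no decision affects the next.
def episodic_score(parts, classify, reward):
    return sum(reward(part, classify(part)) for part in parts)

# Sequential: actions carry consequences forward, so we must evaluate
# whole action sequences (here, exhaustive two-step lookahead).
def best_two_step_value(state, actions, step, reward):
    best = float("-inf")
    for a1 in actions:
        s1 = step(state, a1)
        for a2 in actions:
            s2 = step(s1, a2)
            best = max(best, reward(s1) + reward(s2))
    return best

# Toy sequential task: move on a number line toward a goal at 3.
print(best_two_step_value(0, [-1, +1],
                          step=lambda s, a: s + a,
                          reward=lambda s: -abs(3 - s)))   # -3: best plan is +1, +1
```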
5. Static vs. Dynamic
Static: The environment does not change while the agent is deliberating (deciding what action to take). The agent can take as long as it needs to think without the world moving on.
Examples: crossword puzzles, static planning problems.
Dynamic: The environment changes while the agent deliberates. The world keeps moving; if the agent has not acted by a deadline, that inaction itself counts as a decision (doing nothing).
Semidynamic: The environment itself does not change over time, but the agent’s performance score does (typically due to time penalties). Example: chess with a clock — the board doesn’t change while you think, but your time runs out.
| Environment | Static/Dynamic |
|---|---|
| Crossword puzzle | Static |
| Chess with clock | Semidynamic |
| Taxi driving | Dynamic |
| Medical diagnosis (single visit) | Static (arguably) |
Design implication: Dynamic environments require real-time or anytime algorithms — the agent cannot afford unbounded deliberation time. Static environments allow deep offline search.
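The demand for anytime computation fits in a few lines. A sketch of the pattern (illustrative, not a specific library API): keep a best-so-far answer and be prepared to return it the moment the deadline arrives.

```python
import time

def anytime_decide(candidates, evaluate, deadline_s):
    """Improve the answer until time runs out, then act with the best so far."""
    start = time.monotonic()
    best, best_value = None, float("-inf")
    for action in candidates:              # e.g., ordered coarse-to-fine
        if time.monotonic() - start > deadline_s:
            break                          # the world has moved on: act now
        value = evaluate(action)
        if value > best_value:
            best, best_value = action, value
    return best                            # never worse than an earlier answer
```

In a static environment the same loop can simply run to completion; the deadline check is exactly what the dynamic case adds.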
6. Discrete vs. Continuous
This distinction applies to three separate aspects of the environment:
- State: Is the environment state drawn from a finite set, or does it vary continuously?
- Time: Does the problem proceed in discrete time steps, or continuously?
- Percepts and actions: Are the percepts and actions drawn from finite sets, or are they continuous-valued?
| Environment | State | Time | Actions |
|---|---|---|---|
| Chess | Discrete (finite positions) | Discrete | Discrete (finite legal moves) |
| Taxi driving | Continuous (position, velocity) | Continuous | Continuous (steering angle, brake pressure) |
| Image analysis | Continuous | Discrete (frame rate) | Discrete (output category) |
Design implication: Discrete environments allow exact enumeration and classical search/planning methods. Continuous environments require approximation methods, function approximators, or discretization.
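One common bridge between the two regimes is discretization: bin a continuous quantity so that discrete methods apply, at the cost of approximation error. A minimal sketch (illustrative):

```python
def discretize(value, low, high, n_bins):
    """Map a continuous value in [low, high] to one of n_bins equal-width bins."""
    value = min(max(value, low), high)                 # clamp to range
    bin_width = (high - low) / n_bins
    return min(int((value - low) / bin_width), n_bins - 1)

# Steering angle in [-0.5, 0.5] rad reduced to 7 discrete choices:
print(discretize(0.12, -0.5, 0.5, 7))   # 4
```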
Bonus: Known vs. Unknown
This is listed separately because it is not strictly a property of the environment but of the agent’s knowledge about the environment:
Known: The agent (or designer) knows the “laws of physics” of the environment — the outcomes (or outcome probabilities) of all actions are given.
Unknown: The agent does not know how the environment works and must discover this through interaction.
Critical nuance: Known/unknown is orthogonal to fully/partially observable:
- A known environment can still be partially observable (e.g., solitaire — you know the rules but can’t see the undealt cards)
- An unknown environment can still be fully observable (e.g., a new video game — you can see the whole screen, but you don’t know what the buttons do)
Design implication: Unknown environments require the agent to learn or explore before it can act optimally — the setting reinforcement learning addresses. Learning an explicit model of the environment’s dynamics is the defining move of model-based RL.
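A sketch of that model-learning step (illustrative, not the book's code): estimate outcome probabilities by counting observed transitions, turning an unknown environment into an approximately known one.

```python
from collections import Counter, defaultdict

class TransitionModel:
    """Estimate P(next_state | state, action) from interaction counts."""
    def __init__(self):
        self.counts = defaultdict(Counter)   # (state, action) -> next-state counts

    def record(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def estimate(self, state, action):
        c = self.counts[(state, action)]
        total = sum(c.values())
        return {s: n / total for s, n in c.items()} if total else {}

model = TransitionModel()
for outcome in ["ok"] * 9 + ["fail"]:        # ten trials of the same action
    model.record("s0", "go", outcome)
print(model.estimate("s0", "go"))            # {'ok': 0.9, 'fail': 0.1}
```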
Environment Classification Table (Figure 2.6)
| Task Environment | Observable | Agents | Deterministic | Episodic | Static | Discrete |
|---|---|---|---|---|---|---|
| Crossword puzzle | Fully | Single | Deterministic | Sequential | Static | Discrete |
| Chess with a clock | Fully | Multi | Deterministic | Sequential | Semi | Discrete |
| Poker | Partially | Multi | Stochastic | Sequential | Static | Discrete |
| Backgammon | Fully | Multi | Stochastic | Sequential | Static | Discrete |
| Taxi driving | Partially | Multi | Stochastic | Sequential | Dynamic | Continuous |
| Medical diagnosis | Partially | Single | Stochastic | Sequential | Dynamic | Continuous |
| Image analysis | Fully | Single | Deterministic | Episodic | Semi | Continuous |
| Part-picking robot | Partially | Single | Stochastic | Episodic | Dynamic | Continuous |
| Refinery controller | Partially | Single | Stochastic | Sequential | Dynamic | Continuous |
| English tutor | Partially | Multi | Stochastic | Sequential | Dynamic | Discrete |
The hardest case: partially observable, multiagent, nondeterministic, sequential, dynamic, continuous, and unknown. Taxi driving is hard in all these senses, except that the driver’s environment is mostly known.
Important caveat: These classifications are not always cut and dried. Medical diagnosis could be episodic (diagnose given symptoms) or sequential (run tests over time, manage multiple patients). Context determines the appropriate classification.
Design Implications Summary
| Property | Simpler end | Harder end | What it demands |
|---|---|---|---|
| Observability | Fully observable | Partially observable | Internal belief state; state estimation |
| Agents | Single | Multi | Game theory; communication; adversarial reasoning |
| Determinism | Deterministic | Stochastic | Probabilistic reasoning; expected value optimization |
| Temporality | Episodic | Sequential | Planning; lookahead; value functions |
| Dynamics | Static | Dynamic | Real-time algorithms; anytime computation |
| State space | Discrete | Continuous | Function approximation; continuous optimization |
| Knowledge | Known | Unknown | Learning; exploration; model building |
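The summary table doubles as an executable checklist. A sketch (illustrative property names and strings): given where an environment sits on each dimension, list what its harder ends demand of the agent design.

```python
def demands(observable="fully", agents="single", outcomes="deterministic",
            temporality="episodic", dynamics="static", space="discrete",
            knowledge="known"):
    """Return the techniques demanded by each 'harder end' the environment hits."""
    needs = []
    if observable == "partially":   needs.append("belief state / state estimation")
    if agents == "multi":           needs.append("game theory / adversarial reasoning")
    if outcomes != "deterministic": needs.append("probabilistic reasoning")
    if temporality == "sequential": needs.append("planning / value functions")
    if dynamics == "dynamic":       needs.append("real-time / anytime algorithms")
    if space == "continuous":       needs.append("function approximation")
    if knowledge == "unknown":      needs.append("learning / exploration")
    return needs

# Taxi driving, per Figure 2.6 (known environment, everything else hard):
print(demands("partially", "multi", "stochastic", "sequential",
              "dynamic", "continuous", "known"))
```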
Cross-References
- Section 2.4 → Agent architecture is chosen based on environment properties
- Chapters 3–5 → Search algorithms for deterministic, fully observable, single-agent, discrete environments
- Chapter 4 → Partially observable environments; belief states
- Chapter 17 → MDPs: sequential, stochastic, known environments
- Chapter 22 → RL: sequential, stochastic, unknown environments (model-free)
- DynamICCL → NCCL tuning is: partially observable, single-agent (or multi-process), stochastic, sequential, dynamic, continuous, partially unknown