Bayesian Networks
Chapter 13 — Probabilistic Reasoning Book: Artificial Intelligence: A Modern Approach (Russell & Norvig, 4th ed) Pages: 451–496
What is a Bayesian Network?
A Bayesian network (BN) is a directed acyclic graph (DAG) where: - Nodes: random variables - Directed edges: X → Y means “X directly influences Y” (X is a parent of Y) - Each node stores a conditional probability table (CPT): P(Xᵢ | Parents(Xᵢ))
The network represents the joint distribution compactly via the chain rule:
P(X₁, ..., Xₙ) = Π P(Xᵢ | Parents(Xᵢ))
Example: Alarm Network
Burglary Earthquake
\ /
\ /
Alarm
/ \
JohnCalls MaryCalls
Variables: B=Burglary, E=Earthquake, A=Alarm, J=JohnCalls, M=MaryCalls
CPTs: - P(B) = 0.001 - P(E) = 0.002 - P(A | B, E): 4 entries (2×2 parent combinations) - P(J | A): 2 entries - P(M | A): 2 entries
Total: 2 + 2 + 4 + 2 + 2 = 10 numbers vs. 2^5-1 = 31 for full joint.
Conditional Independence in BNs
Three key patterns:
1. Chain: A → B → C
- A and C are d-separated given B: A ⊥ C | B
- Information flows A → B → C when B is unobserved; blocked when B is observed
2. Fork (Common cause): A ← B → C
- A and C are d-separated given B: A ⊥ C | B
- B explains the correlation between A and C
3. Collider (V-structure): A → B ← C
- A and C are d-separated when B is unobserved: A ⊥ C
- Explaining away: observing B makes A and C correlated! (B is a collider)
- Also active if any descendant of B is observed
D-Separation
d-separation (directed separation): a path between X and Y is blocked given evidence Z if: 1. Chain or Fork with middle node in Z: blocked 2. Collider with collider (and no descendants) in Z: blocked 3. Collider with collider OR descendant in Z: unblocked (explaining away)
If all paths between X and Y are blocked given Z, then X ⊥ Y | Z in the distribution.
Joint Distribution Factorization
For the Alarm network:
P(B,E,A,J,M) = P(B) · P(E) · P(A|B,E) · P(J|A) · P(M|A)
Any query can be computed from this factored form.
Query Types
| Query | Description | Example |
|---|---|---|
| Marginal | P(X) | P(Burglary) |
| Posterior | P(X | e) |
| MAP | argmax_x P(x | e) |
| MPE | argmax_x₁..xₙ P(x₁..xₙ | e) |
Compact Representations
Deterministic nodes
P(X | parents) = 1 for exactly one value — represented as a function.
Noisy-OR
Each cause Cᵢ independently can trigger X:
P(X=F | c₁,...,cₙ) = Π P(X=F | cᵢ) = Π (1 - qᵢ)
Where qᵢ = P(X=T | cᵢ alone).
Reduces CPT from 2^n entries to n entries.
Relevance to DynamICCL
BN for NCCL: - P(Congestion | NetworkLoad, NumGPUs, Algorithm) - P(Throughput | Congestion, BufferSize, ChunkSize) - Posterior inference: “given observed latency, what’s P(Algorithm=Ring is suboptimal)?”
Bayesian networks provide a principled uncertainty-aware model for NCCL performance — complementing the RL policy with probabilistic diagnostics.