Exact Inference in Bayesian Networks
The Inference Problem
Given:
- A Bayesian network
- Evidence E = e (observed variables)
- Query variable X
Compute: P(X | e)
Inference by Enumeration
Direct approach: sum over all hidden variables.
P(X | e) = α · P(X, e) = α · Σ_y P(X, e, y)
Where y ranges over all combinations of hidden variable values.
function ENUMERATION-ASK(X, e, bn) returns a distribution over X
    Q(X) ← a distribution over X, initially empty
    for each value xᵢ of X do
        Q(xᵢ) ← ENUMERATE-ALL(bn.VARS, e ∪ {X = xᵢ})
    return NORMALIZE(Q(X))

function ENUMERATE-ALL(vars, e) returns a real number
    if EMPTY?(vars) then return 1.0
    Y ← FIRST(vars)
    if Y has value y in e then
        return P(y | Parents(Y)) · ENUMERATE-ALL(REST(vars), e)
    else
        return Σ_y P(y | Parents(Y)) · ENUMERATE-ALL(REST(vars), e ∪ {Y = y})
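As a concrete illustration, the two procedures above can be sketched in Python on a tiny hand-made network. The Rain → WetGrass structure and all CPT numbers below are illustrative assumptions chosen for the example, not taken from the text:

```python
# Topologically ordered variables; each entry: (parents, CPT).
# The CPT maps (value, *parent_values) -> P(value | parent_values).
network = {
    "Rain":     ((), {(True,): 0.2, (False,): 0.8}),
    "WetGrass": (("Rain",), {
        (True, True): 0.9,  (False, True): 0.1,   # P(WetGrass | Rain=T)
        (True, False): 0.1, (False, False): 0.9,  # P(WetGrass | Rain=F)
    }),
}
VARS = list(network)  # dict insertion order is topological here

def enumerate_all(variables, evidence):
    """Recursively sum the joint probability over all unassigned variables."""
    if not variables:
        return 1.0
    Y, rest = variables[0], variables[1:]
    parents, cpt = network[Y]
    parent_vals = tuple(evidence[p] for p in parents)
    if Y in evidence:  # Y observed: take its CPT entry, recurse on the rest
        return cpt[(evidence[Y],) + parent_vals] * enumerate_all(rest, evidence)
    # Y hidden: sum over both of its values
    return sum(cpt[(y,) + parent_vals] * enumerate_all(rest, {**evidence, Y: y})
               for y in (True, False))

def enumeration_ask(X, evidence):
    """Return the normalized distribution P(X | evidence) as {value: prob}."""
    q = {x: enumerate_all(VARS, {**evidence, X: x}) for x in (True, False)}
    z = sum(q.values())
    return {x: p / z for x, p in q.items()}
```

The recursion mirrors ENUMERATE-ALL directly: observed variables contribute a single CPT entry, hidden variables contribute a sum over their values.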
Complexity: O(n · d^n) for n variables with domain size d. This is exponential, and the recursion recomputes the same sub-expressions many times.
Variable Elimination (VE)
Key insight: factor the sum. Rather than enumerating all joint assignments, sum out one variable at a time, pushing each summation inward past the factors that do not mention that variable.
P(Burglary | JohnCalls=T, MaryCalls=T)
= α · Σ_e Σ_a P(B) · P(e) · P(a|B,e) · P(J=T|a) · P(M=T|a)
= α · P(B) · Σ_e P(e) · Σ_a P(a|B,e) · P(J=T|a) · P(M=T|a)
(here e and a range over the values of Earthquake and Alarm)
Each summation creates a factor (table) that is used in subsequent operations.
Factor Operations
Factor product: (f₁ × f₂)(x,y,z) = f₁(x,y) × f₂(y,z)
Factor marginalization: Σ_Y f(X,Y) — sum out variable Y from factor
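These two operations can be sketched for boolean variables, with a factor stored as a pair (variable names, table mapping assignment tuples to values). The representation is an illustrative choice, not a prescribed one:

```python
from itertools import product

def factor_product(f1, f2):
    """(f1 x f2)(x,y,z) = f1(x,y) * f2(y,z): multiply, matching shared variables."""
    v1, t1 = f1
    v2, t2 = f2
    out_vars = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for assign in product([True, False], repeat=len(out_vars)):
        env = dict(zip(out_vars, assign))
        table[assign] = (t1[tuple(env[v] for v in v1)]
                         * t2[tuple(env[v] for v in v2)])
    return out_vars, table

def sum_out(Y, f):
    """Marginalize variable Y out of factor f."""
    fv, ft = f
    out_vars = tuple(v for v in fv if v != Y)
    table = {}
    for assign, val in ft.items():
        key = tuple(a for v, a in zip(fv, assign) if v != Y)
        table[key] = table.get(key, 0.0) + val
    return out_vars, table

# Example: P(A) x P(B|A), then sum out A, giving the marginal P(B).
pA  = (("A",), {(True,): 0.3, (False,): 0.7})
pBA = (("A", "B"), {(True, True): 0.9, (True, False): 0.1,
                    (False, True): 0.2, (False, False): 0.8})
pB = sum_out("A", factor_product(pA, pBA))
```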
Variable Elimination Algorithm
function ELIMINATION-ASK(X, e, bn) returns a distribution over X
    factors ← []
    for each var Y in REVERSE-TOPOLOGICAL-ORDER(bn) do
        factors ← factors + [MAKE-FACTOR(Y, e)]
        if Y is a hidden variable then
            factors ← SUM-OUT(Y, factors)
    return NORMALIZE(POINTWISE-PRODUCT(factors))
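A runnable sketch of ELIMINATION-ASK for boolean variables, again on an illustrative two-node Rain → WetGrass network (the dict-based encoding and all names are assumptions made for this example):

```python
from itertools import product

# Topologically ordered; each entry: (parents, CPT over (value, *parent_vals)).
network = {
    "Rain":     ((), {(True,): 0.2, (False,): 0.8}),
    "WetGrass": (("Rain",), {(True, True): 0.9, (False, True): 0.1,
                             (True, False): 0.1, (False, False): 0.9}),
}

def make_factor(Y, evidence):
    """Factor for P(Y | parents(Y)) with observed variables fixed to evidence."""
    parents, cpt = network[Y]
    free = tuple(v for v in (Y,) + parents if v not in evidence)
    table = {}
    for assign in product([True, False], repeat=len(free)):
        env = {**evidence, **dict(zip(free, assign))}
        table[assign] = cpt[tuple(env[v] for v in (Y,) + parents)]
    return free, table

def multiply(f1, f2):
    """Pointwise product of two factors."""
    v1, t1 = f1
    v2, t2 = f2
    out = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for a in product([True, False], repeat=len(out)):
        env = dict(zip(out, a))
        table[a] = t1[tuple(env[v] for v in v1)] * t2[tuple(env[v] for v in v2)]
    return out, table

def sum_out(Y, factors):
    """Multiply all factors mentioning Y, marginalize Y out, keep the rest."""
    touching = [f for f in factors if Y in f[0]]
    rest = [f for f in factors if Y not in f[0]]
    combined = touching[0]
    for f in touching[1:]:
        combined = multiply(combined, f)
    fv, ft = combined
    out_vars = tuple(v for v in fv if v != Y)
    table = {}
    for assign, val in ft.items():
        key = tuple(a for v, a in zip(fv, assign) if v != Y)
        table[key] = table.get(key, 0.0) + val
    return rest + [(out_vars, table)]

def elimination_ask(X, evidence):
    factors = []
    for Y in reversed(list(network)):          # reverse topological order
        factors.append(make_factor(Y, evidence))
        if Y != X and Y not in evidence:       # hidden variable: sum it out
            factors = sum_out(Y, factors)
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    vars_, table = result
    z = sum(table.values())
    return {a[vars_.index(X)]: v / z for a, v in table.items()}
```

On a network this small VE and enumeration do the same work; the savings appear once summed-out factors are reused across many branches of the enumeration tree.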
Complexity: O(d^w) where w is the treewidth of the network.
- For a polytree (singly connected BN): O(n · d^k) where k = max number of parents
- For networks with loops: can be exponential in n
Ordering Heuristic
The order in which variables are eliminated affects complexity.
- Min-fill heuristic: eliminate the variable whose elimination adds the fewest new edges (fill-ins) among the remaining factors
- Finding the optimal order is NP-hard; good heuristics work well in practice
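The min-fill heuristic can be sketched on an undirected (moralized) graph stored as adjacency sets; the small example graph in the test is illustrative:

```python
def fill_in_count(graph, v):
    """Edges that eliminating v would add: missing links among its neighbors."""
    nbrs = list(graph[v])
    return sum(1 for i in range(len(nbrs)) for j in range(i + 1, len(nbrs))
               if nbrs[j] not in graph[nbrs[i]])

def min_fill_order(graph):
    """Greedily eliminate the variable that adds the fewest fill-in edges."""
    g = {v: set(n) for v, n in graph.items()}  # work on a copy
    order = []
    while g:
        v = min(g, key=lambda u: fill_in_count(g, u))
        for a in g[v]:                 # connect v's neighbors pairwise
            for b in g[v]:
                if a != b:
                    g[a].add(b)
        for n in g[v]:                 # remove v from the graph
            g[n].discard(v)
        del g[v]
        order.append(v)
    return order
```

Low-degree "leaf" variables add no fill-in edges, so the heuristic tends to eliminate them before hub variables, keeping intermediate factors small.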
Complexity and Treewidth
Treewidth of a BN = width of the best junction tree = maximum clique size minus 1.
| Structure | Treewidth | VE complexity |
|---|---|---|
| Chain or tree | 1 | O(n·d²) |
| Polytree, ≤ k parents | k | O(n·d^(k+1)) |
| √n × √n grid | O(√n) | Exponential in √n |
| General | Up to n−1 | Exponential in n |
VE is tractable only when treewidth is small.
Belief Propagation (Message Passing)
For polytrees (no undirected cycles):
- Messages are passed between neighboring nodes, from the leaves inward to a root and then back outward
- Exact inference in O(n) message passes
For general graphs: Loopy Belief Propagation (LBP):
- Run BP on a graph with cycles; convergence is not guaranteed, and its fixed points give only approximate marginals
- Often works well in practice (used in decoding error-correcting codes and in computer vision)
Junction Tree Algorithm
Exact inference for general networks by converting to a junction tree:
- Moralize: add edges between all co-parents (marry parents)
- Triangulate: add fill-in edges to make graph chordal
- Build junction tree: cliques as nodes; running intersection property
- Message passing on junction tree → exact inference
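The moralization step can be sketched in Python. The DAG below is the burglary network used earlier, encoded as a dict from each node to its parents (the encoding itself is an assumption for the example):

```python
def moralize(dag):
    """Return the undirected moral graph: drop directions, marry co-parents."""
    g = {v: set() for v in dag}
    for child, parents in dag.items():
        for p in parents:                  # undirected child-parent edges
            g[child].add(p)
            g[p].add(child)
        for i, a in enumerate(parents):    # marry every pair of co-parents
            for b in parents[i + 1:]:
                g[a].add(b)
                g[b].add(a)
    return g

# Burglary/Earthquake -> Alarm -> {JohnCalls, MaryCalls}:
# moralization adds the B-E edge because B and E share the child A.
dag = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
moral = moralize(dag)
```

The next steps (triangulation and clique extraction) then operate on this undirected graph.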
Complexity: O(d^(w+1)) where w = treewidth.
PMAP and IMAP
- PMAP (perfect map): BN captures all and only the CI relations in P
- IMAP (I-map): every CI relation the graph asserts also holds in P; the graph may have extra edges, so it can miss some independencies of P
Most practical BNs are I-maps rather than perfect maps: they may fail to represent every independence in P, but the independencies they do encode are correct, which keeps inference sound.