Exact Inference in Bayesian Networks
The Inference Problem
Given:
- A Bayesian network
- Evidence E = e (observed variables)
- Query variable X
Compute: P(X | e)
Inference by Enumeration
Direct approach: sum over all hidden variables.
P(X | e) = α · P(X, e) = α · Σ_y P(X, e, y)
Where y ranges over all combinations of hidden variable values.
function ENUMERATION-ASK(X, e, bn) returns a distribution over X
    Q(X) ← a distribution over X, initially empty
    for each value xᵢ of X do
        Q(xᵢ) ← ENUMERATE-ALL(bn.VARS, e ∪ {X = xᵢ})
    return NORMALIZE(Q(X))

function ENUMERATE-ALL(vars, e) returns a real number
    if EMPTY?(vars) then return 1.0
    Y ← FIRST(vars)
    if Y has value y in e then
        return P(y | Parents(Y)) · ENUMERATE-ALL(REST(vars), e)
    else
        return Σ_y P(y | Parents(Y)) · ENUMERATE-ALL(REST(vars), e ∪ {Y = y})
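As a concrete illustration, the two procedures above can be sketched in Python on a tiny hand-made network. The Rain → WetGrass structure and all CPT numbers below are illustrative assumptions chosen for the example, not taken from the text:

```python
# Topologically ordered variables; each entry: (parents, CPT).
# The CPT maps (value, *parent_values) -> P(value | parent_values).
network = {
    "Rain":     ((), {(True,): 0.2, (False,): 0.8}),
    "WetGrass": (("Rain",), {
        (True, True): 0.9,  (False, True): 0.1,   # P(WetGrass | Rain=T)
        (True, False): 0.1, (False, False): 0.9,  # P(WetGrass | Rain=F)
    }),
}
VARS = list(network)  # dict insertion order is topological here

def enumerate_all(variables, evidence):
    """Recursively sum the joint probability over all unassigned variables."""
    if not variables:
        return 1.0
    Y, rest = variables[0], variables[1:]
    parents, cpt = network[Y]
    parent_vals = tuple(evidence[p] for p in parents)
    if Y in evidence:  # Y observed: take its CPT entry, recurse on the rest
        return cpt[(evidence[Y],) + parent_vals] * enumerate_all(rest, evidence)
    # Y hidden: sum over both of its values
    return sum(cpt[(y,) + parent_vals] * enumerate_all(rest, {**evidence, Y: y})
               for y in (True, False))

def enumeration_ask(X, evidence):
    """Return the normalized distribution P(X | evidence) as {value: prob}."""
    q = {x: enumerate_all(VARS, {**evidence, X: x}) for x in (True, False)}
    z = sum(q.values())
    return {x: p / z for x, p in q.items()}
```

The recursion mirrors ENUMERATE-ALL directly: observed variables contribute a single CPT entry, hidden variables contribute a sum over their values.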
Complexity: O(n · d^n) for n variables with domain size d. This is exponential, and the recursion recomputes the same sub-expressions many times.
Variable Elimination (VE)
Key insight: factor the sum. Rather than enumerating all joint assignments, sum out one variable at a time, pushing each summation inward past the factors that do not mention that variable.
P(Burglary | JohnCalls=T, MaryCalls=T)
= α · Σ_e Σ_a P(B) · P(e) · P(a|B,e) · P(J=T|a) · P(M=T|a)
= α · P(B) · Σ_e P(e) · Σ_a P(a|B,e) · P(J=T|a) · P(M=T|a)
(here e and a range over the values of Earthquake and Alarm)
Each summation creates a factor (table) that is used in subsequent operations.
Factor Operations
Factor product: (f₁ × f₂)(x,y,z) = f₁(x,y) × f₂(y,z)
Factor marginalization: Σ_Y f(X,Y) — sum out variable Y from factor
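These two operations can be sketched for boolean variables, with a factor stored as a pair (variable names, table mapping assignment tuples to values). The representation is an illustrative choice, not a prescribed one:

```python
from itertools import product

def factor_product(f1, f2):
    """(f1 x f2)(x,y,z) = f1(x,y) * f2(y,z): multiply, matching shared variables."""
    v1, t1 = f1
    v2, t2 = f2
    out_vars = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for assign in product([True, False], repeat=len(out_vars)):
        env = dict(zip(out_vars, assign))
        table[assign] = (t1[tuple(env[v] for v in v1)]
                         * t2[tuple(env[v] for v in v2)])
    return out_vars, table

def sum_out(Y, f):
    """Marginalize variable Y out of factor f."""
    fv, ft = f
    out_vars = tuple(v for v in fv if v != Y)
    table = {}
    for assign, val in ft.items():
        key = tuple(a for v, a in zip(fv, assign) if v != Y)
        table[key] = table.get(key, 0.0) + val
    return out_vars, table

# Example: P(A) x P(B|A), then sum out A, giving the marginal P(B).
pA  = (("A",), {(True,): 0.3, (False,): 0.7})
pBA = (("A", "B"), {(True, True): 0.9, (True, False): 0.1,
                    (False, True): 0.2, (False, False): 0.8})
pB = sum_out("A", factor_product(pA, pBA))
```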
Variable Elimination Algorithm
function ELIMINATION-ASK(X, e, bn) returns a distribution over X
    factors ← []
    for each var Y in REVERSE-TOPOLOGICAL-ORDER(bn) do
        factors ← factors + [MAKE-FACTOR(Y, e)]
        if Y is a hidden variable then
            factors ← SUM-OUT(Y, factors)
    return NORMALIZE(POINTWISE-PRODUCT(factors))
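A runnable sketch of ELIMINATION-ASK for boolean variables, again on an illustrative two-node Rain → WetGrass network (the dict-based encoding and all names are assumptions made for this example):

```python
from itertools import product

# Topologically ordered; each entry: (parents, CPT over (value, *parent_vals)).
network = {
    "Rain":     ((), {(True,): 0.2, (False,): 0.8}),
    "WetGrass": (("Rain",), {(True, True): 0.9, (False, True): 0.1,
                             (True, False): 0.1, (False, False): 0.9}),
}

def make_factor(Y, evidence):
    """Factor for P(Y | parents(Y)) with observed variables fixed to evidence."""
    parents, cpt = network[Y]
    free = tuple(v for v in (Y,) + parents if v not in evidence)
    table = {}
    for assign in product([True, False], repeat=len(free)):
        env = {**evidence, **dict(zip(free, assign))}
        table[assign] = cpt[tuple(env[v] for v in (Y,) + parents)]
    return free, table

def multiply(f1, f2):
    """Pointwise product of two factors."""
    v1, t1 = f1
    v2, t2 = f2
    out = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for a in product([True, False], repeat=len(out)):
        env = dict(zip(out, a))
        table[a] = t1[tuple(env[v] for v in v1)] * t2[tuple(env[v] for v in v2)]
    return out, table

def sum_out(Y, factors):
    """Multiply all factors mentioning Y, marginalize Y out, keep the rest."""
    touching = [f for f in factors if Y in f[0]]
    rest = [f for f in factors if Y not in f[0]]
    combined = touching[0]
    for f in touching[1:]:
        combined = multiply(combined, f)
    fv, ft = combined
    out_vars = tuple(v for v in fv if v != Y)
    table = {}
    for assign, val in ft.items():
        key = tuple(a for v, a in zip(fv, assign) if v != Y)
        table[key] = table.get(key, 0.0) + val
    return rest + [(out_vars, table)]

def elimination_ask(X, evidence):
    factors = []
    for Y in reversed(list(network)):          # reverse topological order
        factors.append(make_factor(Y, evidence))
        if Y != X and Y not in evidence:       # hidden variable: sum it out
            factors = sum_out(Y, factors)
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    vars_, table = result
    z = sum(table.values())
    return {a[vars_.index(X)]: v / z for a, v in table.items()}
```

On a network this small VE and enumeration do the same work; the savings appear once summed-out factors are reused across many branches of the enumeration tree.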
Complexity: O(d^w) where w is the treewidth of the network.
- For a polytree (singly connected BN): O(n · d^k) where k = max number of parents
- For networks with loops: can be exponential in n
Ordering Heuristic
The order in which variables are eliminated affects complexity.
- Min-fill heuristic: eliminate the variable whose elimination adds the fewest new edges (fill-ins) among the remaining factors
- Finding the optimal order is NP-hard; good heuristics work well in practice
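The min-fill heuristic can be sketched on an undirected (moralized) graph stored as adjacency sets; the small example graph in the test is illustrative:

```python
def fill_in_count(graph, v):
    """Edges that eliminating v would add: missing links among its neighbors."""
    nbrs = list(graph[v])
    return sum(1 for i in range(len(nbrs)) for j in range(i + 1, len(nbrs))
               if nbrs[j] not in graph[nbrs[i]])

def min_fill_order(graph):
    """Greedily eliminate the variable that adds the fewest fill-in edges."""
    g = {v: set(n) for v, n in graph.items()}  # work on a copy
    order = []
    while g:
        v = min(g, key=lambda u: fill_in_count(g, u))
        for a in g[v]:                 # connect v's neighbors pairwise
            for b in g[v]:
                if a != b:
                    g[a].add(b)
        for n in g[v]:                 # remove v from the graph
            g[n].discard(v)
        del g[v]
        order.append(v)
    return order
```

Low-degree "leaf" variables add no fill-in edges, so the heuristic tends to eliminate them before hub variables, keeping intermediate factors small.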
Complexity and Treewidth
Treewidth of a BN = width of the best junction tree = maximum clique size minus 1.
| Structure | Treewidth | VE complexity |
|---|---|---|
| Chain or tree | 1 | O(n·d²) |
| Polytree, ≤ k parents | k | O(n·d^(k+1)) |
| √n × √n grid | O(√n) | Exponential in √n |
| General | Up to n−1 | Exponential in n |
VE is tractable only when treewidth is small.
Belief Propagation (Message Passing)
For polytrees (no undirected cycles):
- Messages are passed between neighboring nodes, from the leaves inward to a root and then back outward
- Exact inference in O(n) message passes
For general graphs: Loopy Belief Propagation (LBP):
- Run BP on a graph with cycles; convergence is not guaranteed, and its fixed points give only approximate marginals
- Often works well in practice (used in decoding error-correcting codes and in computer vision)
Junction Tree Algorithm
Exact inference for general networks by converting to a junction tree:
- Moralize: add edges between all co-parents (marry parents)
- Triangulate: add fill-in edges to make graph chordal
- Build junction tree: cliques as nodes; running intersection property
- Message passing on junction tree → exact inference
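The moralization step can be sketched in Python. The DAG below is the burglary network used earlier, encoded as a dict from each node to its parents (the encoding itself is an assumption for the example):

```python
def moralize(dag):
    """Return the undirected moral graph: drop directions, marry co-parents."""
    g = {v: set() for v in dag}
    for child, parents in dag.items():
        for p in parents:                  # undirected child-parent edges
            g[child].add(p)
            g[p].add(child)
        for i, a in enumerate(parents):    # marry every pair of co-parents
            for b in parents[i + 1:]:
                g[a].add(b)
                g[b].add(a)
    return g

# Burglary/Earthquake -> Alarm -> {JohnCalls, MaryCalls}:
# moralization adds the B-E edge because B and E share the child A.
dag = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
moral = moralize(dag)
```

The next steps (triangulation and clique extraction) then operate on this undirected graph.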
Complexity: O(d^(w+1)) where w = treewidth.
PMAP and IMAP
- PMAP (perfect map): BN captures all and only the CI relations in P
- IMAP (I-map): every CI relation the graph asserts also holds in P; the graph may have extra edges, so it can miss some independencies of P
Most practical BNs are I-maps rather than perfect maps: they may fail to represent every independence in P, but the independencies they do encode are correct, which keeps inference sound.