Probabilistic Programming
Chapter 15 — Probabilistic Programming. Book: Artificial Intelligence: A Modern Approach (Russell & Norvig, 4th ed.), pages 500–527.
What is Probabilistic Programming?
A probabilistic programming language (PPL) embeds probability distributions as first-class objects in a programming language, allowing:

- Arbitrary generative models expressed as programs
- Automatic inference (the language handles the “how to compute posteriors” part)
- The expressive power of Turing-complete programs combined with principled probabilistic reasoning
Core idea: a probabilistic program = a prior over execution traces. Conditioning on observations = posterior over traces.
The Generative Model View
Any probabilistic model can be written as:
```python
def generative_model():
    # Sample latent variables (the "story" of how the data was generated)
    theta = sample('theta', prior_distribution)
    # Generate observations from the likelihood
    obs = sample('obs', likelihood(theta))
    return obs
```
Inference: given the observation obs = data, compute P(theta | obs = data).
The PPL automates this inference — the programmer only writes the generative model.
Example: Coin Flipping
```python
def coin_model(data):
    # Prior: belief about the coin's bias
    p = sample('p', Beta(2, 2))
    # Likelihood: each flip is Bernoulli(p)
    for i, d in enumerate(data):
        observe(f'flip_{i}', Bernoulli(p), d)
    return p

# Inference: posterior P(p | data), with flips encoded as 0/1
posterior = infer(coin_model, data=[1, 1, 0, 1, 1, 0])
```
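The same model written in a concrete PPL makes these constructs explicit. Below is a sketch in NumPyro (assuming `numpyro` and `jax` are installed); note that NumPyro conditions via `obs=` in `sample` rather than a separate `observe` primitive:

```python
# Sketch of the coin model in NumPyro (illustrative, not from the chapter).
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def coin_model(data):
    # Prior over the coin bias
    p = numpyro.sample('p', dist.Beta(2.0, 2.0))
    # Condition each flip on its observed outcome
    with numpyro.plate('flips', len(data)):
        numpyro.sample('obs', dist.Bernoulli(p), obs=data)

data = jnp.array([1, 1, 0, 1, 1, 0])
mcmc = MCMC(NUTS(coin_model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), data)
print(mcmc.get_samples()['p'].mean())  # posterior mean of the bias
```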
Key PPL Constructs
| Construct | Purpose |
|---|---|
| `sample(name, dist)` | Sample a random variable from a distribution |
| `observe(name, dist, value)` | Condition on an observed value |
| `infer(model, data)` | Run inference → return the posterior distribution |
Inference Algorithms in PPLs
PPLs provide multiple backends:
MCMC-based
- Metropolis-Hastings: propose → accept/reject based on probability ratio
- HMC (Hamiltonian Monte Carlo): use gradient information for efficient proposals
- NUTS (No U-Turn Sampler): adaptive HMC — used in Stan, PyMC3
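As an illustration of the propose/accept-reject loop, here is a minimal Metropolis-Hastings sketch for the coin-bias posterior above (plain Python; the Gaussian proposal, step size, and chain length are arbitrary choices):

```python
# Minimal Metropolis-Hastings sketch for the Beta-Bernoulli coin model.
import math
import random

def log_joint(p, data):
    """Unnormalized log P(data, p): Beta(2, 2) prior + Bernoulli likelihood."""
    if not 0.0 < p < 1.0:
        return -math.inf
    log_prob = math.log(p) + math.log(1.0 - p)  # Beta(2, 2) prior, up to a constant
    for d in data:
        log_prob += math.log(p) if d else math.log(1.0 - p)
    return log_prob

def metropolis_hastings(data, steps=5000, step_size=0.1):
    p, samples = 0.5, []
    for _ in range(steps):
        proposal = p + random.gauss(0.0, step_size)         # symmetric random-walk proposal
        log_ratio = log_joint(proposal, data) - log_joint(p, data)
        if random.random() < math.exp(min(0.0, log_ratio)):  # accept with prob min(1, ratio)
            p = proposal
        samples.append(p)
    return samples

samples = metropolis_hastings([1, 1, 0, 1, 1, 0])
print(sum(samples) / len(samples))  # ≈ posterior mean of the bias
```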
Variational Inference (VI)
- Approximate posterior with a family Q(θ|φ) parameterized by φ
- Optimize φ to minimize KL(Q || P), equivalently maximize the ELBO:
  ELBO(φ) = E_Q[log P(x, θ)] − E_Q[log Q(θ|φ)]
- Mean-field VI: Q factors as Π Qᵢ(θᵢ) — coordinate ascent updates
- ADVI: automatic differentiation variational inference — gradient-based
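For intuition about what the ELBO measures, here is a Monte Carlo sketch for the coin model using a Beta(a, b) variational family (SciPy/NumPy assumed available; the candidate parameters compared at the end are arbitrary):

```python
# Monte Carlo estimate of the ELBO for the coin model with a Beta(a, b)
# variational family over the bias p (illustrative sketch).
import numpy as np
from scipy import stats

def elbo(a, b, data, num_samples=2000, seed=0):
    rng = np.random.default_rng(seed)
    q = stats.beta(a, b)                                 # variational distribution Q(p | a, b)
    p = q.rvs(size=num_samples, random_state=rng)        # samples p ~ Q
    # log P(data, p): Beta(2, 2) prior plus Bernoulli likelihood of each flip
    log_joint = stats.beta(2, 2).logpdf(p)
    for d in data:
        log_joint += stats.bernoulli(p).logpmf(d)
    # ELBO(a, b) = E_Q[log P(data, p)] - E_Q[log Q(p)]
    return np.mean(log_joint - q.logpdf(p))

data = [1, 1, 0, 1, 1, 0]
print(elbo(6, 4, data), elbo(1, 1, data))  # higher ELBO = better variational fit
```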
Sequential Monte Carlo (SMC)
- Particle-based — good for sequential/temporal models
- Integrates well with the forward-backward structure of Ch.14
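A minimal bootstrap particle filter sketch for a 1-D Gaussian random-walk state-space model (the model and noise scales are illustrative, not from the chapter):

```python
# Bootstrap particle filter (SMC) sketch: propagate, weight, resample.
import numpy as np

def particle_filter(observations, n_particles=500, proc_noise=0.5, obs_noise=1.0, seed=0):
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)        # initial particle cloud
    estimates = []
    for y in observations:
        # Propagate each particle through the transition model x_t = x_{t-1} + noise
        particles = particles + rng.normal(0.0, proc_noise, n_particles)
        # Weight particles by the observation likelihood N(y; x_t, obs_noise^2)
        weights = np.exp(-0.5 * ((y - particles) / obs_noise) ** 2)
        weights /= weights.sum()
        # Resample: concentrate particles where the posterior mass is
        particles = rng.choice(particles, size=n_particles, p=weights)
        estimates.append(particles.mean())
    return estimates

print(particle_filter([0.2, 0.5, 1.1, 1.4]))  # filtered state estimate per time step
```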
Major PPLs
| Language | Backend | Notes |
|---|---|---|
| Stan | HMC/NUTS | Explicit BUGS-like syntax; excellent for statistics |
| PyMC3/PyMC | NUTS + VI (Theano/JAX) | Python-native; flexible |
| Pyro / NumPyro | SVI + NUTS (PyTorch/JAX) | Neural PPL; deep probabilistic models |
| Edward2 | VI + MCMC (TF) | TensorFlow-based |
| Church / Venture | MH over traces | Original LISP-based research PPLs |
| WebPPL | MH + SMC | JavaScript; browser-based |
Open-Universe Probability Models
Classical BNs have a fixed set of variables. PPLs support open-universe models where:

- The number of objects is unknown (random)
- New objects can be created during inference
Example: How many vehicles are in a surveillance video? Sample the count, then explain the observations.
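A toy forward-sampling sketch of that pattern in plain Python/NumPy (the Poisson prior and detection noise are illustrative assumptions): the vehicle count is itself sampled, and per-vehicle latents exist only for the sampled count.

```python
# Open-universe toy model: the number of objects is a latent variable.
import numpy as np

def surveillance_model(seed=0):
    rng = np.random.default_rng(seed)
    # Unknown object count: Poisson prior over how many vehicles exist
    n_vehicles = rng.poisson(3)
    # Per-object latents are created only for the objects that exist
    positions = rng.uniform(0, 100, size=n_vehicles)
    # Observations (noisy detections) are explained by the sampled objects
    detections = positions + rng.normal(0, 2.0, size=n_vehicles)
    return n_vehicles, detections

print(surveillance_model())  # forward sample: a count plus that many detections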
Languages: BLOG (Bayesian Logic), IBAL.
Neural Probabilistic Programs
Modern deep learning meets PPLs:

- Variational Autoencoders (VAEs): a PPL with a neural encoder (approximate posterior) + neural decoder (likelihood)
- Normalizing Flows: learnable invertible transformations → exact posterior samples
- Deep Probabilistic Models (Pyro): neural networks as components inside a PPL
This enables fitting complex distributions to high-dimensional data (images, text) while maintaining probabilistic semantics.
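As a sketch of that last point, a neural decoder can serve as the likelihood inside a Pyro model (Pyro and PyTorch assumed installed; the architecture and latent dimension are illustrative, and a full VAE would also need a neural guide/encoder):

```python
# Sketch: a neural network as the likelihood inside a Pyro model (illustrative).
import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist

# Decoder network: 2-D latent code -> pixel probabilities for 28x28 binary images
decoder = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 784), nn.Sigmoid())

def decoder_model(images):
    pyro.module("decoder", decoder)  # register the network's parameters with Pyro
    with pyro.plate("data", images.shape[0]):
        # Latent code per image, with a standard-normal prior
        z = pyro.sample("z", dist.Normal(torch.zeros(2), torch.ones(2)).to_event(1))
        # Neural likelihood: the decoder maps z to Bernoulli pixel probabilities
        pyro.sample("obs", dist.Bernoulli(decoder(z)).to_event(1), obs=images)
```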
Relevance to DynamICCL
PPLs enable Bayesian experimental design for NCCL:

- Model NCCL throughput as a probabilistic function of its parameters
- Use VI or HMC to infer a posterior over those parameters given measured throughput
- Select the next configuration to try using expected improvement (an acquisition function, sketched below)
This is Bayesian optimization — a PPL-enabled approach to hyperparameter tuning that is more sample-efficient than RL for small parameter spaces.
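A hedged sketch of the expected-improvement acquisition, given a posterior predictive mean and standard deviation of throughput for each candidate configuration (the function and names below are illustrative, not part of DynamICCL):

```python
# Expected improvement (EI) for maximizing throughput; mu/sigma come from the
# posterior predictive over a candidate configuration (illustrative sketch).
import math

def expected_improvement(mu, sigma, best_so_far, xi=0.01):
    """Expected amount by which a candidate beats the best observed value."""
    if sigma <= 0.0:
        return 0.0
    z = (mu - best_so_far - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (mu - best_so_far - xi) * cdf + sigma * pdf

# Usage: score each candidate from the fitted model's posterior predictive, e.g.
# candidates = [(config, mu, sigma), ...]
# next_config = max(candidates, key=lambda c: expected_improvement(c[1], c[2], best))
```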