Probabilistic Programming

Chapter 15: Probabilistic Programming
Book: Artificial Intelligence: A Modern Approach (Russell & Norvig, 4th ed.)
Pages: 500–527


What is Probabilistic Programming?

A probabilistic programming language (PPL) embeds probability distributions as first-class objects in a programming language, allowing:

- Arbitrary generative models expressed as programs
- Automatic inference (the language handles the “how to compute posteriors” part)
- The expressive power of Turing-complete programs combined with principled probabilistic reasoning

Core idea: a probabilistic program = a prior over execution traces. Conditioning on observations = posterior over traces.


The Generative Model View

Any probabilistic model can be written as:

def generative_model():
    # Sample latent variables (the "story" of how data was generated)
    theta = sample('theta', prior_distribution)
    # Generate observations
    obs = sample('obs', likelihood(theta))
    return obs

Inference: given observed obs=data, compute P(theta | obs=data).

The PPL automates this inference — the programmer only writes the generative model.
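
For concreteness, here is a minimal sketch of the same pattern in NumPyro (one of the PPLs listed under Major PPLs below). The Normal prior, Normal likelihood, and the data values are purely illustrative assumptions; the point is that the programmer writes only the model, and the library's NUTS sampler produces posterior samples.

import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model(data=None):
    # Latent variable with its prior (illustrative choice of Normal(0, 1)).
    theta = numpyro.sample("theta", dist.Normal(0.0, 1.0))
    # Likelihood of the observations given theta.
    numpyro.sample("obs", dist.Normal(theta, 1.0), obs=data)

# Inference is delegated to the library: NUTS draws samples from P(theta | data).
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
mcmc.run(jax.random.PRNGKey(0), data=jnp.array([0.8, 1.2, 0.5]))
theta_samples = mcmc.get_samples()["theta"]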


Example: Coin Flipping

def coin_model(data):
    # Prior: belief about coin bias
    p = sample('p', Beta(2, 2))
    # Likelihood: each flip is Bernoulli(p)
    for i, d in enumerate(data):
        observe(f'flip_{i}', Bernoulli(p), d)
    return p

# Inference: posterior P(p | data), with 1 = heads, 0 = tails
posterior = infer(coin_model, data=[1,1,0,1,1,0])
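
Because the Beta prior is conjugate to the Bernoulli likelihood, this particular posterior is also available in closed form, which gives a ground-truth check on whatever infer returns. A short sketch using scipy (the 1 = heads convention above is assumed):

from scipy.stats import beta

data = [1, 1, 0, 1, 1, 0]                      # 1 = heads, 0 = tails
heads, tails = sum(data), len(data) - sum(data)

# Conjugacy: Beta(2, 2) prior + Bernoulli likelihood => Beta(2 + heads, 2 + tails) posterior.
exact_posterior = beta(2 + heads, 2 + tails)   # Beta(6, 4)
print(exact_posterior.mean())                  # posterior mean = 0.6
print(exact_posterior.interval(0.95))          # central 95% credible interval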

Key PPL Constructs

| Construct | Purpose |
| --- | --- |
| sample(name, dist) | Sample a random variable from the given distribution |
| observe(name, dist, value) | Condition on an observed value |
| infer(model, data) | Run inference → return the posterior distribution |
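
To make the three constructs concrete, the sketch below realizes them with likelihood weighting: sample draws from the prior and records the value in a trace, observe accumulates the log-likelihood of the observed value, and infer runs many traces and weights each by its likelihood. This is a toy built only for the coin model above (real PPLs use far more sophisticated machinery), and the distribution classes are minimal stand-ins.

import math
import random

# Minimal stand-in distributions with only what this sketch needs.
class Beta:
    def __init__(self, a, b): self.a, self.b = a, b
    def sample(self): return random.betavariate(self.a, self.b)

class Bernoulli:
    def __init__(self, p): self.p = p
    def log_prob(self, x): return math.log(self.p if x == 1 else 1.0 - self.p)

_trace, _log_weight = {}, 0.0

def sample(name, dist):
    # Draw from the prior and record the value in the current trace.
    _trace[name] = dist.sample()
    return _trace[name]

def observe(name, dist, value):
    # Condition by accumulating the log-likelihood of the observed value.
    global _log_weight
    _log_weight += dist.log_prob(value)

def infer(model, data, num_samples=5000):
    # Likelihood weighting: run the model many times, weight each trace by
    # exp(log-likelihood of the observations), and return weighted samples.
    global _trace, _log_weight
    samples, weights = [], []
    for _ in range(num_samples):
        _trace, _log_weight = {}, 0.0
        samples.append(model(data))
        weights.append(math.exp(_log_weight))
    total = sum(weights)
    return samples, [w / total for w in weights]

# Using coin_model from above (defined in the same module):
samples, weights = infer(coin_model, data=[1, 1, 0, 1, 1, 0])
posterior_mean = sum(s * w for s, w in zip(samples, weights))   # ≈ 0.6, matching Beta(6, 4)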

Inference Algorithms in PPLs

PPLs provide multiple backends:

MCMC-based

Metropolis-Hastings over execution traces, Gibbs sampling, and gradient-based HMC/NUTS; asymptotically exact but can mix slowly.

Variational Inference (VI)

Fit a tractable approximation Q(θ|φ) to the posterior by maximizing the evidence lower bound:

ELBO(φ) = E_Q[log P(x, θ)] - E_Q[log Q(θ|φ)]
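
The ELBO can be estimated by Monte Carlo: draw θ from Q(θ|φ) and average log P(x, θ) - log Q(θ|φ). A minimal NumPy sketch for an illustrative Normal-Normal model (the data values and the Gaussian variational family are assumptions made here for the example):

import numpy as np
from scipy.stats import norm

x = np.array([0.8, 1.2, 0.5])                  # illustrative observations

def log_joint(theta):
    # log P(x, theta): Normal(0, 1) prior, Normal(theta, 1) likelihood per point.
    prior = norm(0.0, 1.0).logpdf(theta)
    lik = norm(theta[:, None], 1.0).logpdf(x[None, :]).sum(axis=1)
    return prior + lik

def elbo(mu, sigma, num_samples=5000, seed=0):
    # Monte Carlo estimate of E_Q[log P(x, theta)] - E_Q[log Q(theta | phi)].
    rng = np.random.default_rng(seed)
    theta = rng.normal(mu, sigma, size=num_samples)   # theta ~ Q(. | phi)
    return np.mean(log_joint(theta) - norm(mu, sigma).logpdf(theta))

print(elbo(0.6, 0.5))   # evaluate the bound at one candidate phi = (mu, sigma)

VI then maximizes elbo over (mu, sigma), typically with stochastic gradients; for this conjugate toy model the optimum recovers the exact Gaussian posterior.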

Sequential Monte Carlo (SMC)

Propagate a population of weighted particles through the program, reweighting and resampling at each observe statement.


Major PPLs

| Language | Backend | Notes |
| --- | --- | --- |
| Stan | HMC/NUTS | Explicit BUGS-like syntax; excellent for statistics |
| PyMC3/PyMC | NUTS + VI (Theano/JAX) | Python-native; flexible |
| Pyro / NumPyro | SVI + NUTS (PyTorch/JAX) | Neural PPL; deep probabilistic models |
| Edward2 | VI + MCMC (TF) | TensorFlow-based |
| Church / Venture | MH over traces | Original LISP-based research PPLs |
| WebPPL | MH + SMC | JavaScript; browser-based |

Open-Universe Probability Models

Classical Bayesian networks have a fixed, known set of variables. PPLs support open-universe models where:

- The number of objects is unknown (random)
- New objects can be created during inference

Example: How many vehicles are in a surveillance video? Sample the count, then explain the observations.
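
A forward-simulation sketch of that idea in plain Python (all priors and noise levels here are hypothetical): the number of vehicles is itself sampled, and each sampled vehicle generates a possibly missing detection. Inference runs this story backwards, producing a posterior over the count and positions given the detections.

import random

def simulate_scene():
    # Open universe: the number of objects is a random variable, not fixed.
    num_vehicles = random.randint(0, 5)                 # stand-in for a Poisson prior
    positions = [random.uniform(0, 100) for _ in range(num_vehicles)]
    # Each vehicle yields a noisy detection, and some detections are missed.
    detections = [p + random.gauss(0, 1.0)
                  for p in positions if random.random() > 0.1]
    return num_vehicles, positions, detections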

Languages: BLOG (Bayesian Logic), IBAL.


Neural Probabilistic Programs

Modern deep learning meets PPLs:

- Variational Autoencoders (VAEs): PPL with a neural encoder (approximate posterior) + a neural decoder (likelihood)
- Normalizing Flows: learnable invertible transformations → exact posterior samples
- Deep Probabilistic Models (Pyro): neural networks as components inside a PPL

This enables fitting complex distributions to high-dimensional data (images, text) while maintaining probabilistic semantics.
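
As an illustration of the VAE-as-PPL view, here is a minimal Pyro-style sketch for a single 784-dimensional binary image. The network architectures, sizes, and the 2-dimensional latent code are arbitrary choices for the example; a real implementation would add batching (pyro.plate) and a training loop.

import torch
import pyro
import pyro.distributions as dist

latent_dim, data_dim = 2, 784                   # arbitrary illustrative sizes
decoder = torch.nn.Sequential(torch.nn.Linear(latent_dim, data_dim), torch.nn.Sigmoid())
encoder_loc = torch.nn.Linear(data_dim, latent_dim)
encoder_scale = torch.nn.Linear(data_dim, latent_dim)

def model(x):
    # Generative side: standard-normal prior on z, neural decoder as the likelihood.
    pyro.module("decoder", decoder)
    z = pyro.sample("z", dist.Normal(torch.zeros(latent_dim), torch.ones(latent_dim)).to_event(1))
    probs = decoder(z)
    pyro.sample("x", dist.Bernoulli(probs).to_event(1), obs=x)

def guide(x):
    # Inference side: neural encoder parameterizes the approximate posterior Q(z | x).
    pyro.module("encoder_loc", encoder_loc)
    pyro.module("encoder_scale", encoder_scale)
    loc = encoder_loc(x)
    scale = torch.nn.functional.softplus(encoder_scale(x))
    pyro.sample("z", dist.Normal(loc, scale).to_event(1))

Training both networks amounts to maximizing the ELBO, e.g. with pyro.infer.SVI(model, guide, pyro.optim.Adam({"lr": 1e-3}), loss=pyro.infer.Trace_ELBO()).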


Relevance to DynamICCL

PPLs enable Bayesian experimental design for NCCL:

- Model NCCL throughput as a probabilistic function of its parameters
- Use VI or HMC to infer a posterior over those parameters given measured throughput
- Select the next configuration using expected improvement (an acquisition function)

This is Bayesian optimization — a PPL-enabled approach to hyperparameter tuning that is more sample-efficient than RL for small parameter spaces.
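
A sketch of the acquisition step is below. The candidate set, the post_mean/post_std names, and the xi exploration constant are hypothetical; the Gaussian posterior over throughput would come from the surrogate model fitted with VI or HMC.

import numpy as np
from scipy.stats import norm

def expected_improvement(post_mean, post_std, best_observed, xi=0.01):
    # EI for maximization: expected amount by which a candidate configuration
    # beats the best throughput seen so far, under a Gaussian posterior.
    post_std = np.maximum(post_std, 1e-9)             # avoid division by zero
    z = (post_mean - best_observed - xi) / post_std
    return (post_mean - best_observed - xi) * norm.cdf(z) + post_std * norm.pdf(z)

# Pick the next NCCL configuration from a hypothetical candidate grid:
# next_idx = np.argmax(expected_improvement(post_mean, post_std, best_throughput))
# next_config = candidate_configs[next_idx]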