Perfect Sampling
Coupling from the past (Propp-Wilson): run Markov chains backward in time until all starting states coalesce. The result is an exact sample from the stationary distribution with no burn-in approximation, but often at a serious computational price.
Why This Matters
Classical MCMC methods like Metropolis-Hastings and Gibbs sampling always leave a practical question: have you run the chain long enough? The samples are only approximately from the target distribution, and the approximation error from insufficient burn-in must be assessed indirectly through diagnostics such as $\hat{R}$, effective sample size (ESS), and trace plots.
Perfect sampling solves this completely. The Propp-Wilson coupling from the past (CFTP) algorithm produces an exact sample from the stationary distribution. Not approximately correct. Not asymptotically correct. Exactly correct, in finite time, with probability 1.
Mental Model
Imagine starting a Markov chain from every possible initial state simultaneously, all driven by the same random numbers. At each step, some chains may land on the same state. Eventually, all chains coalesce to a single state. At that moment, the output is the same regardless of where you started, so it must be a draw from the stationary distribution.
The key insight: you cannot run chains forward from time $-\infty$, but you can work backward. Start at time $-1$, then $-2$, then $-4$, doubling backward until you find a time $-T$ in the past from which all chains have coalesced by time 0.
Formal Setup and Notation
Let $(X_t)$ be a Markov chain on a finite state space $S$ with transition kernel $P$ and stationary distribution $\pi$. Let $f: S \to S$ be a random function with $\Pr(f(x) = y) = P(x, y)$ for all $x, y \in S$; then if $X \sim \pi$, we have $f(X) \sim \pi$ (i.e., $f$ preserves $\pi$).
Coupling from the Past (CFTP)
The CFTP algorithm (Propp and Wilson, 1996):
- Fix a sequence of i.i.d. random maps $f_{-1}, f_{-2}, f_{-3}, \dots$, each consistent with $P$ (that is, $\Pr(f_t(x) = y) = P(x, y)$)
- For $T = 1, 2, 4, 8, \dots$ (doubling):
  - Compute $F_T(x) = (f_{-1} \circ f_{-2} \circ \cdots \circ f_{-T})(x)$ for every $x \in S$
  - If $F_T(x)$ is the same for all $x \in S$, output this common value and stop
- The output is an exact sample from $\pi$
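As a concrete sketch, here is a direct (track-every-state) CFTP implementation in Python. The three-state transition matrix `P` is an invented example for illustration, not from the text:

```python
import random

# A small ergodic chain on states {0, 1, 2}; rows of P sum to 1.
# (Illustrative example, not from the text.)
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4]]
STATES = range(3)

def random_map(u):
    """Turn one uniform u into a map f: S -> S via inverse-CDF sampling,
    so Pr(f(x) = y) = P[x][y] for every state x."""
    def f(x):
        cum = 0.0
        for y, p in enumerate(P[x]):
            cum += p
            if u < cum:
                return y
        return len(P[x]) - 1
    return f

def cftp(rng):
    """Coupling from the past: double the look-back window until the
    composed map F_T = f_{-1} o ... o f_{-T} is constant on all states."""
    uniforms = {}  # t -> u, kept across doublings so old maps are reused
    T = 1
    while True:
        for t in range(-T, 0):
            if t not in uniforms:
                uniforms[t] = rng.random()  # fresh randomness only for new times
        # run every starting state forward from time -T to time 0
        values = list(STATES)
        for t in range(-T, 0):
            f = random_map(uniforms[t])
            values = [f(x) for x in values]
        if len(set(values)) == 1:
            return values[0]  # exact draw from the stationary distribution
        T *= 2
```

Each call returns one exact draw. Note that `uniforms` persists across doublings, so extending the window reuses the old random maps exactly as the algorithm requires.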
When Perfect Sampling Is Actually Worth It
Perfect sampling is conceptually clean, but it is not the default answer to every MCMC problem.
- If the state space is huge and there is no monotone structure, tracking all starting states is usually hopeless.
- If the chain admits a natural partial order, monotone CFTP can be practical and often gives the cleanest exact-sampling story for attractive lattice systems and some queueing models.
- If exactness matters more than wall-clock time, perfect sampling is one of the rare ways to eliminate burn-in bias by construction rather than by diagnosis.
This is why perfect sampling belongs beside, not above, ordinary MCMC. It is a different point in the tradeoff space: exactness in exchange for potentially expensive coupling work.
Main Theorems
CFTP Produces Exact Samples
Statement
If the Markov chain is ergodic on a finite state space $S$ and the random maps $f_{-1}, f_{-2}, \dots$ are i.i.d. and consistent with the transition kernel $P$, then:
- The CFTP algorithm terminates almost surely in finite time
- The output has distribution exactly $\pi$
Intuition
At the coalescence time, every initial state maps to the same output. Since the chain started from every state simultaneously, the output is invariant to the starting distribution. By stationarity of the random maps, this output must be a draw from $\pi$. The construction must run backward: start from different times in the past, then run forward to time 0 reusing the same random numbers. Coupling forward into the future breaks the argument.
Proof Sketch
Let $T^*$ be the coalescence time (the smallest $T$ such that $F_T$ is constant on $S$). For any $T \ge T^*$, the output of CFTP is $F_T(x)$ for any $x$ (they are all the same). If we start from $\pi$: since $\pi$ is stationary, applying $F_T$ to a draw from $\pi$ gives a draw from $\pi$. But the output does not depend on the starting state, so the output has distribution $\pi$ regardless of the starting distribution. Termination follows from ergodicity: the probability of non-coalescence after $T$ steps decays geometrically in $T$.
Why It Matters
This is one of the most surprising results in computational statistics. It shows that the burn-in problem of MCMC is not inherent: there exist algorithms that produce exact samples from the stationary distribution in finite time. The practical limitation is computational cost, not theoretical correctness.
Failure Mode
Running forward from a fixed time (coupling into the future) does not give exact samples. The coupling time depends on the starting state, introducing bias. CFTP avoids this by using the same random numbers for all starting states and checking coalescence at time 0.
Monotone CFTP
Statement
If the state space $S$ has a partial order $\preceq$ with maximum element $\hat{1}$ and minimum element $\hat{0}$, and the random maps preserve this order ($x \preceq y$ implies $f(x) \preceq f(y)$), then coalescence of all chains can be checked by tracking only the top chain (starting from $\hat{1}$) and the bottom chain (starting from $\hat{0}$).
If $F_T(\hat{0}) = F_T(\hat{1})$, then $F_T(x) = F_T(\hat{0})$ for all $x \in S$.
Intuition
If the chain is monotone, the top and bottom chains sandwich all other chains. When the top and bottom merge, everything in between has already merged. This reduces the computational cost from tracking $|S|$ chains to tracking just 2.
Proof Sketch
By monotonicity, $\hat{0} \preceq x \preceq \hat{1}$ implies $F_T(\hat{0}) \preceq F_T(x) \preceq F_T(\hat{1})$. If the two bounds are equal, all the values sandwiched between them must be equal.
Why It Matters
Without monotonicity, CFTP requires tracking one chain per state, which is infeasible for large state spaces. Monotone CFTP reduces this to two chains, making perfect sampling practical for models like the Ising model, certain Bayesian networks, and attractive point processes.
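The two-chain scheme can be sketched in Python on an invented example: a lazy clamped random walk on $\{0, \dots, 4\}$, whose natural total order makes the shared-uniform update monotone:

```python
import random

N = 4  # states 0..N; lazy clamped random walk (illustrative example)

def step(x, u):
    """Monotone update: the same u moves every chain the same direction,
    so x <= y implies step(x, u) <= step(y, u)."""
    if u < 1 / 3:
        return max(x - 1, 0)
    if u > 2 / 3:
        return min(x + 1, N)
    return x

def monotone_cftp(rng):
    """Track only the bottom (0) and top (N) chains; when they agree at
    time 0, every chain sandwiched between them agrees too."""
    uniforms = {}  # t -> u, reused when the look-back window is extended
    T = 1
    while True:
        for t in range(-T, 0):
            if t not in uniforms:
                uniforms[t] = rng.random()
        lo, hi = 0, N
        for t in range(-T, 0):
            u = uniforms[t]
            lo, hi = step(lo, u), step(hi, u)
        if lo == hi:
            return lo  # exact draw (uniform, for this symmetric walk)
        T *= 2
```

Only `lo` and `hi` are simulated; coalescence of those two certifies coalescence of all $N + 1$ chains at once.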
Failure Mode
Not all Markov chains admit a monotone coupling. When no natural partial order exists or the transition kernel is not monotone, you must track all states or find alternative coupling strategies, which may be impractical.
Perfect Sampling and MRFs
One reason perfect sampling keeps reappearing in statistics is that some interesting models have exactly the structure monotone CFTP needs. Attractive Ising-type Markov random fields are the canonical example: the partial order is coordinate-wise, the extremal configurations are the all-minus and all-plus states, and coalescence of those two bounding chains certifies coalescence of everything in between.
That is the ideal use case: the state space is still large, but the order structure compresses the coupling test from "track all states" to "track the extreme states." Without that monotone structure, perfect sampling often becomes more of a theoretical benchmark than a practical workflow.
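For concreteness, here is a hedged sketch of monotone CFTP for a small ferromagnetic Ising model with heat-bath (Gibbs) updates. The grid size `N = 4` and inverse temperature `BETA = 0.3` are illustrative choices, not values from the text:

```python
import math
import random

N, BETA = 4, 0.3  # illustrative parameters for a tiny ferromagnetic model

def neighbors(i, j):
    """Nearest neighbors on the N x N grid (free boundary conditions)."""
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        a, b = i + di, j + dj
        if 0 <= a < N and 0 <= b < N:
            yield a, b

def heat_bath(spins, site, u):
    """Update one site given a shared uniform u. If x <= y pointwise, the
    local field under x is <= that under y, so the update preserves the
    coordinate-wise partial order (the attractive case)."""
    i, j = site
    field = sum(spins[a][b] for a, b in neighbors(i, j))
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * BETA * field))
    spins[i][j] = 1 if u < p_plus else -1

def ising_cftp(rng):
    """Track only the all-plus (top) and all-minus (bottom) configurations."""
    moves = {}  # t -> (site, u), reused when the window is extended
    T = 16
    while True:
        for t in range(-T, 0):
            if t not in moves:
                site = (rng.randrange(N), rng.randrange(N))
                moves[t] = (site, rng.random())
        top = [[1] * N for _ in range(N)]
        bot = [[-1] * N for _ in range(N)]
        for t in range(-T, 0):
            site, u = moves[t]
            heat_bath(top, site, u)
            heat_bath(bot, site, u)
        if top == bot:
            return top  # an exact sample from the Ising distribution
        T *= 2
```

Coalescence of the two extremal configurations certifies coalescence of all $2^{16}$ configurations in between, which is exactly the compression described above.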
Common Confusions
Forward coupling does not give exact samples
If you start all chains at time 0 and run them forward until they coalesce at some random time $\tau$, the output is not a draw from $\pi$. The coupling time depends on the starting states, creating a selection bias. CFTP works because it fixes time 0 as the output time and looks backward to find when coalescence occurred.
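This failure is easy to see on a standard two-state counterexample from the perfect-sampling literature: a chain with $\pi = (1/3, 2/3)$ for which forward coupling always outputs state 2, while CFTP recovers $\pi$ exactly:

```python
import random

# States {1, 2}: from 1 the chain always jumps to 2; from 2 it moves to
# 1 or 2 with probability 1/2 each. Stationary: pi(1) = 1/3, pi(2) = 2/3.

def f(x, u):
    """One shared-uniform random map consistent with the kernel."""
    if x == 1:
        return 2
    return 1 if u < 0.5 else 2

def forward_coupling(rng):
    """Run both starting states forward with shared randomness and stop
    at the (random) first time they agree -- this is biased."""
    a, b = 1, 2
    while a != b:
        u = rng.random()
        a, b = f(a, u), f(b, u)
    return a  # always 2: chains can only meet by both landing on state 2

def cftp(rng):
    """Propp-Wilson: look backward until the composed map is constant."""
    uniforms = {}
    T = 1
    while True:
        for t in range(-T, 0):
            if t not in uniforms:
                uniforms[t] = rng.random()
        a, b = 1, 2
        for t in range(-T, 0):
            u = uniforms[t]
            a, b = f(a, u), f(b, u)
        if a == b:
            return a
        T *= 2
```

Forward coupling can never output state 1, because the chains only ever meet at state 2; fixing the output time at 0 and looking backward removes that selection effect.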
CFTP is not the same as running MCMC very long
Long MCMC runs produce approximate samples with unknown error. CFTP produces exact samples with a random (but finite) computation time. The distinction is fundamental: no diagnostic can certify that an MCMC sample is exact, but CFTP provides a certificate by construction.
You must reuse the same random numbers
When you extend the look-back window from $T$ to $2T$, you must keep the same random maps $f_{-1}, \dots, f_{-T}$ and add new ones $f_{-T-1}, \dots, f_{-2T}$. Regenerating all random numbers would invalidate the coupling and destroy exactness.
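One way to honor this requirement without storing every map is to derive the randomness for time step $t$ deterministically from $t$ itself. This is an implementation pattern, not part of the algorithm statement:

```python
import random

def uniform_at(t, run_seed="cftp-run-0"):
    """Pseudorandom uniform attached to time step t (t = -1, -2, ...).
    Because it is derived deterministically from (run_seed, t), extending
    the look-back window never regenerates uniforms already used."""
    return random.Random(f"{run_seed}:{t}").random()

# Doubling the window keeps u_{-1}, ..., u_{-4} bit-for-bit identical:
window_4 = [uniform_at(t) for t in range(-4, 0)]
window_8 = [uniform_at(t) for t in range(-8, 0)]
assert window_8[-4:] == window_4
```

The per-time seeding makes the maps re-derivable on demand, trading a little CPU for not having to cache the whole `uniforms` dictionary.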
Exact does not mean cheap
Perfect sampling removes burn-in error, not computational cost. In difficult models the coupling time can be enormous, and tracking the needed random maps can dominate the total runtime. The output is exact; the bill can still be high.
Summary
- CFTP gives exact samples from the stationary distribution, not approximations
- Run chains backward: from different past times to the fixed present (time 0)
- Coalescence = all starting states produce the same output = exact sample
- Monotone CFTP reduces tracking from $|S|$ chains to 2 chains
- Forward coupling (coupling into the future) does not work
- The same random numbers must be reused when extending the look-back window
- Perfect sampling is most useful when monotone structure makes the coupling test cheap enough to be practical
Exercises
Problem
Explain why running a Markov chain forward from all starting states until coalescence does not produce an exact sample from $\pi$.
Problem
Consider a two-state Markov chain with states $\{0, 1\}$ and transition matrix $P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}$, where $0 < \alpha < 1$ and $0 < \beta < 1$. Design a monotone random map consistent with $P$, and identify a condition on the shared uniform random variable $U$ that forces the top and bottom chains to coalesce in one step.
References
Canonical:
- Propp & Wilson, "Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics" (1996)
- Fill, "An Interruptible Algorithm for Perfect Sampling via Markov Chains" (1998)
Current:
- Huber, Perfect Simulation (2016)
- Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 14
- Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
- Brooks et al., Handbook of MCMC (2011), Chapters 1-5
- Kendall, "Perfect simulation for the area-interaction point process" (1998), for monotone-coupling practice
- Foss and Tweedie, "Perfect simulation and backward coupling" (1998), Stochastic Models
Next Topics
Natural extensions from perfect sampling:
- Burn-in and convergence diagnostics: the approximate alternative when perfect sampling is not feasible
- MCMC for Markov random fields: the attractive-lattice setting where monotone CFTP is most intuitive
- Coupling methods: broader coupling techniques for bounding mixing times
Last reviewed: April 25, 2026