
Foundations

Integration and Change of Variables

Riemann integration, improper integrals, the substitution rule, multivariate change of variables via the Jacobian determinant, and Fubini's theorem. The computational backbone of probability and ML.

Core · Tier 2 · Stable · Supporting · ~40 min

Why This Matters

Integration is the computational engine of probability and statistics. Every expectation, every marginal distribution, every normalizing constant, and every Bayesian posterior requires evaluating an integral. The change-of-variables formula is what allows you to transform distributions (e.g., from a Gaussian to any other distribution via a smooth map). Fubini's theorem is what lets you compute multivariate integrals by iterating single-variable integrals.

Riemann Integral Review

Definition

Riemann Integral

For a bounded function $f: [a, b] \to \mathbb{R}$, the Riemann integral $\int_a^b f(x) \, dx$ is defined as the limit of Riemann sums as the mesh of the partition tends to zero:

$$\int_a^b f(x) \, dx = \lim_{\|P\| \to 0} \sum_{i=1}^n f(x_i^*) \, \Delta x_i$$

where $P = \{x_0, x_1, \ldots, x_n\}$ is a partition of $[a, b]$, $x_i^*$ is a sample point in $[x_{i-1}, x_i]$, $\Delta x_i = x_i - x_{i-1}$, and $\|P\| = \max_i \Delta x_i$ is the mesh. The limit exists (and is independent of the choice of partitions and sample points) when $f$ is Riemann integrable. Every continuous function on $[a, b]$ is Riemann integrable.
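Under this definition, a Riemann sum over a fine uniform partition already gives a usable numeric approximation. A minimal sketch (the `riemann_sum` helper and the midpoint choice of sample points are illustrative assumptions, not from the text):

```python
# Midpoint Riemann sum on a uniform partition (a sketch, not a library API).
# Approximates the integral of f(x) = x^2 on [0, 1]; the exact value is 1/3.

def riemann_sum(f, a, b, n):
    """Riemann sum with n equal subintervals and midpoint sample points."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n))

approx = riemann_sum(lambda x: x * x, 0.0, 1.0, 1000)
print(approx)  # close to 1/3
```

Refining the partition (larger `n`) drives the sum toward the integral, which is exactly what the mesh-to-zero limit formalizes.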

Improper Integrals

Definition

Improper Integral

When the domain is unbounded or the integrand is unbounded, define the integral as a limit:

$$\int_a^\infty f(x) \, dx = \lim_{R \to \infty} \int_a^R f(x) \, dx$$

The integral converges if and only if this limit exists and is finite. Example: the Gaussian normalizing constant $\int_{-\infty}^{\infty} e^{-x^2/2} \, dx = \sqrt{2\pi}$ is an improper integral that converges.

Improper integrals arise constantly in ML: the normalization of probability density functions, expectations over unbounded domains, and integrals involving heavy-tailed distributions.
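The limit definition suggests a direct numerical check: truncate the Gaussian integral at a finite $R$ and watch the truncated values approach $\sqrt{2\pi}$. A sketch (the `trapezoid` helper and the cutoff $R = 10$ are illustrative assumptions):

```python
import math

# Truncated improper integral: int_{-R}^{R} exp(-x^2/2) dx for R = 10.
# The tail beyond R is negligible because exp(-x^2/2) decays faster than
# any polynomial, so the truncated value is already close to sqrt(2*pi).

def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

R = 10.0
approx = trapezoid(lambda x: math.exp(-x * x / 2), -R, R, 100_000)
print(approx, math.sqrt(2 * math.pi))
```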

Change of Variables (One Dimension)

Definition

Substitution Rule

If $g: [a, b] \to \mathbb{R}$ is $C^1$ and $f$ is continuous on the range of $g$, then:

$$\int_a^b f(g(x)) \, g'(x) \, dx = \int_{g(a)}^{g(b)} f(u) \, du$$

This is the substitution $u = g(x)$, $du = g'(x) \, dx$.
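Both sides of the substitution rule can be checked numerically. A sketch with $f(u) = \cos u$ and $g(x) = x^2$ on $[0, 2]$ (the `midpoint` helper is an illustrative assumption):

```python
import math

# Numerical sanity check of the substitution rule: both sides approximate
# int_0^4 cos(u) du = sin(4), using a simple midpoint rule.
# Here f(u) = cos(u), g(x) = x^2, g'(x) = 2x, on [0, 2].

def midpoint(h, a, b, n):
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) * dx for i in range(n))

f = math.cos
g = lambda x: x * x
g_prime = lambda x: 2 * x

lhs = midpoint(lambda x: f(g(x)) * g_prime(x), 0.0, 2.0, 10_000)
rhs = midpoint(f, g(0.0), g(2.0), 10_000)
print(lhs, rhs)  # both approximate sin(4)
```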

Multivariate Change of Variables

Theorem

Change of Variables Formula

Statement

Let $\phi: U \to V$ be a $C^1$ diffeomorphism between open subsets $U, V \subseteq \mathbb{R}^n$. For any integrable function $f: V \to \mathbb{R}$:

$$\int_V f(y) \, dy = \int_U f(\phi(x)) \, |\det D\phi(x)| \, dx$$

where $D\phi(x)$ is the $n \times n$ Jacobian matrix of $\phi$ at $x$, and $|\det D\phi(x)|$ is the absolute value of its determinant.

Intuition

The Jacobian determinant measures how $\phi$ stretches or compresses volume. A small cube of volume $dV$ at $x$ maps to a region of approximate volume $|\det D\phi(x)| \, dV$ at $\phi(x)$. The formula says: to integrate over the image, integrate over the preimage and multiply by this volume scaling factor.

Proof Sketch

For a linear map $\phi(x) = Ax$, the result follows from the definition of the determinant as the volume scaling factor. For nonlinear $\phi$, approximate it locally by its linearization $D\phi(x)$ on small cubes, apply the linear result, and sum. The rigorous proof uses Lebesgue measure and approximation by simple functions.

Why It Matters

This formula is used everywhere in ML and statistics:

  1. Probability: if $X$ has density $p_X$ and $Y = \phi(X)$, then $p_Y(y) = p_X(\phi^{-1}(y)) \cdot |\det D\phi^{-1}(y)|$
  2. Normalizing flows: the log-likelihood involves $\log |\det D\phi(x)|$, and flow architectures are designed to make this determinant cheap to compute
  3. Bayesian inference: computing posteriors requires integrating over parameter spaces, often after a change of variables
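The density-transformation formula in item (1) can be verified in one dimension. A sketch with the affine map $Y = 2X + 1$, where the transformed density should match the $\mathcal{N}(1, 4)$ density exactly (the helper names are illustrative assumptions):

```python
import math

# X ~ N(0,1) and Y = phi(X) = 2X + 1. Then phi^{-1}(y) = (y - 1)/2 and
# |d phi^{-1}/dy| = 1/2, so p_Y(y) = p_X((y - 1)/2) / 2, which is exactly
# the N(1, 4) density (mean 1, standard deviation 2).

def p_x(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def p_y(y):
    # change-of-variables formula applied to phi(x) = 2x + 1
    return p_x((y - 1.0) / 2.0) * 0.5

def normal_pdf(y, mu, sigma):
    return math.exp(-((y - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

print(p_y(2.5), normal_pdf(2.5, 1.0, 2.0))  # should match
```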

Failure Mode

The formula as stated requires $\phi$ to be a $C^1$ diffeomorphism (smooth with smooth inverse). Two natural relaxations come up in practice and require care:

  • Non-injective $\phi$. Partition the domain into regions on which $\phi$ is injective and sum the contributions, weighting each piece by the multiplicity of $\phi^{-1}$ over the image (the area/coarea formula of geometric measure theory; Federer 1969).
  • Critical points where $\det D\phi = 0$. If the critical set has Lebesgue measure zero and $\phi$ remains locally injective off that set with controlled multiplicity, then those points contribute nothing and the standard formula holds on the regular part (Sard's theorem ensures the image of the critical set is null when $\phi$ is sufficiently smooth). If, however, $\phi$ collapses a set of positive measure (e.g. projects it onto a lower-dimensional submanifold) or is not injective in a controlled way, you cannot simply ignore the singular region: use the coarea formula or push forward the Lebesgue measure explicitly.

In short: "the singular set has measure zero, so it contributes nothing" is true for a $C^1$ diffeomorphism with isolated critical points; it is not a free pass for arbitrary singular or non-injective maps.

Example

Polar coordinates

The transformation $\phi(r, \theta) = (r\cos\theta, r\sin\theta)$ maps $(0, \infty) \times [0, 2\pi)$ to $\mathbb{R}^2 \setminus \{0\}$. The Jacobian:

$$D\phi = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad |\det D\phi| = r$$

So $\int_{\mathbb{R}^2} f(x, y) \, dx \, dy = \int_0^{2\pi} \int_0^\infty f(r\cos\theta, r\sin\theta) \, r \, dr \, d\theta$.

This is how you compute the Gaussian integral: $\int_{\mathbb{R}^2} e^{-(x^2+y^2)/2} \, dx \, dy = \int_0^{2\pi} \int_0^\infty e^{-r^2/2} \, r \, dr \, d\theta = 2\pi$. Since the double integral factors as $\left(\int_{-\infty}^\infty e^{-x^2/2} \, dx\right)^2$ by Fubini, taking the square root recovers the one-dimensional constant $\sqrt{2\pi}$.
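The polar computation can be checked numerically: the integrand is independent of $\theta$, so the $\theta$ integral contributes a factor $2\pi$ and only the radial integral needs approximating. A sketch (the cutoff $R = 12$ and the midpoint rule are illustrative assumptions):

```python
import math

# Radial part of the polar Gaussian integral: int_0^R exp(-r^2/2) r dr,
# approximated by a midpoint rule, then multiplied by 2*pi for the theta
# integral. The result should be close to 2*pi (the exact radial integral
# is 1 - exp(-R^2/2), essentially 1 for R = 12).

R, n = 12.0, 100_000
dr = R / n
radial = sum(
    math.exp(-((i + 0.5) * dr) ** 2 / 2) * ((i + 0.5) * dr) * dr
    for i in range(n)
)
approx = 2 * math.pi * radial
print(approx, 2 * math.pi)
```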

Fubini's Theorem

Theorem

Fubini's Theorem

Statement

If $f: X \times Y \to \mathbb{R}$ is integrable (i.e., $\int_{X \times Y} |f(x, y)| \, d(x, y) < \infty$), then:

$$\int_{X \times Y} f(x, y) \, d(x, y) = \int_X \left( \int_Y f(x, y) \, dy \right) dx = \int_Y \left( \int_X f(x, y) \, dx \right) dy$$

The order of integration can be swapped.

Intuition

If the total integral is finite, you can compute a double integral by integrating one variable at a time, in either order. This is what makes multivariate integration tractable: you reduce it to a sequence of one-dimensional integrals.
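This reduction is easy to see in code: a double integral becomes two nested one-dimensional sums, and for an integrable function both orders agree. A sketch with $f(x, y) = x y^2$ on $[0,1]^2$, whose exact integral is $\tfrac{1}{2} \cdot \tfrac{1}{3} = \tfrac{1}{6}$ (the `midpoint` helper is an illustrative assumption):

```python
# Iterated one-dimensional midpoint sums for a double integral, computed
# in both orders. For integrable f, Fubini says the orders agree.

def midpoint(g, a, b, n=400):
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) * dx for i in range(n))

f = lambda x, y: x * y ** 2

dy_then_dx = midpoint(lambda x: midpoint(lambda y: f(x, y), 0, 1), 0, 1)
dx_then_dy = midpoint(lambda y: midpoint(lambda x: f(x, y), 0, 1), 0, 1)
print(dy_then_dx, dx_then_dy)  # both close to 1/6
```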

Proof Sketch

The proof uses the monotone convergence theorem and the construction of product measures. For non-negative functions, the result follows from Tonelli's theorem (which does not require integrability, only measurability and non-negativity). Fubini extends this to signed functions by decomposing into positive and negative parts.

Why It Matters

Fubini's theorem is the justification for: (1) computing marginal distributions by integrating out variables, (2) switching the order of expectation and summation, (3) computing normalizing constants by iterated integration, and (4) the tower property of conditional expectation.

Failure Mode

The integrability condition is necessary. If $\int |f(x,y)| \, d(x,y) = \infty$, the iterated integrals may exist but give different values depending on the order of integration. The classic counterexample uses $f(x,y) = (x^2 - y^2)/(x^2 + y^2)^2$ on $[0,1]^2$, whose two iterated integrals are $\pi/4$ and $-\pi/4$.
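The disagreement can be exhibited directly. For this $f$, the inner integrals have closed forms ($\int_0^1 f \, dy = 1/(1+x^2)$ since $y/(x^2+y^2)$ is an antiderivative in $y$, and by antisymmetry $\int_0^1 f \, dx = -1/(1+y^2)$), so only the outer integrals need numerical evaluation. A sketch (the `midpoint` helper is an illustrative assumption):

```python
import math

# Counterexample f(x,y) = (x^2 - y^2)/(x^2 + y^2)^2 on [0,1]^2.
# Inner integrals in closed form:
#   int_0^1 f dy = 1/(1 + x^2),   int_0^1 f dx = -1/(1 + y^2).
# The two iterated integrals therefore evaluate to +pi/4 and -pi/4.

def midpoint(g, a, b, n=20_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) * h for i in range(n))

dy_first = midpoint(lambda x: 1.0 / (1.0 + x * x), 0.0, 1.0)
dx_first = midpoint(lambda y: -1.0 / (1.0 + y * y), 0.0, 1.0)
print(dy_first, dx_first)  # approx +pi/4 and -pi/4
```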

Applications in ML

Marginalizing Distributions

Given a joint density $p(x, y)$, the marginal density of $x$ is:

$$p(x) = \int p(x, y) \, dy$$

This uses Fubini to reduce a multivariate integral to a single-variable one.
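As a concrete check, integrating $y$ out of a correlated bivariate normal should recover the standard normal marginal of $x$. A sketch (the correlation $\rho = 0.6$, the cutoff $R = 10$, and the helper names are illustrative assumptions):

```python
import math

# Standard bivariate normal with correlation rho: marginalizing out y
# should recover the N(0,1) density of x.
rho = 0.6

def p_joint(x, y):
    z = (x * x - 2 * rho * x * y + y * y) / (1 - rho * rho)
    return math.exp(-z / 2) / (2 * math.pi * math.sqrt(1 - rho * rho))

def p_marginal(x, R=10.0, n=20_000):
    # midpoint rule over y in [-R, R]; the tail beyond R is negligible
    dy = 2 * R / n
    return sum(p_joint(x, -R + (i + 0.5) * dy) * dy for i in range(n))

x0 = 0.7
exact = math.exp(-x0 * x0 / 2) / math.sqrt(2 * math.pi)
print(p_marginal(x0), exact)
```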

Computing Normalizing Constants

An unnormalized density $\tilde{p}$ is normalized as $p(x) = \frac{1}{Z} \tilde{p}(x)$, where $Z = \int \tilde{p}(x) \, dx$. In Bayesian inference, computing $Z$ (the evidence) often requires a change of variables to make the integral tractable.

Normalizing Flows

A normalizing flow transforms a simple base distribution $p_Z(z)$ through a diffeomorphism $f$ to get $p_X(x) = p_Z(f^{-1}(x)) \cdot |\det Df^{-1}(x)|$. The change-of-variables formula makes this exact.
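The log-density computation at the heart of flow training can be sketched with a single invertible linear layer (a toy stand-in for a real flow architecture; the matrix $A$ and helper names are illustrative assumptions):

```python
import math

# Toy one-layer "flow": x = A z with z ~ N(0, I) in 2D and A invertible.
# Change of variables gives log p_X(x) = log p_Z(A^{-1} x) - log|det A|.

A = [[2.0, 1.0], [0.0, 3.0]]          # invertible 2x2 matrix, det = 6
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det_A, -A[0][1] / det_A],
         [-A[1][0] / det_A, A[0][0] / det_A]]

def log_p_z(z):
    # standard normal log-density in 2D
    return -0.5 * (z[0] ** 2 + z[1] ** 2) - math.log(2 * math.pi)

def log_p_x(x):
    # pull x back through the inverse map, then subtract log|det A|
    z = [A_inv[0][0] * x[0] + A_inv[0][1] * x[1],
         A_inv[1][0] * x[0] + A_inv[1][1] * x[1]]
    return log_p_z(z) - math.log(abs(det_A))

print(log_p_x([0.0, 0.0]))
```

Real flow architectures (coupling layers, autoregressive flows) are designed so that this log-determinant is cheap, e.g. triangular Jacobians whose determinant is a product of diagonal entries.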

Common Confusions

Watch Out

The Jacobian determinant is the absolute value

In the change-of-variables formula for integrals, you use $|\det D\phi|$, not $\det D\phi$. The absolute value ensures the integral is non-negative regardless of whether the transformation preserves or reverses orientation. For probability density transformations, forgetting the absolute value gives wrong densities.

Watch Out

Fubini requires integrability, Tonelli does not

Tonelli's theorem (for non-negative functions) allows you to swap integration order without checking integrability first. This is useful because you can establish integrability by computing the iterated integral of $|f|$. Fubini applies to signed functions but requires you to verify integrability of $|f|$ first.

Summary

  • Substitution: $\int f(g(x)) \, g'(x) \, dx = \int f(u) \, du$ with $u = g(x)$
  • Multivariate change of variables: multiply by $|\det D\phi(x)|$ when transforming coordinates
  • The Jacobian determinant measures local volume change
  • Fubini: swap integration order when $\int |f| < \infty$
  • These tools compute expectations, marginals, normalizing constants, and flow densities

Exercises

ExerciseCore

Problem

Compute $\int_0^\infty x e^{-x^2/2} \, dx$ using the substitution $u = x^2/2$.

ExerciseAdvanced

Problem

Let $X \sim \mathcal{N}(0, 1)$ and $Y = e^X$ (so $Y$ is log-normal). Use the change-of-variables formula to derive the density of $Y$.

References

Canonical:

  • Rudin, Principles of Mathematical Analysis (1976), Chapters 6 and 10
  • Folland, Real Analysis (1999), Chapter 2 (Lebesgue integration and product measures)
  • Apostol, Mathematical Analysis (1974), Chapters 10-11 (Riemann integration and multivariable change of variables)

Current:

  • Kobyzev et al., "Normalizing Flows: An Introduction and Review of Current Methods" (2021). Change-of-variables in deep learning.
  • Billingsley, Probability and Measure (1995), Chapter 3 (integration and Fubini's theorem in measure-theoretic context)
  • Spivak, Calculus on Manifolds (1965), Chapter 3 (integration on R^n and the change-of-variables formula)


Last reviewed: April 26, 2026
