
Foundations

Integration and Change of Variables

Riemann integration, improper integrals, the substitution rule, multivariate change of variables via the Jacobian determinant, and Fubini's theorem. The computational backbone of probability and ML.

Core · Tier 2 · Stable · Supporting · ~40 min

Why This Matters

Integration is the computational engine of probability and statistics. Every expectation, every marginal distribution, every normalizing constant, and every Bayesian posterior requires evaluating an integral. The change-of-variables formula is what allows you to transform distributions (e.g., from a Gaussian to any other distribution via a smooth map). Fubini's theorem is what lets you compute multivariate integrals by iterating single-variable integrals.

Riemann Integral Review

Definition

Riemann Integral

For a bounded function $f: [a, b] \to \mathbb{R}$, the Riemann integral $\int_a^b f(x) \, dx$ is defined as the limit of Riemann sums as the mesh of the partition tends to zero:

$$\int_a^b f(x) \, dx = \lim_{\|P\| \to 0} \sum_{i=1}^n f(x_i^*) \, \Delta x_i$$

where $P = \{x_0, x_1, \ldots, x_n\}$ is a partition of $[a, b]$, $x_i^*$ is a sample point in $[x_{i-1}, x_i]$, $\Delta x_i = x_i - x_{i-1}$, and $\|P\| = \max_i \Delta x_i$ is the mesh. The limit exists (and is independent of the choice of partitions and sample points) when $f$ is Riemann integrable. Every continuous function on $[a, b]$ is Riemann integrable.
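Under this definition, a Riemann sum over a fine uniform partition already gives a usable numeric approximation. A minimal sketch (the `riemann_sum` helper and the midpoint choice of sample points are illustrative assumptions, not from the text):

```python
# Midpoint Riemann sum on a uniform partition (a sketch, not a library API).
# Approximates the integral of f(x) = x^2 on [0, 1]; the exact value is 1/3.

def riemann_sum(f, a, b, n):
    """Riemann sum with n equal subintervals and midpoint sample points."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n))

approx = riemann_sum(lambda x: x * x, 0.0, 1.0, 1000)
print(approx)  # close to 1/3
```

Refining the partition (larger `n`) drives the sum toward the integral, which is exactly what the mesh-to-zero limit formalizes.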

Improper Integrals

Definition

Improper Integral

When the domain is unbounded or the integrand is unbounded, define the integral as a limit:

$$\int_a^\infty f(x) \, dx = \lim_{R \to \infty} \int_a^R f(x) \, dx$$

The integral converges if and only if this limit exists and is finite. Example: the Gaussian normalizing constant $\int_{-\infty}^{\infty} e^{-x^2/2} \, dx = \sqrt{2\pi}$ is an improper integral that converges.

Improper integrals arise constantly in ML: the normalization of probability density functions, expectations over unbounded domains, and integrals involving heavy-tailed distributions.
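The limit definition suggests a direct numerical check: truncate the Gaussian integral at a finite $R$ and watch the truncated values approach $\sqrt{2\pi}$. A sketch (the `trapezoid` helper and the cutoff $R = 10$ are illustrative assumptions):

```python
import math

# Truncated improper integral: int_{-R}^{R} exp(-x^2/2) dx for R = 10.
# The tail beyond R is negligible because exp(-x^2/2) decays faster than
# any polynomial, so the truncated value is already close to sqrt(2*pi).

def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

R = 10.0
approx = trapezoid(lambda x: math.exp(-x * x / 2), -R, R, 100_000)
print(approx, math.sqrt(2 * math.pi))
```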

Change of Variables (One Dimension)

Definition

Substitution Rule

If $g: [a, b] \to \mathbb{R}$ is $C^1$ and $f$ is continuous on the range of $g$, then:

$$\int_a^b f(g(x)) \, g'(x) \, dx = \int_{g(a)}^{g(b)} f(u) \, du$$

This is the substitution $u = g(x)$, $du = g'(x) \, dx$.
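Both sides of the substitution rule can be checked numerically. A sketch with $f(u) = \cos u$ and $g(x) = x^2$ on $[0, 2]$ (the `midpoint` helper is an illustrative assumption):

```python
import math

# Numerical sanity check of the substitution rule: both sides approximate
# int_0^4 cos(u) du = sin(4), using a simple midpoint rule.
# Here f(u) = cos(u), g(x) = x^2, g'(x) = 2x, on [0, 2].

def midpoint(h, a, b, n):
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) * dx for i in range(n))

f = math.cos
g = lambda x: x * x
g_prime = lambda x: 2 * x

lhs = midpoint(lambda x: f(g(x)) * g_prime(x), 0.0, 2.0, 10_000)
rhs = midpoint(f, g(0.0), g(2.0), 10_000)
print(lhs, rhs)  # both approximate sin(4)
```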

Multivariate Change of Variables

Theorem

Change of Variables Formula

Statement

Let $\phi: U \to V$ be a $C^1$ diffeomorphism between open subsets $U, V \subseteq \mathbb{R}^n$. For any integrable function $f: V \to \mathbb{R}$:

$$\int_V f(y) \, dy = \int_U f(\phi(x)) \, |\det D\phi(x)| \, dx$$

where $D\phi(x)$ is the $n \times n$ Jacobian matrix of $\phi$ at $x$, and $|\det D\phi(x)|$ is the absolute value of its determinant.

Intuition

The Jacobian determinant measures how $\phi$ stretches or compresses volume. A small cube of volume $dV$ at $x$ maps to a region of approximate volume $|\det D\phi(x)| \, dV$ at $\phi(x)$. The formula says: to integrate over the image, integrate over the preimage and multiply by this volume scaling factor.

Proof Sketch

For a linear map $\phi(x) = Ax$, the result follows from the definition of the determinant as the volume scaling factor. For nonlinear $\phi$, approximate it locally by its linearization $D\phi(x)$ on small cubes, apply the linear result, and sum. The rigorous proof uses Lebesgue measure and approximation by simple functions.

Why It Matters

This formula is used everywhere in ML and statistics:

  1. Probability: if $X$ has density $p_X$ and $Y = \phi(X)$, then $p_Y(y) = p_X(\phi^{-1}(y)) \cdot |\det D\phi^{-1}(y)|$
  2. Normalizing flows: the log-likelihood involves $\log |\det D\phi(x)|$, and flow architectures are designed to make this determinant cheap to compute
  3. Bayesian inference: computing posteriors requires integrating over parameter spaces, often after a change of variables
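The density-transformation formula in item (1) can be verified in one dimension. A sketch with the affine map $Y = 2X + 1$, where the transformed density should match the $\mathcal{N}(1, 4)$ density exactly (the helper names are illustrative assumptions):

```python
import math

# X ~ N(0,1) and Y = phi(X) = 2X + 1. Then phi^{-1}(y) = (y - 1)/2 and
# |d phi^{-1}/dy| = 1/2, so p_Y(y) = p_X((y - 1)/2) / 2, which is exactly
# the N(1, 4) density (mean 1, standard deviation 2).

def p_x(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def p_y(y):
    # change-of-variables formula applied to phi(x) = 2x + 1
    return p_x((y - 1.0) / 2.0) * 0.5

def normal_pdf(y, mu, sigma):
    return math.exp(-((y - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

print(p_y(2.5), normal_pdf(2.5, 1.0, 2.0))  # should match
```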

Failure Mode

The formula as stated requires $\phi$ to be a $C^1$ diffeomorphism (smooth with smooth inverse). Two natural relaxations come up in practice and require care:

  • Non-injective $\phi$. Partition the domain into regions on which $\phi$ is injective and sum the contributions, weighting each piece by the multiplicity of $\phi^{-1}$ over the image (the area/coarea formula of geometric measure theory; Federer 1969).
  • Critical points where $\det D\phi = 0$. If the critical set has Lebesgue measure zero and $\phi$ remains locally injective off that set with controlled multiplicity, then those points contribute nothing and the standard formula holds on the regular part (Sard's theorem ensures the image of the critical set is null when $\phi$ is sufficiently smooth). If, however, $\phi$ collapses a set of positive measure (e.g. projects it onto a lower-dimensional submanifold) or is not injective in a controlled way, you cannot simply ignore the singular region: use the coarea formula or push forward the Lebesgue measure explicitly.

In short: "the singular set has measure zero, so it contributes nothing" is true for a $C^1$ diffeomorphism with isolated critical points; it is not a free pass for arbitrary singular or non-injective maps.

Example

Polar coordinates

The transformation $\phi(r, \theta) = (r\cos\theta, r\sin\theta)$ maps $(0, \infty) \times [0, 2\pi)$ to $\mathbb{R}^2 \setminus \{0\}$. The Jacobian:

$$D\phi = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad |\det D\phi| = r$$

So $\int_{\mathbb{R}^2} f(x, y) \, dx \, dy = \int_0^{2\pi} \int_0^\infty f(r\cos\theta, r\sin\theta) \, r \, dr \, d\theta$.

This is how you compute the Gaussian integral: $\int_{\mathbb{R}^2} e^{-(x^2+y^2)/2} \, dx \, dy = \int_0^{2\pi} \int_0^\infty e^{-r^2/2} \, r \, dr \, d\theta = 2\pi$. Since the double integral factors as $\left(\int_{-\infty}^\infty e^{-x^2/2} \, dx\right)^2$ by Fubini, taking the square root recovers the one-dimensional constant $\sqrt{2\pi}$.
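The polar computation can be checked numerically: the integrand is independent of $\theta$, so the $\theta$ integral contributes a factor $2\pi$ and only the radial integral needs approximating. A sketch (the cutoff $R = 12$ and the midpoint rule are illustrative assumptions):

```python
import math

# Radial part of the polar Gaussian integral: int_0^R exp(-r^2/2) r dr,
# approximated by a midpoint rule, then multiplied by 2*pi for the theta
# integral. The result should be close to 2*pi (the exact radial integral
# is 1 - exp(-R^2/2), essentially 1 for R = 12).

R, n = 12.0, 100_000
dr = R / n
radial = sum(
    math.exp(-((i + 0.5) * dr) ** 2 / 2) * ((i + 0.5) * dr) * dr
    for i in range(n)
)
approx = 2 * math.pi * radial
print(approx, 2 * math.pi)
```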

Fubini's Theorem

Theorem

Fubini's Theorem

Statement

If $f: X \times Y \to \mathbb{R}$ is integrable (i.e., $\int_{X \times Y} |f(x, y)| \, d(x, y) < \infty$), then:

$$\int_{X \times Y} f(x, y) \, d(x, y) = \int_X \left( \int_Y f(x, y) \, dy \right) dx = \int_Y \left( \int_X f(x, y) \, dx \right) dy$$

The order of integration can be swapped.

Intuition

If the total integral is finite, you can compute a double integral by integrating one variable at a time, in either order. This is what makes multivariate integration tractable: you reduce it to a sequence of one-dimensional integrals.
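This reduction is easy to see in code: a double integral becomes two nested one-dimensional sums, and for an integrable function both orders agree. A sketch with $f(x, y) = x y^2$ on $[0,1]^2$, whose exact integral is $\tfrac{1}{2} \cdot \tfrac{1}{3} = \tfrac{1}{6}$ (the `midpoint` helper is an illustrative assumption):

```python
# Iterated one-dimensional midpoint sums for a double integral, computed
# in both orders. For integrable f, Fubini says the orders agree.

def midpoint(g, a, b, n=400):
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) * dx for i in range(n))

f = lambda x, y: x * y ** 2

dy_then_dx = midpoint(lambda x: midpoint(lambda y: f(x, y), 0, 1), 0, 1)
dx_then_dy = midpoint(lambda y: midpoint(lambda x: f(x, y), 0, 1), 0, 1)
print(dy_then_dx, dx_then_dy)  # both close to 1/6
```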

Proof Sketch

The proof uses the monotone convergence theorem and the construction of product measures. For non-negative functions, the result follows from Tonelli's theorem (which does not require integrability, only measurability and non-negativity). Fubini extends this to signed functions by decomposing into positive and negative parts.

Why It Matters

Fubini's theorem is the justification for: (1) computing marginal distributions by integrating out variables, (2) switching the order of expectation and summation, (3) computing normalizing constants by iterated integration, and (4) the tower property of conditional expectation.

Failure Mode

The integrability condition is necessary. If $\int |f(x,y)| \, d(x,y) = \infty$, the iterated integrals may exist but give different values depending on the order of integration. The classic counterexample uses $f(x,y) = (x^2 - y^2)/(x^2 + y^2)^2$ on $[0,1]^2$, whose two iterated integrals are $\pi/4$ and $-\pi/4$.
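The disagreement can be exhibited directly. For this $f$, the inner integrals have closed forms ($\int_0^1 f \, dy = 1/(1+x^2)$ since $y/(x^2+y^2)$ is an antiderivative in $y$, and by antisymmetry $\int_0^1 f \, dx = -1/(1+y^2)$), so only the outer integrals need numerical evaluation. A sketch (the `midpoint` helper is an illustrative assumption):

```python
import math

# Counterexample f(x,y) = (x^2 - y^2)/(x^2 + y^2)^2 on [0,1]^2.
# Inner integrals in closed form:
#   int_0^1 f dy = 1/(1 + x^2),   int_0^1 f dx = -1/(1 + y^2).
# The two iterated integrals therefore evaluate to +pi/4 and -pi/4.

def midpoint(g, a, b, n=20_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) * h for i in range(n))

dy_first = midpoint(lambda x: 1.0 / (1.0 + x * x), 0.0, 1.0)
dx_first = midpoint(lambda y: -1.0 / (1.0 + y * y), 0.0, 1.0)
print(dy_first, dx_first)  # approx +pi/4 and -pi/4
```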

Applications in ML

Marginalizing Distributions

Given a joint density $p(x, y)$, the marginal density of $x$ is:

$$p(x) = \int p(x, y) \, dy$$

This uses Fubini to reduce a multivariate integral to a single-variable one.
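As a concrete check, integrating $y$ out of a correlated bivariate normal should recover the standard normal marginal of $x$. A sketch (the correlation $\rho = 0.6$, the cutoff $R = 10$, and the helper names are illustrative assumptions):

```python
import math

# Standard bivariate normal with correlation rho: marginalizing out y
# should recover the N(0,1) density of x.
rho = 0.6

def p_joint(x, y):
    z = (x * x - 2 * rho * x * y + y * y) / (1 - rho * rho)
    return math.exp(-z / 2) / (2 * math.pi * math.sqrt(1 - rho * rho))

def p_marginal(x, R=10.0, n=20_000):
    # midpoint rule over y in [-R, R]; the tail beyond R is negligible
    dy = 2 * R / n
    return sum(p_joint(x, -R + (i + 0.5) * dy) * dy for i in range(n))

x0 = 0.7
exact = math.exp(-x0 * x0 / 2) / math.sqrt(2 * math.pi)
print(p_marginal(x0), exact)
```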

Computing Normalizing Constants

An unnormalized density $\tilde{p}$ is normalized as $p(x) = \frac{1}{Z} \tilde{p}(x)$, where $Z = \int \tilde{p}(x) \, dx$. In Bayesian inference, computing $Z$ (the evidence) often requires a change of variables to make the integral tractable.

Normalizing Flows

A normalizing flow transforms a simple base distribution $p_Z(z)$ through a diffeomorphism $f$ to get $p_X(x) = p_Z(f^{-1}(x)) \cdot |\det Df^{-1}(x)|$. The change-of-variables formula makes this exact.
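The log-density computation at the heart of flow training can be sketched with a single invertible linear layer (a toy stand-in for a real flow architecture; the matrix $A$ and helper names are illustrative assumptions):

```python
import math

# Toy one-layer "flow": x = A z with z ~ N(0, I) in 2D and A invertible.
# Change of variables gives log p_X(x) = log p_Z(A^{-1} x) - log|det A|.

A = [[2.0, 1.0], [0.0, 3.0]]          # invertible 2x2 matrix, det = 6
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det_A, -A[0][1] / det_A],
         [-A[1][0] / det_A, A[0][0] / det_A]]

def log_p_z(z):
    # standard normal log-density in 2D
    return -0.5 * (z[0] ** 2 + z[1] ** 2) - math.log(2 * math.pi)

def log_p_x(x):
    # pull x back through the inverse map, then subtract log|det A|
    z = [A_inv[0][0] * x[0] + A_inv[0][1] * x[1],
         A_inv[1][0] * x[0] + A_inv[1][1] * x[1]]
    return log_p_z(z) - math.log(abs(det_A))

print(log_p_x([0.0, 0.0]))
```

Real flow architectures (coupling layers, autoregressive flows) are designed so that this log-determinant is cheap, e.g. triangular Jacobians whose determinant is a product of diagonal entries.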

Common Confusions

Watch Out

The Jacobian determinant is the absolute value

In the change-of-variables formula for integrals, you use $|\det D\phi|$, not $\det D\phi$. The absolute value ensures the integral is non-negative regardless of whether the transformation preserves or reverses orientation. For probability density transformations, forgetting the absolute value gives wrong densities.

Watch Out

Fubini requires integrability, Tonelli does not

Tonelli's theorem (for non-negative functions) allows you to swap integration order without checking integrability first. This is useful because you can establish integrability by computing the iterated integral of $|f|$. Fubini applies to signed functions but requires you to verify integrability of $|f|$ first.

Summary

  • Substitution: $\int f(g(x)) \, g'(x) \, dx = \int f(u) \, du$ with $u = g(x)$
  • Multivariate change of variables: multiply by $|\det D\phi(x)|$ when transforming coordinates
  • The Jacobian determinant measures local volume change
  • Fubini: swap integration order when $\int |f| < \infty$
  • These tools compute expectations, marginals, normalizing constants, and flow densities

Exercises

ExerciseCore

Problem

Compute $\int_0^\infty x e^{-x^2/2} \, dx$ using the substitution $u = x^2/2$.

ExerciseAdvanced

Problem

Let $X \sim \mathcal{N}(0, 1)$ and $Y = e^X$ (so $Y$ is log-normal). Use the change-of-variables formula to derive the density of $Y$.

References

Canonical:

  • Rudin, Principles of Mathematical Analysis (1976), Chapters 6 and 10
  • Folland, Real Analysis (1999), Chapter 2 (Lebesgue integration and product measures)
  • Apostol, Mathematical Analysis (1974), Chapters 10-11 (Riemann integration and multivariable change of variables)

Current:

  • Kobyzev et al., "Normalizing Flows: An Introduction and Review of Current Methods" (2021). Change-of-variables in deep learning.
  • Billingsley, Probability and Measure (1995), Chapter 3 (integration and Fubini's theorem in measure-theoretic context)
  • Spivak, Calculus on Manifolds (1965), Chapter 3 (integration on R^n and the change-of-variables formula)


Last reviewed: April 26, 2026
