Mathematical Infrastructure
Stochastic Differential Equations
SDEs of the form $dX_t = b\,dt + \sigma\,dB_t$: strong and weak solutions, existence and uniqueness under Lipschitz conditions, Euler-Maruyama discretization, and the canonical examples that appear throughout ML (Ornstein-Uhlenbeck, geometric Brownian motion, Langevin dynamics).
Prerequisites
Itô's Lemma, Stochastic Calculus for ML
Why This Matters
An SDE is a differential equation driven by Brownian motion:
$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t.$$
This is the mathematical object behind Langevin dynamics (the sampler inside SGLD), the forward process of diffusion models, the continuous-time limit of stochastic gradient descent, the state dynamics in continuous-time RL, and the Black-Scholes model in finance. Knowing when an SDE has a unique solution, how to discretize it, and how the solution's density evolves (Fokker-Planck) is the bridge between the stochastic-calculus toolbox and the models that use it.
Mental Model
An SDE is an ODE with noise. The drift $b$ pulls the trajectory deterministically; the diffusion $\sigma$ injects random fluctuations of size proportional to $\sqrt{dt}$. The interplay between drift and diffusion determines whether the process has a stationary distribution, how fast it mixes, and whether samples from the SDE can be used for inference.
The integral form is more honest than the differential notation:
$$X_t = X_0 + \int_0^t b(X_s)\,ds + \int_0^t \sigma(X_s)\,dB_s.$$
The first integral is an ordinary Riemann integral. The second is an Itô integral, which requires different rules because $B_t$ has infinite variation.
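As a rough numerical illustration of why the stochastic integral needs its own rules, the sketch below samples Brownian increments on a coarse and a fine grid and compares the total variation (sum of absolute increments) with the quadratic variation (sum of squared increments). The helper name `variation_sums` and the grid sizes are illustrative assumptions.

```python
import math
import random

def variation_sums(T, n_steps, rng):
    """Return (total variation, quadratic variation) of one sampled
    Brownian path on a uniform grid with n_steps steps over [0, T]."""
    h = T / n_steps
    sqrt_h = math.sqrt(h)
    total_var = 0.0
    quad_var = 0.0
    for _ in range(n_steps):
        dB = rng.gauss(0.0, sqrt_h)  # increment B_{t+h} - B_t ~ N(0, h)
        total_var += abs(dB)
        quad_var += dB * dB
    return total_var, quad_var

rng = random.Random(0)
tv_coarse, qv_coarse = variation_sums(T=1.0, n_steps=100, rng=rng)
tv_fine, qv_fine = variation_sums(T=1.0, n_steps=10_000, rng=rng)
print(tv_coarse, tv_fine)  # total variation is larger on the finer grid
print(qv_coarse, qv_fine)  # quadratic variation stays near T = 1
```

The total variation scales like $\sqrt{N}$ with the number of grid points $N$, so it diverges in the limit, while the quadratic variation settles at $T$. That nonzero quadratic variation is the source of the extra term in Itô's formula.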
Formal Setup
Strong Solution
A strong solution of $dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t$ is a process $X_t$ that is adapted to the filtration of the given Brownian motion $B_t$ and satisfies the integral equation pathwise. The solution is built on top of the specific noise realization $B_t(\omega)$.
Weak Solution
A weak solution is a probability space, a Brownian motion $B_t$ on that space, and a process $X_t$ satisfying the SDE. The Brownian motion is part of the solution, not given in advance. Weak existence is a statement about the law of $X$; strong existence is a statement about pathwise construction from a given $B$.
Pathwise Uniqueness
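The SDE has pathwise uniqueness if any two solutions $X$ and $\tilde{X}$, defined on the same probability space, driven by the same Brownian motion, and starting from the same initial condition, satisfy $\mathbb{P}(X_t = \tilde{X}_t \text{ for all } t \ge 0) = 1$.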
Euler-Maruyama Discretization
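Given time step $h$ and grid $t_n = nh$, the Euler-Maruyama scheme approximates the SDE by:
$$\hat{X}_{n+1} = \hat{X}_n + b(\hat{X}_n)\,h + \sigma(\hat{X}_n)\,\Delta B_n,$$
where $\Delta B_n = B_{t_{n+1}} - B_{t_n} \sim \mathcal{N}(0, h)$ are i.i.d. This is the stochastic analogue of forward Euler.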
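The scheme can be sketched in a few lines. This is a minimal scalar version under stated assumptions: the function name `euler_maruyama`, its argument order, and the Ornstein-Uhlenbeck coefficients in the usage lines are illustrative choices, and only the Python standard library is used.

```python
import math
import random

def euler_maruyama(b, sigma, x0, T, n_steps, rng):
    """Simulate one path of dX = b(X) dt + sigma(X) dB on [0, T].

    Returns the list [X_0, X_1, ..., X_N] at grid points t_n = n * h.
    """
    h = T / n_steps
    sqrt_h = math.sqrt(h)
    x = x0
    path = [x0]
    for _ in range(n_steps):
        dB = rng.gauss(0.0, sqrt_h)  # Brownian increment, N(0, h)
        x = x + b(x) * h + sigma(x) * dB
        path.append(x)
    return path

# Usage: mean-reverting drift b(x) = -x with constant diffusion 0.5.
rng = random.Random(0)
path = euler_maruyama(lambda x: -x, lambda x: 0.5, x0=1.0, T=1.0,
                      n_steps=100, rng=rng)
print(len(path), path[0])
```

Setting the diffusion to zero recovers forward Euler for the ODE $\dot{x} = b(x)$, which is a convenient sanity check for an implementation.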
Main Theorems
Existence and Uniqueness for SDEs
Statement
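If the drift $b$ and diffusion $\sigma$ satisfy
$$|b(x) - b(y)| + |\sigma(x) - \sigma(y)| \le K\,|x - y|$$
for all $x, y$ (Lipschitz condition), and
$$|b(x)|^2 + |\sigma(x)|^2 \le K^2\,(1 + |x|^2)$$
(linear growth condition), then for any initial condition $X_0$ with $\mathbb{E}[|X_0|^2] < \infty$, the SDE has a unique strong solution on $[0, T]$ for any $T > 0$. The solution satisfies $\mathbb{E}\left[\sup_{0 \le t \le T} |X_t|^2\right] < \infty$.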
Intuition
Lipschitz drift and diffusion prevent the solution from branching (uniqueness) or exploding (existence). The proof is a stochastic Picard iteration: define $X^{(n+1)}$ as the integral of $b$ and $\sigma$ evaluated at $X^{(n)}$, then show the sequence converges using the Lipschitz bound and Grönwall's inequality. This mirrors the deterministic ODE proof, with $L^2$ norms replacing sup-norms.
Proof Sketch
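Define the Picard iterates $X^{(0)}_t = X_0$ and
$$X^{(n+1)}_t = X_0 + \int_0^t b(X^{(n)}_s)\,ds + \int_0^t \sigma(X^{(n)}_s)\,dB_s.$$
Using Itô isometry for the stochastic integral, Cauchy-Schwarz for the drift integral, and the Lipschitz bound:
$$\mathbb{E}\left[|X^{(n+1)}_t - X^{(n)}_t|^2\right] \le C \int_0^t \mathbb{E}\left[|X^{(n)}_s - X^{(n-1)}_s|^2\right] ds, \qquad C = 2K^2(T + 1).$$
Iterating gives the bound $\mathbb{E}\left[|X^{(n+1)}_t - X^{(n)}_t|^2\right] \le M\,(Ct)^n / n!$, where $M$ bounds the first difference; the square roots of these bounds are summable in $n$. The series $\sum_n (X^{(n+1)} - X^{(n)})$ therefore converges in $L^2$, giving the solution. Uniqueness follows from applying the same Grönwall argument to the difference of two solutions.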
Why It Matters
This theorem tells you when an SDE is well-posed. Langevin dynamics ($b = -\nabla U$, $\sigma = \sqrt{2}$) has a unique solution whenever the potential $U$ has Lipschitz gradient. The Ornstein-Uhlenbeck process satisfies the conditions with linear drift. Geometric Brownian motion has multiplicative noise $\sigma(x) = \sigma x$, which is globally Lipschitz with constant $|\sigma|$ and satisfies the linear growth condition, so the theorem applies directly without any localization argument. Multiplicative-noise SDEs need extra arguments only when $\sigma(x)$ grows superlinearly (e.g. $\sigma(x) = x^2$, handled by localization) or fails to be locally Lipschitz (e.g. $\sigma(x) = \sqrt{x}$ for the CIR process, handled by the Yamada-Watanabe condition).
Failure Mode
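When the Lipschitz condition fails, solutions may not be unique. The classic example, due to Girsanov: $dX_t = |X_t|^{\alpha}\,dB_t$ with $X_0 = 0$ and $0 < \alpha < 1/2$. The diffusion coefficient is Hölder-$\alpha$ but not Lipschitz at $x = 0$. Pathwise uniqueness fails, and so does uniqueness in law: $X \equiv 0$ is a solution, but nontrivial solutions also exist. For Hölder exponent $\alpha \ge 1/2$, pathwise uniqueness is restored by the Yamada-Watanabe condition (see below). When the linear growth condition fails, solutions can explode in finite time, as already happens for the deterministic equation $dX_t = X_t^2\,dt$.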
Yamada-Watanabe Uniqueness Conditions
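The Lipschitz hypothesis on $\sigma$ in the existence-and-uniqueness theorem can be weakened. Yamada and Watanabe (1971) proved pathwise uniqueness for one-dimensional SDEs under
$$|\sigma(x) - \sigma(y)| \le \rho(|x - y|),$$
where $\rho$ is increasing with $\rho(0) = 0$ and $\int_0^{\epsilon} \rho(u)^{-2}\,du = \infty$ for every $\epsilon > 0$ (the Yamada-Watanabe modulus). An analogous modulus condition is imposed on $b$; in particular, a Lipschitz drift suffices. The canonical example: $\rho(u) = \sqrt{u}$ satisfies the integral condition (since $\int_0^{\epsilon} u^{-1}\,du = \infty$), so $\sigma(x) = \sqrt{|x|}$ gives pathwise uniqueness even though it is not Lipschitz at $0$. This justifies the standard treatment of the CIR process and similar square-root diffusions.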
The companion result (Yamada-Watanabe 1971): weak existence + pathwise uniqueness implies strong existence. So once you check Yamada-Watanabe pathwise uniqueness and any weak existence result, you automatically get a unique strong solution.
Euler-Maruyama Convergence
Statement
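Under the Lipschitz and linear growth conditions, the Euler-Maruyama scheme with step $h$ and $N = T/h$ steps converges:
- Strong convergence (pathwise): $\mathbb{E}\left[|X_T - \hat{X}_N|\right] \le C\,h^{1/2}$
- Weak convergence (distributional): for smooth test functions $f$ (and sufficiently smooth $b$ and $\sigma$), $\left|\mathbb{E}[f(X_T)] - \mathbb{E}[f(\hat{X}_N)]\right| \le C_f\,h$
Strong order is $1/2$; weak order is $1$.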
Intuition
Strong convergence is slower than for deterministic ODEs (order $1/2$ vs order $1$) because the Brownian noise introduces fluctuations that cannot be captured by a single Euler step. Weak convergence is faster (order $1$) because distributional errors average out across realizations. If you care about the law of $X_T$ (Monte Carlo estimation), use the weak rate; if you need the actual path (e.g., coupling arguments), use the strong rate.
Proof Sketch
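For strong convergence: expand $X_{t_{n+1}} - X_{t_n}$ using the integral form, subtract the Euler step, and bound the remainder using Itô isometry and the Lipschitz condition. The dominant error term is $\int_{t_n}^{t_{n+1}} \left(\sigma(X_s) - \sigma(X_{t_n})\right) dB_s$, which has $L^2$ magnitude $O(h)$ per step after controlling $\mathbb{E}\left[|X_s - X_{t_n}|^2\right] = O(h)$. These per-step errors are martingale increments, so their squares add: summing $T/h$ steps and applying Grönwall's inequality gives a mean-square error of $O(h)$, that is, a global bound of order $h^{1/2}$.
For weak convergence: use the Itô-Taylor expansion of $\mathbb{E}[f(X_t)]$ truncated at the appropriate order. The extra cancellation in the weak case comes from $\mathbb{E}[\Delta B_n] = 0$ and $\mathbb{E}[(\Delta B_n)^2] = h$.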
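For one special case the weak error can be computed without any sampling, which makes the order-1 rate easy to see. Take geometric Brownian motion $dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$ and the test function $f(x) = x$. Each Euler-Maruyama step multiplies the state by $1 + \mu h + \sigma \Delta B_n$, and since $\mathbb{E}[\Delta B_n] = 0$ the mean of the iterate is $S_0 (1 + \mu h)^N$, while the true mean is $S_0 e^{\mu T}$. The function name `em_mean_gbm` and the parameter values are illustrative assumptions.

```python
import math

def em_mean_gbm(s0, mu, T, n_steps):
    """Exact mean of the Euler-Maruyama iterate for GBM after n_steps.

    Each step multiplies by (1 + mu*h + sigma*dB) with E[dB] = 0 and dB
    independent of the current state, so the mean is s0 * (1 + mu*h)**n_steps.
    """
    h = T / n_steps
    return s0 * (1.0 + mu * h) ** n_steps

s0, mu, T = 1.0, 1.0, 1.0
exact_mean = s0 * math.exp(mu * T)  # E[S_T] for GBM
err_100 = abs(exact_mean - em_mean_gbm(s0, mu, T, 100))
err_200 = abs(exact_mean - em_mean_gbm(s0, mu, T, 200))
print(err_100, err_200, err_100 / err_200)  # weak order 1 predicts a ratio near 2
```

Halving the step roughly halves the error in the mean, independent of $\sigma$. This covers only one test function on one SDE, so it illustrates the rate rather than proving it.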
Why It Matters
Every practical SDE simulation (SGLD, diffusion model sampling, Monte Carlo pricing) uses a discretization scheme. Euler-Maruyama is the simplest. Knowing the convergence orders tells you how many steps you need: halving the pathwise error requires 4x the steps (strong order $1/2$), but halving the distributional error requires only 2x the steps (weak order $1$). Higher-order schemes (Milstein, stochastic Runge-Kutta) improve the strong rate to $1$ by including the $\sigma\sigma'$ term.
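The strong rate can be checked empirically on geometric Brownian motion, where the exact solution $S_T = S_0 \exp\left((\mu - \sigma^2/2)T + \sigma B_T\right)$ is available for the same Brownian path that drives the scheme. The sketch below is a Monte Carlo estimate under assumed parameter values; `strong_error_gbm` is a hypothetical helper name.

```python
import math
import random

def strong_error_gbm(mu, sigma, s0, T, n_steps, n_paths, rng):
    """Monte Carlo estimate of E|S_T - S_hat_N| for Euler-Maruyama on GBM.

    The exact solution is evaluated on the same Brownian path, using
    B_T = sum of the increments fed to the scheme.
    """
    h = T / n_steps
    sqrt_h = math.sqrt(h)
    total = 0.0
    for _ in range(n_paths):
        s_hat, b_T = s0, 0.0
        for _ in range(n_steps):
            dB = rng.gauss(0.0, sqrt_h)
            s_hat += mu * s_hat * h + sigma * s_hat * dB
            b_T += dB
        s_exact = s0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * b_T)
        total += abs(s_exact - s_hat)
    return total / n_paths

rng = random.Random(0)
err_coarse = strong_error_gbm(0.1, 0.5, 1.0, 1.0, n_steps=16, n_paths=2000, rng=rng)
err_fine = strong_error_gbm(0.1, 0.5, 1.0, 1.0, n_steps=64, n_paths=2000, rng=rng)
print(err_coarse, err_fine, err_coarse / err_fine)
# Strong order 1/2 predicts a ratio near sqrt(4) = 2 for 4x the steps.
```

Using four times as many steps only about halves the pathwise error, which is the cost described above.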
Failure Mode
When $\sigma$ depends on $x$ (multiplicative noise), the Euler-Maruyama scheme can produce negative values for processes that should be positive (e.g., geometric Brownian motion, CIR process). Implicit schemes or the Milstein correction handle this better. For stiff SDEs (large Lipschitz constant), explicit Euler requires very small $h$ for stability.
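The positivity failure is easy to reproduce. For geometric Brownian motion $dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$, one Euler-Maruyama step multiplies the state by $1 + \mu h + \sigma \Delta B$, which is negative whenever the Gaussian increment is sufficiently negative. The sketch below counts how often that happens on a coarse and a fine grid; the helper name and the deliberately large volatility are illustrative assumptions.

```python
import math
import random

def fraction_nonpositive_em_gbm(mu, sigma, s0, T, n_steps, n_paths, rng):
    """Fraction of Euler-Maruyama GBM paths with a value <= 0 at some grid point."""
    h = T / n_steps
    sqrt_h = math.sqrt(h)
    hits = 0
    for _ in range(n_paths):
        s = s0
        for _ in range(n_steps):
            s += mu * s * h + sigma * s * rng.gauss(0.0, sqrt_h)
            if s <= 0.0:  # the exact GBM solution is always positive
                hits += 1
                break
    return hits / n_paths

rng = random.Random(0)
coarse = fraction_nonpositive_em_gbm(0.05, 2.0, 1.0, 1.0, n_steps=4,
                                     n_paths=1000, rng=rng)
fine = fraction_nonpositive_em_gbm(0.05, 2.0, 1.0, 1.0, n_steps=200,
                                   n_paths=1000, rng=rng)
print(coarse, fine)
```

With $\sigma\sqrt{h} = 1$ on the coarse grid, a sizeable fraction of paths cross zero, while on the fine grid the required increment is many standard deviations out and the problem effectively disappears. Shrinking the step reduces the failure probability but never makes it exactly zero.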
The Milstein Scheme
For scalar SDEs, the Milstein scheme (Milstein 1975) augments Euler-Maruyama with a correction that captures the Itô-Taylor term involving $\sigma\sigma'$:
$$\hat{X}_{n+1} = \hat{X}_n + b(\hat{X}_n)\,h + \sigma(\hat{X}_n)\,\Delta B_n + \tfrac{1}{2}\,\sigma(\hat{X}_n)\,\sigma'(\hat{X}_n)\left((\Delta B_n)^2 - h\right),$$
with time step $h$ and $\Delta B_n \sim \mathcal{N}(0, h)$. The extra term is the Itô correction for the second-order term in the stochastic Taylor expansion. When $\sigma$ does not depend on $x$ (additive noise), $\sigma' = 0$ and Milstein reduces to Euler-Maruyama. When $\sigma$ has nontrivial state dependence, Milstein achieves strong order 1 versus Euler-Maruyama's strong order $1/2$, at the cost of needing the derivative $\sigma'$. For multidimensional SDEs with non-commutative noise, the Milstein scheme requires Lévy area approximations and becomes substantially more involved.
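A side-by-side comparison on geometric Brownian motion, where $\sigma(x) = \sigma x$ and hence $\sigma(x)\sigma'(x) = \sigma^2 x$, shows the effect of the correction. Both schemes below consume the same Brownian increments as the exact solution. The function name and parameter values are illustrative assumptions, not a reference implementation.

```python
import math
import random

def compare_em_milstein_gbm(mu, sigma, s0, T, n_steps, n_paths, rng):
    """Mean absolute terminal error of Euler-Maruyama and Milstein on GBM,
    both driven by the same Brownian increments as the exact solution."""
    h = T / n_steps
    sqrt_h = math.sqrt(h)
    err_em = 0.0
    err_mil = 0.0
    for _ in range(n_paths):
        s_em = s_mil = s0
        b_T = 0.0
        for _ in range(n_steps):
            dB = rng.gauss(0.0, sqrt_h)
            s_em += mu * s_em * h + sigma * s_em * dB
            # Milstein adds 0.5 * sigma(x) * sigma'(x) * (dB^2 - h),
            # which for sigma(x) = sigma * x is 0.5 * sigma^2 * x * (dB^2 - h).
            s_mil += (mu * s_mil * h + sigma * s_mil * dB
                      + 0.5 * sigma * sigma * s_mil * (dB * dB - h))
            b_T += dB
        s_exact = s0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * b_T)
        err_em += abs(s_exact - s_em)
        err_mil += abs(s_exact - s_mil)
    return err_em / n_paths, err_mil / n_paths

rng = random.Random(0)
err_em, err_mil = compare_em_milstein_gbm(0.1, 0.5, 1.0, 1.0,
                                          n_steps=64, n_paths=1000, rng=rng)
print(err_em, err_mil)
```

At the same step size the Milstein error is markedly smaller, at the price of one extra multiplication per step and an analytic expression for $\sigma'$.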
Canonical Examples
Ornstein-Uhlenbeck Process
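$dX_t = -\theta X_t\,dt + \sigma\,dB_t$ with $\theta > 0$. This is the continuous-time analogue of an AR(1) process. The solution is $X_t = X_0 e^{-\theta t} + \sigma \int_0^t e^{-\theta(t-s)}\,dB_s$. The stationary distribution is $\mathcal{N}\left(0, \sigma^2/(2\theta)\right)$. In ML: this is the SDE behind SGLD (Langevin dynamics for a quadratic potential $U(x) = \theta x^2/2$) and the forward process of variance-preserving diffusion models.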
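Because the Ornstein-Uhlenbeck solution is an explicit Gaussian functional of the noise, the process $dX_t = -\theta X_t\,dt + \sigma\,dB_t$ can be sampled without discretization error: by Itô isometry, $X_{t+h}$ given $X_t = x$ is Gaussian with mean $x e^{-\theta h}$ and variance $\sigma^2 (1 - e^{-2\theta h})/(2\theta)$. The sketch below uses that transition and compares the long-run sample variance with $\sigma^2/(2\theta)$. The helper name `ou_exact_step` and the parameter values are illustrative assumptions.

```python
import math
import random

def ou_exact_step(x, h, theta, sigma, rng):
    """Sample X_{t+h} given X_t = x for dX = -theta*X dt + sigma dB.

    Uses the Gaussian transition: mean x*exp(-theta*h),
    variance sigma^2 * (1 - exp(-2*theta*h)) / (2*theta).
    """
    decay = math.exp(-theta * h)
    std = sigma * math.sqrt((1.0 - decay * decay) / (2.0 * theta))
    return x * decay + std * rng.gauss(0.0, 1.0)

theta, sigma, h = 1.0, 1.0, 0.5
rng = random.Random(0)
samples = []
for _ in range(4000):
    x = 3.0                      # start far from equilibrium
    for _ in range(20):          # run to time t = 10
        x = ou_exact_step(x, h, theta, sigma, rng)
    samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var, sigma ** 2 / (2 * theta))  # sample stats vs stationary variance
```

Unlike Euler-Maruyama, this update is exact for any step size, which makes it a useful reference when testing a general-purpose SDE integrator.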
Geometric Brownian Motion
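$dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$. Applying Itô's formula to $\log S_t$ gives $S_t = S_0 \exp\left((\mu - \sigma^2/2)\,t + \sigma B_t\right)$. This is the Black-Scholes stock price model. The Itô correction $-\sigma^2/2$ is the reason the geometric mean return is lower than the arithmetic mean return; it is also why $\mathbb{E}[S_t] = S_0 e^{\mu t}$ despite the $-\sigma^2/2$ in the exponent.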
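The claim about the mean can be verified in three lines, starting from the closed-form solution $S_t = S_0 \exp\left((\mu - \sigma^2/2)t + \sigma B_t\right)$ and the Gaussian moment generating function:

```latex
\begin{aligned}
\mathbb{E}[S_t]
  &= S_0\, e^{(\mu - \sigma^2/2)\,t}\; \mathbb{E}\!\left[e^{\sigma B_t}\right]
  && \text{(closed-form solution)} \\
  &= S_0\, e^{(\mu - \sigma^2/2)\,t}\; e^{\sigma^2 t/2}
  && \text{(since } B_t \sim \mathcal{N}(0, t)\text{)} \\
  &= S_0\, e^{\mu t}.
\end{aligned}
```

The $-\sigma^2/2$ in the exponent is exactly cancelled by the convexity of the exponential, so the mean grows at rate $\mu$ while the typical (median) path grows at rate $\mu - \sigma^2/2$.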
CIR Process (Square-Root Diffusion)
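$dX_t = \kappa(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,dB_t$ with $2\kappa\theta \ge \sigma^2$ (Feller condition). This models interest rates and variance processes in finance. The square-root diffusion vanishes at $x = 0$, so the process cannot become negative; when the Feller condition holds, it stays strictly positive. The stationary distribution is Gamma with shape $2\kappa\theta/\sigma^2$ and rate $2\kappa/\sigma^2$.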
The Fokker-Planck Connection
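If $X_t$ solves $dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t$ and has a smooth density $p(x, t)$, that density satisfies the Fokker-Planck equation (forward Kolmogorov equation):
$$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}\left[b(x)\,p\right] + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\left[\sigma(x)^2\,p\right].$$
This PDE governs how the probability mass evolves. In diffusion models, the forward SDE has a known Fokker-Planck equation whose solution converges to a Gaussian; the reverse-time SDE (score-based generation) retraces the same family of densities backward in time.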
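As a worked example, the stationary density of the Ornstein-Uhlenbeck process $dX_t = -\theta X_t\,dt + \sigma\,dB_t$ follows from the time-independent Fokker-Planck equation, assuming the density and its flux vanish at infinity:

```latex
\begin{aligned}
0 &= \partial_x\!\left(\theta x\, p(x)\right) + \tfrac{\sigma^2}{2}\, \partial_x^2\, p(x)
  && \text{(set } \partial_t p = 0,\ b(x) = -\theta x\text{)} \\
0 &= \theta x\, p(x) + \tfrac{\sigma^2}{2}\, p'(x)
  && \text{(integrate once; zero flux at infinity)} \\
p(x) &\propto \exp\!\left(-\frac{\theta x^2}{\sigma^2}\right)
  && \text{i.e. } \mathcal{N}\!\left(0,\ \tfrac{\sigma^2}{2\theta}\right).
\end{aligned}
```

The same recipe (zero stationary flux, then solve a first-order ODE) gives the Gibbs density $p(x) \propto e^{-U(x)}$ for Langevin dynamics with $b = -\nabla U$ and $\sigma = \sqrt{2}$ in one dimension.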
Common Confusions
Itô vs Stratonovich: different SDEs, not different solutions to the same SDE
The SDE $dX_t = b\,dt + \sigma\,dB_t$ in the Itô sense and $dX_t = b\,dt + \sigma \circ dB_t$ in the Stratonovich sense are different equations with different solutions. They can be converted: a Stratonovich SDE $dX_t = b\,dt + \sigma \circ dB_t$ equals the Itô SDE $dX_t = \left(b + \tfrac{1}{2}\sigma\sigma'\right)dt + \sigma\,dB_t$. The choice of convention is not a matter of taste when the diffusion coefficient depends on $x$.
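A concrete instance of the conversion, using linear multiplicative noise $\sigma(x) = \sigma x$ so that $\tfrac{1}{2}\sigma(x)\sigma'(x) = \tfrac{1}{2}\sigma^2 x$:

```latex
\begin{aligned}
\text{Stratonovich:}\quad dS_t &= \mu S_t\,dt + \sigma S_t \circ dB_t \\
\text{It\^o form:}\quad dS_t &= \left(\mu + \tfrac{1}{2}\sigma^2\right) S_t\,dt + \sigma S_t\,dB_t \\
\text{Solution:}\quad S_t &= S_0 \exp\!\left(\mu t + \sigma B_t\right)
\end{aligned}
```

Compare with the Itô equation $dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$, whose solution is $S_0 \exp\left((\mu - \sigma^2/2)t + \sigma B_t\right)$. The same symbols $\mu$ and $\sigma$ describe processes with different growth rates depending on the convention.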
Strong order 1/2 does not mean Euler-Maruyama is useless
The strong rate $1/2$ looks slow, but for Monte Carlo estimation (the main use case), only the weak rate matters. With weak order $1$, Euler-Maruyama converges as fast as forward Euler does for ODEs when you care about expectations. Strong convergence matters for path-dependent functionals and coupling arguments, not for computing $\mathbb{E}[f(X_T)]$.
Not every SDE has a stationary distribution
An SDE has a stationary distribution only if drift and diffusion balance so that probability mass reaches an equilibrium. The Ornstein-Uhlenbeck process has one (the drift $-\theta x$ pulls mass back to zero). Geometric Brownian motion does not (it drifts to $0$ or $\infty$ depending on the sign of $\mu - \sigma^2/2$). Checking for stationarity requires verifying that the Fokker-Planck equation has a normalizable steady-state solution.
Exercises
Problem
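Apply Itô's formula to $f(x) = x^2$ and the Ornstein-Uhlenbeck SDE to derive $d(X_t^2) = (\sigma^2 - 2\theta X_t^2)\,dt + 2\sigma X_t\,dB_t$. Use this to compute $\mathbb{E}[X_t^2]$ when $X_0 = 0$.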
Problem
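The Milstein scheme adds the term $\tfrac{1}{2}\sigma(\hat{X}_n)\sigma'(\hat{X}_n)\left((\Delta B_n)^2 - h\right)$ to the Euler-Maruyama step. Show that for geometric Brownian motion $dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$, the Milstein step agrees with the exact one-step solution $S_{t_n}\exp\left((\mu - \sigma^2/2)h + \sigma\Delta B_n\right)$ up to terms of order $h^{3/2}$, which is what yields strong order 1. (The scheme is not exact: the expansion of the exponential has further terms.)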
References
Canonical textbooks:
- Øksendal, Stochastic Differential Equations (6th ed., Springer, 2003), Chapters 5-8.
- Karatzas & Shreve, Brownian Motion and Stochastic Calculus (2nd ed., Springer, 1991), Chapter 5.
- Revuz & Yor, Continuous Martingales and Brownian Motion (3rd ed., Springer, 1999). The standard reference for the modern martingale-theoretic treatment.
- Protter, Stochastic Integration and Differential Equations (2nd ed., Springer, 2004). Semimartingale approach with general jump-diffusion theory.
- Kloeden & Platen, Numerical Solution of Stochastic Differential Equations (Springer, 1992), Chapters 9-10. The reference for Euler-Maruyama, Milstein, and stochastic Runge-Kutta.
Foundational papers:
- Yamada & Watanabe, "On the Uniqueness of Solutions of Stochastic Differential Equations" (J. Math. Kyoto Univ. 11, 1971). The Yamada-Watanabe pathwise-uniqueness criterion and the weak-existence-plus-pathwise-uniqueness theorem.
- Milstein, "Approximate Integration of Stochastic Differential Equations" (Theory of Probability and Its Applications 19(3), 1975). The Milstein scheme.
Current:
- Pavliotis, Stochastic Processes and Applications (Springer, 2014), Chapters 3-4. ML-friendly treatment of OU, Langevin, and Fokker-Planck.
- Le Gall, Brownian Motion, Martingales, and Stochastic Calculus (Springer, 2016), Chapter 7.
- Da Prato & Zabczyk, Stochastic Equations in Infinite Dimensions (2nd ed., Cambridge, 2014). The reference for SPDEs and infinite-dimensional SDEs (relevant to function-space diffusion models).
- Song et al., "Score-Based Generative Modeling through Stochastic Differential Equations" (ICLR 2021; arXiv:2011.13456). SDE framework for diffusion models; the reverse-time SDE and probability flow ODE.
Last reviewed: April 26, 2026