
Foundations

Metric Spaces, Convergence, and Completeness

Metric space axioms, convergence of sequences, Cauchy sequences, completeness, and the Banach fixed-point theorem.

Core · Tier 1 · Stable · Supporting · ~55 min

Why This Matters

Convergence guarantees for optimization algorithms (gradient descent, EM, iterative methods) require a precise notion of distance and completeness. The Banach fixed-point theorem gives convergence of fixed-point iteration $x_{n+1} = T(x_n)$, which appears in value iteration (reinforcement learning), iterative solvers, and fixed-point equations throughout ML.

The notation $\forall$ ("for all") and $\exists$ ("there exists") will appear in every definition below.

This page is "core" in a specific sense: not because every ML practitioner needs abstract metric topology on day one, but because the language of distance, Cauchy behavior, completeness, and contractions sits underneath the cleanest convergence arguments in optimization, RL, and function-space approximation.

What To Keep From This Page

| Idea | The question it answers | Standard failure if absent |
|---|---|---|
| Metric | what does "close" mean? | no coherent notion of continuity or convergence |
| Convergent sequence | do the iterates approach a specific point? | limit candidate may be wrong or undefined |
| Cauchy sequence | are the iterates internally stabilizing? | you may not yet know the limit |
| Complete space | does every stabilizing sequence land inside the space? | Cauchy sequences can converge "outside" the model space |
| Contraction map | do repeated updates shrink errors uniformly? | fixed-point iteration may stall, drift, or fail to exist |

Core Definitions

Definition

Metric Space

A metric space is a set $X$ with a function $d: X \times X \to [0, \infty)$ satisfying:

  1. Identity of indiscernibles: $d(x, y) = 0 \iff x = y$
  2. Symmetry: $d(x, y) = d(y, x)$
  3. Triangle inequality: $d(x, z) \leq d(x, y) + d(y, z)$

Key Examples

Euclidean space $\mathbb{R}^n$ with $d(x, y) = \lVert x - y\rVert_2 = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}$. More generally, any norm $\lVert\cdot\rVert$ on $\mathbb{R}^n$ induces a metric $d(x, y) = \lVert x - y\rVert$. The $\ell^p$ norms ($1 \leq p \leq \infty$) all give valid metrics.

Discrete metric. On any set $X$, define $d(x, y) = 1$ if $x \neq y$ and $d(x, x) = 0$. Every subset is open, every sequence that converges is eventually constant, and the space is always complete. This is useful as a degenerate case to test whether a theorem's hypotheses are too weak.

Function spaces. $C([a, b])$, the continuous functions on $[a, b]$, with the supremum metric $d(f, g) = \sup_{t \in [a, b]} |f(t) - g(t)|$. This is a complete metric space (a Banach space). Convergence in this metric is uniform convergence, which preserves continuity. The weaker pointwise convergence does not come from a metric on $C([a, b])$ and does not preserve continuity.

$\ell^p$ sequence spaces. Sequences $(x_1, x_2, \ldots)$ with $\sum |x_i|^p < \infty$, using $d(x, y) = \left(\sum |x_i - y_i|^p\right)^{1/p}$ for $1 \leq p < \infty$; the space $\ell^\infty$ of bounded sequences uses the sup metric $d(x, y) = \sup_i |x_i - y_i|$. All are complete for $1 \leq p \leq \infty$. These spaces appear in functional analysis and approximation theory.
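These metrics are easy to spot-check numerically. A minimal sketch in plain Python (the helper name `lp_distance` is ours, not from any library) computes $\ell^p$ distances between finite vectors and verifies the three metric axioms on random points:

```python
import math
import random

def lp_distance(x, y, p):
    """ell^p distance between two finite sequences; p = math.inf gives the sup metric."""
    if p == math.inf:
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

# Spot-check the three metric axioms on random points in R^5 for several p.
random.seed(0)
for p in (1, 2, 3, math.inf):
    for _ in range(100):
        x, y, z = ([random.uniform(-1, 1) for _ in range(5)] for _ in range(3))
        d = lambda a, b: lp_distance(a, b, p)
        assert d(x, x) == 0                          # identity of indiscernibles
        assert abs(d(x, y) - d(y, x)) < 1e-12        # symmetry
        assert d(x, z) <= d(x, y) + d(y, z) + 1e-12  # triangle (Minkowski) inequality
```

Random testing like this cannot prove the axioms, but it catches the common mistake of using $p < 1$, where the triangle inequality fails.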

Definition

Open and Closed Sets

A set $U \subseteq X$ is open if and only if for every $x \in U$ there exists $\epsilon > 0$ such that the ball $B(x, \epsilon) = \{y \in X : d(x, y) < \epsilon\}$ is contained in $U$. A set is closed if and only if its complement is open; equivalently, if it contains all its limit points.

Convergence and Cauchy Sequences

Definition

Convergence of Sequences

A sequence $(x_n)$ in $(X, d)$ converges to $x \in X$ if and only if for every $\epsilon > 0$ there exists $N$ such that $d(x_n, x) < \epsilon$ for all $n \geq N$. The limit is unique in a metric space.

Uniqueness follows from the triangle inequality: if $x_n \to x$ and $x_n \to y$, then $0 \leq d(x, y) \leq d(x, x_n) + d(x_n, y) \to 0$, so $d(x, y) = 0$ and $x = y$.

Definition

Cauchy Sequence

A sequence $(x_n)$ is Cauchy if and only if for every $\epsilon > 0$ there exists $N$ such that $d(x_m, x_n) < \epsilon$ for all $m, n \geq N$. Every convergent sequence is Cauchy, but the converse requires completeness.

The Cauchy condition is an intrinsic property: it refers only to distances between terms of the sequence, not to a candidate limit. This is critical because in many settings (incomplete spaces, or existence proofs where the limit is exactly what you are trying to construct) no candidate limit is available in advance.

Every convergent sequence is Cauchy. If $x_n \to x$, then $d(x_m, x_n) \leq d(x_m, x) + d(x, x_n) < \epsilon/2 + \epsilon/2 = \epsilon$ for $m, n$ large enough.

The converse fails without completeness. In $\mathbb{Q}$ with the usual metric, the sequence of rational approximations to $\sqrt{2}$ (e.g., via Newton's method) is Cauchy but does not converge in $\mathbb{Q}$.
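This example can be made concrete in exact rational arithmetic. The sketch below (plain Python with `fractions.Fraction`; variable names are ours) runs Newton's method for $x^2 = 2$ entirely inside $\mathbb{Q}$: consecutive gaps shrink rapidly, witnessing the Cauchy property, yet no iterate squares to exactly $2$:

```python
from fractions import Fraction

# Newton's method for x^2 = 2, kept in exact rational arithmetic:
# every iterate lies in Q, but the limit sqrt(2) does not.
x = Fraction(2)
iterates = [x]
for _ in range(5):
    x = (x + 2 / x) / 2          # Newton update T(x) = (x + 2/x) / 2
    iterates.append(x)

# Consecutive gaps shrink (quadratic convergence), so the sequence is Cauchy,
# yet no rational term is the limit: x * x never equals 2 exactly.
gaps = [float(abs(b - a)) for a, b in zip(iterates, iterates[1:])]
assert all(g2 < g1 for g1, g2 in zip(gaps, gaps[1:]))
assert all(t * t != 2 for t in iterates)
```

The iterates are the classical convergents $2, 3/2, 17/12, 577/408, \ldots$; the sequence "wants" to converge, and completeness of $\mathbb{R}$ is exactly what supplies the missing limit.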

Completeness

Definition

Complete Metric Space

A metric space $(X, d)$ is complete if and only if every Cauchy sequence in $X$ converges to a point in $X$.

Completeness is a property of the space, not of individual sequences. It guarantees that "sequences that should converge" actually do converge within the space.

Closed subsets inherit completeness. Any closed subset of a complete metric space is itself complete. Conversely, a complete subspace of any metric space is closed in it.

Completion. Every metric space can be embedded isometrically into a complete metric space (its completion). $\mathbb{R}$ is the completion of $\mathbb{Q}$. This is analogous to how the Lebesgue integral completes the Riemann integral in measure-theoretic probability.

Compactness in Metric Spaces

Definition

Compact Metric Space

A metric space is compact if and only if every open cover has a finite subcover. In metric spaces, this is equivalent to sequential compactness: every sequence has a convergent subsequence.

In $\mathbb{R}^n$, the Heine-Borel theorem characterizes compact sets as those that are closed and bounded. In infinite-dimensional spaces, closed and bounded does not imply compact: the closed unit ball in $\ell^2$ is closed and bounded but not compact.

Why compactness matters for ML. Compactness guarantees that continuous functions attain their extrema (extreme value theorem). Many existence proofs in optimization require compactness of the feasible set. If you are minimizing a continuous loss over a compact parameter space, a minimizer exists. Without compactness, you need additional arguments (coercivity, lower semicontinuity) to guarantee existence.

Contraction Mappings

Definition

Contraction Mapping

A function $T: X \to X$ is a contraction if and only if there exists $\gamma \in [0, 1)$ such that $d(T(x), T(y)) \leq \gamma \, d(x, y)$ for all $x, y \in X$. The constant $\gamma$ is the contraction factor.

A contraction is automatically uniformly continuous (and hence continuous). The contraction factor $\gamma$ controls the convergence rate: the error after $n$ iterations decays as $\gamma^n$, giving geometric (linear) convergence.

Main Theorems

Theorem

Banach Fixed-Point Theorem (Contraction Mapping Theorem)

Statement

Let $(X, d)$ be a nonempty complete metric space and $T: X \to X$ a contraction with factor $\gamma \in [0, 1)$. Then $T$ has a unique fixed point $x^* \in X$ (i.e., $T(x^*) = x^*$), and for any starting point $x_0 \in X$ the iterates $x_{n+1} = T(x_n)$ converge to $x^*$ with error bound:

$$d(x_n, x^*) \leq \frac{\gamma^n}{1 - \gamma}\, d(x_1, x_0)$$

Both nonemptiness and completeness are essential: without them either the fixed point fails to exist or the Cauchy iterates have no limit in XX.

Intuition

A contraction shrinks distances. Iterating it produces a Cauchy sequence (consecutive terms get geometrically closer). Completeness guarantees this sequence converges. Uniqueness follows because two fixed points would have to be distance zero apart.

Proof Sketch

For $m > n$: $d(x_n, x_m) \leq \sum_{k=n}^{m-1} d(x_k, x_{k+1}) \leq \sum_{k=n}^{m-1} \gamma^k d(x_0, x_1) \leq \frac{\gamma^n}{1-\gamma} d(x_0, x_1)$. This shows $(x_n)$ is Cauchy. By completeness, $x_n \to x^*$. Continuity of $T$ gives $T(x^*) = \lim T(x_n) = \lim x_{n+1} = x^*$. For uniqueness: if $T(y) = y$, then $d(x^*, y) = d(T(x^*), T(y)) \leq \gamma d(x^*, y)$, so $d(x^*, y) = 0$.
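The a priori error bound in the statement can be checked numerically. A minimal sketch, using the affine contraction $T(x) = x/2 + 1$ on $\mathbb{R}$ (our toy choice, with $\gamma = 1/2$ and fixed point $x^* = 2$):

```python
# Fixed-point iteration for the affine contraction T(x) = x/2 + 1 on R,
# which has contraction factor gamma = 1/2 and unique fixed point x* = 2.
# We check the a priori bound d(x_n, x*) <= gamma^n / (1 - gamma) * d(x_1, x_0).
gamma = 0.5
T = lambda x: x / 2 + 1
x_star = 2.0

x0 = 10.0
x1 = T(x0)
x = x0
for n in range(1, 30):
    x = T(x)                                        # x is now x_n
    bound = gamma ** n / (1 - gamma) * abs(x1 - x0)
    assert abs(x - x_star) <= bound + 1e-12         # a priori bound holds
```

For this particular map the bound is tight: the true error is $8 \cdot (1/2)^n$ and the bound evaluates to exactly the same quantity, which is why the assertion sits right at the edge.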

Why It Matters

This theorem gives both existence and a constructive algorithm (iterate $T$) with a convergence rate. In RL, the Bellman operator is a contraction in the sup norm with factor $\gamma$ (the discount factor), so value iteration converges. In optimization, proximal operators of convex functions are firmly nonexpansive (a strictly stronger property than 1-Lipschitz, but weaker than being a strict contraction); they are genuine contractions only in special cases (e.g., the prox of a strongly convex function with $\mu > 0$). Convergence of plain proximal-point iteration therefore follows from Krasnoselskii-Mann or Opial-style arguments rather than directly from Banach.
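To illustrate the RL case, here is value iteration on a made-up two-state deterministic MDP (the states, rewards, and `bellman` helper are all our toy choices, not a standard benchmark); the assertion checks the contraction property directly, namely that successive sup-norm gaps shrink by at least the factor $\gamma$:

```python
# Tiny 2-state MDP: in state s, action a in {0, 1} moves deterministically to
# state a with reward R[s][a]. The Bellman optimality operator is then a
# gamma-contraction in the sup norm, so value iteration converges geometrically.
gamma = 0.9
R = [[0.0, 1.0],   # rewards in state 0 for actions 0, 1
     [2.0, 0.0]]   # rewards in state 1 for actions 0, 1

def bellman(V):
    """One application of the Bellman optimality operator."""
    return [max(R[s][a] + gamma * V[a] for a in (0, 1)) for s in (0, 1)]

V = [0.0, 0.0]
prev_gap = None
for _ in range(50):
    V_new = bellman(V)
    gap = max(abs(a - b) for a, b in zip(V_new, V))  # sup-norm step size
    if prev_gap is not None and prev_gap > 1e-12:
        assert gap <= gamma * prev_gap + 1e-12       # contraction in sup norm
    prev_gap = gap
    V = V_new
```

The optimal policy here alternates between the two states (action 1 in state 0, action 0 in state 1), and the iterates approach the fixed point $V^*(0) = 2.8/0.19 \approx 14.74$, $V^*(1) \approx 15.26$ at rate $\gamma^n$.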

Failure Mode

Fails if $\gamma = 1$ (the map is merely non-expansive, not a contraction). Example: $T(x) = x + 1$ on $\mathbb{R}$ preserves distances but has no fixed point. Fails if the space is not complete: $T(x) = x/2$ on $(0, 1)$ is a contraction, but the fixed point $0$ is not in the space.

Standard Metrics

| Space | Metric | Complete? |
|---|---|---|
| $\mathbb{R}^n$ | $d(x, y) = \lVert x - y\rVert_2$ | Yes |
| $C([0, 1])$ (continuous functions) | $d(f, g) = \sup_t \lvert f(t) - g(t)\rvert$ | Yes |
| $\ell^2$ (square-summable sequences) | $d(x, y) = \sqrt{\sum (x_i - y_i)^2}$ | Yes |
| $\mathbb{Q}$ (rationals) | $d(x, y) = \lvert x - y\rvert$ | No |

Comparison Table

| Property | Sequence language | Space language | Why ML cares |
|---|---|---|---|
| Convergence | $x_n \to x$ | the limit point already lives in the space | iterative algorithms have a meaningful target |
| Cauchy | tail points get arbitrarily close to each other | intrinsic notion, no limit supplied yet | useful when the fixed point or solution is not known beforehand |
| Completeness | every Cauchy sequence converges | no missing limit points | Banach fixed-point and many existence proofs go through |
| Compactness | every sequence has a convergent subsequence | stronger than completeness | guarantees minimizers for continuous objectives on compact sets |

Common Confusions

Watch Out

Cauchy does not imply convergent without completeness

The sequence $x_n = 1/n$ in $(0, 1)$ is Cauchy but does not converge in $(0, 1)$ because the limit $0$ is not in the space. Completeness is exactly the property that rules out this pathology.

Watch Out

Contraction factor must be strictly less than 1

The map $T(x) = x/(1 + x)$ on $[0, \infty)$ satisfies $d(T(x), T(y)) < d(x, y)$ for $x \neq y$ but is not a contraction (no uniform $\gamma < 1$). It still has a fixed point ($x = 0$), but the Banach theorem does not apply, and convergence may be arbitrarily slow.
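The slow convergence is easy to see in closed form: since $1/T(x) = 1/x + 1$, the iterates from $x_0$ are exactly $x_n = x_0/(1 + n x_0)$, which approach $0$ only at rate $O(1/n)$ rather than geometrically. A minimal numerical check in plain Python:

```python
# T(x) = x / (1 + x) shrinks every pair of distinct points but has no uniform
# contraction factor gamma < 1, so Banach's geometric rate does not apply.
# From x0 = 1 the iterates are exactly x_n = 1 / (n + 1): rate O(1/n).
T = lambda x: x / (1 + x)

x = 1.0
for n in range(1000):
    x = T(x)
# After 1000 iterations the iterate matches the closed form 1 / 1001:
# still only ~1e-3 from the fixed point 0, versus gamma^1000 for a contraction.
assert abs(x - 1 / 1001) < 1e-9
```

Compare with a genuine contraction of factor $\gamma = 0.9$: after 1000 steps its error would be on the order of $0.9^{1000} \approx 10^{-46}$, not $10^{-3}$.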

Watch Out

Closed and bounded does not imply compact in infinite dimensions

In $\mathbb{R}^n$, closed plus bounded equals compact (Heine-Borel). In infinite-dimensional spaces this fails. The closed unit ball in $\ell^2$ is closed and bounded but not compact: the sequence $e_1, e_2, e_3, \ldots$ of standard basis vectors has no convergent subsequence (all pairwise distances are $\sqrt{2}$). Compactness in infinite dimensions requires additional conditions such as total boundedness.

Watch Out

Complete does not mean compact

Completeness says Cauchy sequences converge. Compactness says every sequence has a convergent subsequence. $\mathbb{R}$ with its usual metric is complete but not compact: the sequence $x_n = n$ has no convergent subsequence. In optimization arguments, confusing these two loses track of whether you are proving existence of a limit or existence of a minimizer.

Exercises

ExerciseCore

Problem

Show that $d(x, y) = |x - y|$ is a metric on $\mathbb{R}$. Which of the three axioms is the hardest to verify?

ExerciseAdvanced

Problem

Let $T(x) = \cos(x)$. Show that $T$ is not a contraction on all of $\mathbb{R}$, but that $T$ restricted to $[-1, 1]$ is a contraction with factor $\gamma = \sin(1) \approx 0.841$. Since $\cos$ maps $\mathbb{R}$ into $[-1, 1]$ in one step, conclude that iteration from any $x_0 \in \mathbb{R}$ converges. What is the fixed point?

References

Canonical:

  • Rudin, Principles of Mathematical Analysis (1976), Chapters 2-3
  • Kreyszig, Introductory Functional Analysis with Applications (1989), Chapters 1-2
  • Munkres, Topology (2000), Sections 20-28

Supplementary:

  • Sutherland, Introduction to Metric and Topological Spaces (2009), Chapters 3-7
  • Aliprantis & Border, Infinite Dimensional Analysis (2006), Chapter 3

For ML context:

  • Bertsekas, Dynamic Programming and Optimal Control (2012), Chapter 1 (contraction mappings in RL)
  • Puterman, Markov Decision Processes (2005), Chapter 6.2 (contraction operators for value iteration)

Last reviewed: April 26, 2026
