
Foundations

Metric Spaces, Convergence, and Completeness

Metric space axioms, convergence of sequences, Cauchy sequences, completeness, and the Banach fixed-point theorem.

Core · Tier 1 · Stable · Supporting · ~55 min

Why This Matters

Convergence guarantees for optimization algorithms (gradient descent, EM, iterative methods) require a precise notion of distance and completeness. The Banach fixed-point theorem gives convergence of fixed-point iteration $x_{n+1} = T(x_n)$, which appears in value iteration (reinforcement learning), iterative solvers, and fixed-point equations throughout ML.

The notation $\forall$ ("for all") and $\exists$ ("there exists") will appear in every definition below.

This page is "core" in a specific sense: not because every ML practitioner needs abstract metric topology on day one, but because the language of distance, Cauchy behavior, completeness, and contractions sits underneath the cleanest convergence arguments in optimization, RL, and function-space approximation.

What To Keep From This Page

| Idea | The question it answers | Standard failure if absent |
|---|---|---|
| Metric | what does "close" mean? | no coherent notion of continuity or convergence |
| Convergent sequence | do the iterates approach a specific point? | limit candidate may be wrong or undefined |
| Cauchy sequence | are the iterates internally stabilizing? | you may not yet know the limit |
| Complete space | does every stabilizing sequence land inside the space? | Cauchy sequences can converge "outside" the model space |
| Contraction map | do repeated updates shrink errors uniformly? | fixed-point iteration may stall, drift, or fail to exist |

Core Definitions

Definition

Metric Space

A metric space is a set $X$ with a function $d: X \times X \to [0, \infty)$ satisfying:

  1. Identity of indiscernibles: $d(x, y) = 0 \iff x = y$
  2. Symmetry: $d(x, y) = d(y, x)$
  3. Triangle inequality: $d(x, z) \leq d(x, y) + d(y, z)$

Key Examples

Euclidean space $\mathbb{R}^n$ with $d(x, y) = \lVert x - y\rVert_2 = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}$. More generally, any norm $\lVert\cdot\rVert$ on $\mathbb{R}^n$ induces a metric $d(x, y) = \lVert x - y\rVert$. The $\ell^p$ norms ($1 \leq p \leq \infty$) all give valid metrics.

Discrete metric. On any set $X$, define $d(x, y) = 1$ if $x \neq y$ and $d(x, x) = 0$. Every subset is open, every sequence that converges is eventually constant, and the space is always complete. This is useful as a degenerate case to test whether a theorem's hypotheses are too weak.

Function spaces. $C([a, b])$, the continuous functions on $[a, b]$, with the supremum metric $d(f, g) = \sup_{t \in [a, b]} |f(t) - g(t)|$. This is a complete metric space (a Banach space). Convergence in this metric is uniform convergence, which preserves continuity. The weaker pointwise convergence does not come from a metric on $C([a, b])$ and does not preserve continuity.

$\ell^p$ sequence spaces. Sequences $(x_1, x_2, \ldots)$ with $\sum |x_i|^p < \infty$, using $d(x, y) = \left(\sum |x_i - y_i|^p\right)^{1/p}$ for $1 \leq p < \infty$; the space $\ell^\infty$ of bounded sequences uses the sup metric $d(x, y) = \sup_i |x_i - y_i|$. All are complete for $1 \leq p \leq \infty$. These spaces appear in functional analysis and approximation theory.
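These metrics are easy to spot-check numerically. A minimal sketch in plain Python (the helper name `lp_distance` is ours, not from any library) computes $\ell^p$ distances between finite vectors and verifies the three metric axioms on random points:

```python
import math
import random

def lp_distance(x, y, p):
    """ell^p distance between two finite sequences; p = math.inf gives the sup metric."""
    if p == math.inf:
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

# Spot-check the three metric axioms on random points in R^5 for several p.
random.seed(0)
for p in (1, 2, 3, math.inf):
    for _ in range(100):
        x, y, z = ([random.uniform(-1, 1) for _ in range(5)] for _ in range(3))
        d = lambda a, b: lp_distance(a, b, p)
        assert d(x, x) == 0                          # identity of indiscernibles
        assert abs(d(x, y) - d(y, x)) < 1e-12        # symmetry
        assert d(x, z) <= d(x, y) + d(y, z) + 1e-12  # triangle (Minkowski) inequality
```

Random testing like this cannot prove the axioms, but it catches the common mistake of using $p < 1$, where the triangle inequality fails.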

Definition

Open and Closed Sets

A set $U \subseteq X$ is open if and only if for every $x \in U$ there exists $\epsilon > 0$ such that the ball $B(x, \epsilon) = \{y \in X : d(x, y) < \epsilon\}$ is contained in $U$. A set is closed if and only if its complement is open; equivalently, if it contains all its limit points.

Convergence and Cauchy Sequences

Definition

Convergence of Sequences

A sequence $(x_n)$ in $(X, d)$ converges to $x \in X$ if and only if for every $\epsilon > 0$ there exists $N$ such that $d(x_n, x) < \epsilon$ for all $n \geq N$. The limit is unique in a metric space.

Uniqueness follows from the triangle inequality: if $x_n \to x$ and $x_n \to y$, then $0 \leq d(x, y) \leq d(x, x_n) + d(x_n, y) \to 0$, so $d(x, y) = 0$ and $x = y$.

Definition

Cauchy Sequence

A sequence $(x_n)$ is Cauchy if and only if for every $\epsilon > 0$ there exists $N$ such that $d(x_m, x_n) < \epsilon$ for all $m, n \geq N$. Every convergent sequence is Cauchy, but the converse requires completeness.

The Cauchy condition is an intrinsic property: it refers only to distances between terms of the sequence, not to a candidate limit. This is critical because in many settings (incomplete spaces, or existence proofs where the limit is exactly what you are trying to construct) no candidate limit is available in advance.

Every convergent sequence is Cauchy. If $x_n \to x$, then $d(x_m, x_n) \leq d(x_m, x) + d(x, x_n) < \epsilon/2 + \epsilon/2 = \epsilon$ for $m, n$ large enough.

The converse fails without completeness. In $\mathbb{Q}$ with the usual metric, the sequence of rational approximations to $\sqrt{2}$ (e.g., via Newton's method) is Cauchy but does not converge in $\mathbb{Q}$.
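This example can be made concrete in exact rational arithmetic. The sketch below (plain Python with `fractions.Fraction`; variable names are ours) runs Newton's method for $x^2 = 2$ entirely inside $\mathbb{Q}$: consecutive gaps shrink rapidly, witnessing the Cauchy property, yet no iterate squares to exactly $2$:

```python
from fractions import Fraction

# Newton's method for x^2 = 2, kept in exact rational arithmetic:
# every iterate lies in Q, but the limit sqrt(2) does not.
x = Fraction(2)
iterates = [x]
for _ in range(5):
    x = (x + 2 / x) / 2          # Newton update T(x) = (x + 2/x) / 2
    iterates.append(x)

# Consecutive gaps shrink (quadratic convergence), so the sequence is Cauchy,
# yet no rational term is the limit: x * x never equals 2 exactly.
gaps = [float(abs(b - a)) for a, b in zip(iterates, iterates[1:])]
assert all(g2 < g1 for g1, g2 in zip(gaps, gaps[1:]))
assert all(t * t != 2 for t in iterates)
```

The iterates are the classical convergents $2, 3/2, 17/12, 577/408, \ldots$; the sequence "wants" to converge, and completeness of $\mathbb{R}$ is exactly what supplies the missing limit.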

Completeness

Definition

Complete Metric Space

A metric space $(X, d)$ is complete if and only if every Cauchy sequence in $X$ converges to a point in $X$.

Completeness is a property of the space, not of individual sequences. It guarantees that "sequences that should converge" actually do converge within the space.

Closed subsets inherit completeness. Any closed subset of a complete metric space is itself complete. Conversely, a complete subspace of any metric space is closed in it.

Completion. Every metric space can be embedded isometrically into a complete metric space (its completion). $\mathbb{R}$ is the completion of $\mathbb{Q}$. This is analogous to how the Lebesgue integral completes the Riemann integral in measure-theoretic probability.

Compactness in Metric Spaces

Definition

Compact Metric Space

A metric space is compact if and only if every open cover has a finite subcover. In metric spaces, this is equivalent to sequential compactness: every sequence has a convergent subsequence.

In $\mathbb{R}^n$, the Heine-Borel theorem characterizes compact sets as those that are closed and bounded. In infinite-dimensional spaces, closed and bounded does not imply compact: the closed unit ball in $\ell^2$ is closed and bounded but not compact.

Why compactness matters for ML. Compactness guarantees that continuous functions attain their extrema (extreme value theorem). Many existence proofs in optimization require compactness of the feasible set. If you are minimizing a continuous loss over a compact parameter space, a minimizer exists. Without compactness, you need additional arguments (coercivity, lower semicontinuity) to guarantee existence.

Contraction Mappings

Definition

Contraction Mapping

A function $T: X \to X$ is a contraction if and only if there exists $\gamma \in [0, 1)$ such that $d(T(x), T(y)) \leq \gamma \, d(x, y)$ for all $x, y \in X$. The constant $\gamma$ is the contraction factor.

A contraction is automatically uniformly continuous (and hence continuous). The contraction factor $\gamma$ controls the convergence rate: the error after $n$ iterations decays as $\gamma^n$, giving geometric (linear) convergence.

Main Theorems

Theorem

Banach Fixed-Point Theorem (Contraction Mapping Theorem)

Statement

Let $(X, d)$ be a nonempty complete metric space and $T: X \to X$ a contraction with factor $\gamma \in [0, 1)$. Then $T$ has a unique fixed point $x^* \in X$ (i.e., $T(x^*) = x^*$), and for any starting point $x_0 \in X$ the iterates $x_{n+1} = T(x_n)$ converge to $x^*$ with error bound:

$$d(x_n, x^*) \leq \frac{\gamma^n}{1 - \gamma}\, d(x_1, x_0)$$

Both nonemptiness and completeness are essential: without them either the fixed point fails to exist or the Cauchy iterates have no limit in XX.

Intuition

A contraction shrinks distances. Iterating it produces a Cauchy sequence (consecutive terms get geometrically closer). Completeness guarantees this sequence converges. Uniqueness follows because two fixed points would have to be distance zero apart.

Proof Sketch

For $m > n$: $d(x_n, x_m) \leq \sum_{k=n}^{m-1} d(x_k, x_{k+1}) \leq \sum_{k=n}^{m-1} \gamma^k d(x_0, x_1) \leq \frac{\gamma^n}{1-\gamma} d(x_0, x_1)$. This shows $(x_n)$ is Cauchy. By completeness, $x_n \to x^*$. Continuity of $T$ gives $T(x^*) = \lim T(x_n) = \lim x_{n+1} = x^*$. For uniqueness: if $T(y) = y$, then $d(x^*, y) = d(T(x^*), T(y)) \leq \gamma d(x^*, y)$, so $d(x^*, y) = 0$.
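The a priori error bound in the statement can be checked numerically. A minimal sketch, using the affine contraction $T(x) = x/2 + 1$ on $\mathbb{R}$ (our toy choice, with $\gamma = 1/2$ and fixed point $x^* = 2$):

```python
# Fixed-point iteration for the affine contraction T(x) = x/2 + 1 on R,
# which has contraction factor gamma = 1/2 and unique fixed point x* = 2.
# We check the a priori bound d(x_n, x*) <= gamma^n / (1 - gamma) * d(x_1, x_0).
gamma = 0.5
T = lambda x: x / 2 + 1
x_star = 2.0

x0 = 10.0
x1 = T(x0)
x = x0
for n in range(1, 30):
    x = T(x)                                        # x is now x_n
    bound = gamma ** n / (1 - gamma) * abs(x1 - x0)
    assert abs(x - x_star) <= bound + 1e-12         # a priori bound holds
```

For this particular map the bound is tight: the true error is $8 \cdot (1/2)^n$ and the bound evaluates to exactly the same quantity, which is why the assertion sits right at the edge.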

Why It Matters

This theorem gives both existence and a constructive algorithm (iterate $T$) with a convergence rate. In RL, the Bellman operator is a contraction in the sup norm with factor $\gamma$ (the discount factor), so value iteration converges. In optimization, proximal operators of convex functions are firmly nonexpansive (a strictly stronger property than 1-Lipschitz, but weaker than being a strict contraction); they are genuine contractions only in special cases (e.g., the prox of a strongly convex function with $\mu > 0$). Convergence of plain proximal-point iteration therefore follows from Krasnoselskii-Mann or Opial-style arguments rather than directly from Banach.
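To illustrate the RL case, here is value iteration on a made-up two-state deterministic MDP (the states, rewards, and `bellman` helper are all our toy choices, not a standard benchmark); the assertion checks the contraction property directly, namely that successive sup-norm gaps shrink by at least the factor $\gamma$:

```python
# Tiny 2-state MDP: in state s, action a in {0, 1} moves deterministically to
# state a with reward R[s][a]. The Bellman optimality operator is then a
# gamma-contraction in the sup norm, so value iteration converges geometrically.
gamma = 0.9
R = [[0.0, 1.0],   # rewards in state 0 for actions 0, 1
     [2.0, 0.0]]   # rewards in state 1 for actions 0, 1

def bellman(V):
    """One application of the Bellman optimality operator."""
    return [max(R[s][a] + gamma * V[a] for a in (0, 1)) for s in (0, 1)]

V = [0.0, 0.0]
prev_gap = None
for _ in range(50):
    V_new = bellman(V)
    gap = max(abs(a - b) for a, b in zip(V_new, V))  # sup-norm step size
    if prev_gap is not None and prev_gap > 1e-12:
        assert gap <= gamma * prev_gap + 1e-12       # contraction in sup norm
    prev_gap = gap
    V = V_new
```

The optimal policy here alternates between the two states (action 1 in state 0, action 0 in state 1), and the iterates approach the fixed point $V^*(0) = 2.8/0.19 \approx 14.74$, $V^*(1) \approx 15.26$ at rate $\gamma^n$.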

Failure Mode

Fails if $\gamma = 1$ (the map is merely non-expansive, not a contraction). Example: $T(x) = x + 1$ on $\mathbb{R}$ preserves distances but has no fixed point. Fails if the space is not complete: $T(x) = x/2$ on $(0, 1)$ is a contraction, but the fixed point $0$ is not in the space.

Standard Metrics

| Space | Metric | Complete? |
|---|---|---|
| $\mathbb{R}^n$ | $d(x, y) = \lVert x - y\rVert_2$ | Yes |
| $C([0, 1])$ (continuous functions) | $d(f, g) = \sup_t \lvert f(t) - g(t)\rvert$ | Yes |
| $\ell^2$ (square-summable sequences) | $d(x, y) = \sqrt{\sum (x_i - y_i)^2}$ | Yes |
| $\mathbb{Q}$ (rationals) | $d(x, y) = \lvert x - y\rvert$ | No |

Comparison Table

| Property | Sequence language | Space language | Why ML cares |
|---|---|---|---|
| Convergence | $x_n \to x$ | the limit point already lives in the space | iterative algorithms have a meaningful target |
| Cauchy | tail points get arbitrarily close to each other | intrinsic notion, no limit supplied yet | useful when the fixed point or solution is not known beforehand |
| Completeness | every Cauchy sequence converges | no missing limit points | Banach fixed-point and many existence proofs go through |
| Compactness | every sequence has a convergent subsequence | stronger than completeness | guarantees minimizers for continuous objectives on compact sets |

Common Confusions

Watch Out

Cauchy does not imply convergent without completeness

The sequence $x_n = 1/n$ in $(0, 1)$ is Cauchy but does not converge in $(0, 1)$ because the limit $0$ is not in the space. Completeness is exactly the property that rules out this pathology.

Watch Out

Contraction factor must be strictly less than 1

The map $T(x) = x/(1 + x)$ on $[0, \infty)$ satisfies $d(T(x), T(y)) < d(x, y)$ for $x \neq y$ but is not a contraction (no uniform $\gamma < 1$). It still has a fixed point ($x = 0$), but the Banach theorem does not apply, and convergence may be arbitrarily slow.
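The slow convergence is easy to see in closed form: since $1/T(x) = 1/x + 1$, the iterates from $x_0$ are exactly $x_n = x_0/(1 + n x_0)$, which approach $0$ only at rate $O(1/n)$ rather than geometrically. A minimal numerical check in plain Python:

```python
# T(x) = x / (1 + x) shrinks every pair of distinct points but has no uniform
# contraction factor gamma < 1, so Banach's geometric rate does not apply.
# From x0 = 1 the iterates are exactly x_n = 1 / (n + 1): rate O(1/n).
T = lambda x: x / (1 + x)

x = 1.0
for n in range(1000):
    x = T(x)
# After 1000 iterations the iterate matches the closed form 1 / 1001:
# still only ~1e-3 from the fixed point 0, versus gamma^1000 for a contraction.
assert abs(x - 1 / 1001) < 1e-9
```

Compare with a genuine contraction of factor $\gamma = 0.9$: after 1000 steps its error would be on the order of $0.9^{1000} \approx 10^{-46}$, not $10^{-3}$.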

Watch Out

Closed and bounded does not imply compact in infinite dimensions

In $\mathbb{R}^n$, closed plus bounded equals compact (Heine-Borel). In infinite-dimensional spaces this fails. The closed unit ball in $\ell^2$ is closed and bounded but not compact: the sequence $e_1, e_2, e_3, \ldots$ of standard basis vectors has no convergent subsequence (all pairwise distances are $\sqrt{2}$). Compactness in infinite dimensions requires additional conditions such as total boundedness.

Watch Out

Complete does not mean compact

Completeness says Cauchy sequences converge. Compactness says every sequence has a convergent subsequence. $\mathbb{R}$ with its usual metric is complete but not compact: the sequence $x_n = n$ has no convergent subsequence. In optimization arguments, confusing these two loses track of whether you are proving existence of a limit or existence of a minimizer.

Exercises

ExerciseCore

Problem

Show that $d(x, y) = |x - y|$ is a metric on $\mathbb{R}$. Which of the three axioms is the hardest to verify?

ExerciseAdvanced

Problem

Let $T(x) = \cos(x)$. Show that $T$ is not a contraction on all of $\mathbb{R}$, but that $T$ restricted to $[-1, 1]$ is a contraction with factor $\gamma = \sin(1) \approx 0.841$. Since $\cos$ maps $\mathbb{R}$ into $[-1, 1]$ in one step, conclude that iteration from any $x_0 \in \mathbb{R}$ converges. What is the fixed point?

References

Canonical:

  • Rudin, Principles of Mathematical Analysis (1976), Chapters 2-3
  • Kreyszig, Introductory Functional Analysis with Applications (1989), Chapters 1-2
  • Munkres, Topology (2000), Sections 20-28

Supplementary:

  • Sutherland, Introduction to Metric and Topological Spaces (2009), Chapters 3-7
  • Aliprantis & Border, Infinite Dimensional Analysis (2006), Chapter 3

For ML context:

  • Bertsekas, Dynamic Programming and Optimal Control (2012), Chapter 1 (contraction mappings in RL)
  • Puterman, Markov Decision Processes (2005), Chapter 6.2 (contraction operators for value iteration)

Last reviewed: April 26, 2026
