
Foundations

Compactness and Heine-Borel

Sequential compactness, the Heine-Borel theorem in finite dimensions, the extreme value theorem, and why compactness is the key assumption in optimization.


Why This Matters

Optimization problems in ML ask: does a minimum exist? Compactness is the standard tool for guaranteeing the existence of minimizers. The extreme value theorem says a continuous function on a compact set attains its minimum. Without compactness, minima may not exist (the infimum is not achieved), and many proof strategies in learning theory break down.


Core Definitions

Definition

Open Cover

An open cover of a set $K \subseteq X$ is a collection of open sets $\{U_\alpha\}_{\alpha \in I}$ such that $K \subseteq \bigcup_{\alpha \in I} U_\alpha$. A subcover is a subcollection that still covers $K$.

Definition

Compactness

A subset $K$ of a metric space is compact if and only if every open cover of $K$ has a finite subcover. Compactness is a topological property: it does not depend on the particular metric, only on the topology it induces.

Definition

Sequential Compactness

A set $K$ is sequentially compact if and only if every sequence in $K$ has a subsequence that converges to a point in $K$. In metric spaces, compactness and sequential compactness are equivalent.
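A minimal numerical sketch of this definition, using a hypothetical oscillating sequence in the compact set $K = [-1, 1]$: the full sequence never converges, but an explicit subsequence does, and its limit stays in $K$.

```python
# Sketch of sequential compactness in K = [-1, 1]. The sequence
# x_n = (-1)^n * n / (n + 1) stays in K and never converges (it keeps
# jumping between values near -1 and near +1), yet the even-indexed
# subsequence x_{2k} = 2k / (2k + 1) converges to 1, which lies in K.

def x(n):
    return (-1) ** n * n / (n + 1)

# The full sequence oscillates: consecutive terms stay about 2 apart.
assert abs(x(10**6) - x(10**6 + 1)) > 1.9

# Every term lies in K = [-1, 1].
assert all(abs(x(n)) <= 1 for n in range(1, 1000))

# The even-indexed subsequence converges to the point 1 in K.
assert abs(x(2 * 10**6) - 1.0) < 1e-6
```

The choice of sequence is arbitrary; any bounded sequence would do, by Bolzano-Weierstrass.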

Definition

Bounded Set

A subset $S$ of a metric space $(X, d)$ is bounded if and only if there exist $x_0 \in X$ and $M > 0$ such that $d(x, x_0) \leq M$ for all $x \in S$. In $\mathbb{R}^n$, this means $S$ is contained in some ball of finite radius.

Main Theorems

Theorem

Heine-Borel Theorem

Statement

A subset $K \subseteq \mathbb{R}^n$ is compact if and only if $K$ is closed and bounded.

Intuition

Bounded means sequences cannot escape to infinity. Closed means limits of convergent subsequences stay in the set. Together, they guarantee every sequence has a convergent subsequence whose limit lies in $K$.

Proof Sketch

Compact implies closed and bounded: if $K$ is not bounded, there is a sequence $(x_n)$ in $K$ with $\|x_n\| \to \infty$, which has no convergent subsequence. If $K$ is not closed, there is a limit point $x \notin K$; a sequence in $K$ converging to $x$ has no subsequence converging to a point of $K$. Closed and bounded implies compact: by Bolzano-Weierstrass, every bounded sequence in $\mathbb{R}^n$ has a convergent subsequence, and since $K$ is closed, the limit is in $K$.

Why It Matters

Heine-Borel makes compactness easy to check in $\mathbb{R}^n$: just verify closed and bounded. This is the standard way to establish that a constrained optimization problem has a solution.

Failure Mode

Heine-Borel fails in infinite-dimensional spaces. The closed unit ball in $\ell^2$ is closed and bounded but not compact (the standard basis vectors $e_1, e_2, \ldots$ have no convergent subsequence). In infinite dimensions, compactness is a much stronger condition than closed-and-bounded.

Theorem

Extreme Value Theorem

Statement

A continuous function $f: K \to \mathbb{R}$ on a nonempty compact set $K$ attains its maximum and minimum: there exist $x_{\min}, x_{\max} \in K$ with

$$f(x_{\min}) \leq f(x) \leq f(x_{\max}) \quad \text{for all } x \in K$$

Intuition

Compactness prevents a minimizing sequence from escaping (either to infinity or to a boundary point outside $K$). Continuity preserves convergence, so the limit of a minimizing sequence is a minimizer.

Proof Sketch

Let $m = \inf_{x \in K} f(x)$. Take a sequence $(x_n)$ with $f(x_n) \to m$. By compactness, a subsequence $x_{n_k} \to x^* \in K$. By continuity, $f(x^*) = \lim f(x_{n_k}) = m$, so the infimum is attained at $x^*$. The same argument handles the supremum.
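The proof sketch can be traced numerically. In this illustrative snippet (the loss $f$ is a made-up toy objective), grid minimizers over $K = [0, 1]$ play the role of the minimizing sequence $(x_n)$; they converge to the minimizer $x^* = 0.3$, where the infimum $m = 0.1$ is attained.

```python
# Numerical sketch of the EVT proof on the compact set K = [0, 1] with the
# continuous toy loss f(x) = (x - 0.3)**2 + 0.1: minimizers over finer and
# finer grids form a minimizing sequence, whose limit x* = 0.3 lies in K
# and attains the infimum m = 0.1.

def f(x):
    return (x - 0.3) ** 2 + 0.1

minimizers = []
for n in [10, 100, 1000, 10000]:
    grid = [k / n for k in range(n + 1)]   # finite sample of K = [0, 1]
    minimizers.append(min(grid, key=f))

assert abs(minimizers[-1] - 0.3) < 1e-3    # minimizing sequence approaches x* = 0.3
assert abs(f(minimizers[-1]) - 0.1) < 1e-6 # and f there approaches inf f = m = 0.1
```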

Why It Matters

This theorem is invoked implicitly whenever you write $\min_{w \in W} L(w)$ for a continuous loss $L$ over a compact parameter set $W$. Without compactness, you can only write $\inf$, and the minimizer may not exist.

Failure Mode

Fails without compactness: $f(x) = e^{-x}$ on $(0, \infty)$ has $\inf f = 0$ but no minimizer.

Fails without continuity: define $f: [0, 1] \to \mathbb{R}$ by $f(x) = x$ for $x \in (0, 1]$ and $f(0) = 1$. The domain $[0, 1]$ is compact, but $f$ is discontinuous at $0$ (the right limit is $0$ while $f(0) = 1$). The infimum is $\inf f = 0$, but no minimizer exists: every $x \in (0, 1]$ gives $f(x) = x > 0$, and $f(0) = 1 \neq 0$. Continuity is the missing hypothesis, not compactness.
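The first failure mode is easy to witness numerically. This sketch evaluates $f(x) = e^{-x}$ along a minimizing sequence marching off to infinity: the values approach the infimum $0$ but never reach it.

```python
# Sketch of the failure mode: f(x) = exp(-x) on the non-compact set (0, inf).
# The infimum is 0, but minimizing sequences escape to infinity, so no
# point of the domain attains it.
import math

def f(x):
    return math.exp(-x)

xs = [10.0 * n for n in range(1, 6)]       # a minimizing sequence: 10, 20, ..., 50
vals = [f(x) for x in xs]

assert all(v > 0 for v in vals)            # f never reaches its infimum 0
assert vals == sorted(vals, reverse=True)  # values strictly decrease toward 0
assert vals[-1] < 1e-20                    # ... and get arbitrarily close to it
```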

Compact ⟺ Complete + Totally Bounded

In any metric space, the following are equivalent for a set $K$:

  1. $K$ is compact.
  2. $K$ is sequentially compact: every sequence in $K$ has a subsequence converging to a point of $K$.
  3. $K$ is complete and totally bounded.

Total boundedness means: for every $\varepsilon > 0$, $K$ can be covered by finitely many open balls of radius $\varepsilon$. The minimum number of such balls is the covering number $N(K, \varepsilon)$. This is the analytic origin of covering-number arguments in learning theory and high-dimensional probability, and it is the route by which compactness enters Rademacher complexity bounds via Dudley's chaining inequality.

This equivalence makes Heine-Borel feel less special. In $\mathbb{R}^n$, "closed" means complete and "bounded" means totally bounded, so closed + bounded is exactly compactness. In an infinite-dimensional normed space, bounded does not imply totally bounded: the closed unit ball in $\ell^2$ is bounded but contains the orthonormal basis $\{e_n\}$ with $\|e_n - e_m\|_2 = \sqrt{2}$ for $n \ne m$, so no finite collection of $\varepsilon$-balls with $\varepsilon < 1/\sqrt{2}$ covers it. This is why closed + bounded no longer implies compact in infinite dimensions.
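Covering numbers can be computed by hand for the simplest compact set. A sketch for $K = [0, 1]$, where $N(K, \varepsilon) = \lceil 1/(2\varepsilon) \rceil$ for closed balls (the helper `covering_number` is an illustrative name, not a standard API):

```python
# Sketch: total boundedness of K = [0, 1] made concrete. A greedy
# left-to-right sweep covers [0, 1] with closed balls (intervals) of
# radius eps; each ball covers a segment of length 2 * eps, so the count
# is ceil(1 / (2 * eps)) -- the covering number N([0, 1], eps).
import math

def covering_number(eps):
    """Number of radius-eps balls a greedy sweep uses to cover [0, 1]."""
    n = 0
    covered = 0.0
    while covered < 1.0:
        n += 1                   # place the next ball just past the covered region
        covered = n * (2 * eps)  # each ball extends coverage by its diameter
    return n

for eps in [0.5, 0.25, 0.1, 0.05]:
    assert covering_number(eps) == math.ceil(1 / (2 * eps))
```

Note the $1/(2\varepsilon)$ growth rate: in $d$ dimensions the analogous count scales like $(1/\varepsilon)^d$, which is exactly the entropy growth that chaining arguments integrate over.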

Why Compactness Matters in ML

Compactness is one clean sufficient condition for several ML-relevant theorems, not the universal assumption. Be precise about what each theorem actually needs.

  1. Existence of minimizers via EVT. A continuous loss on a compact domain attains its minimum (extreme value theorem). A hard parameter constraint $\{w : \|w\| \le R\}$ is closed and bounded in $\mathbb{R}^n$, hence compact, so the existence proof goes through directly. Ordinary $L_2$ regularization $\min_w L(w) + \lambda \|w\|^2$ on an unbounded domain is a different argument: coercivity (the objective tends to $\infty$ as $\|w\| \to \infty$) plus lower semicontinuity (or strict convexity) gives existence of a minimizer over $\mathbb{R}^n$ without literal compactness.
  2. Covering-number arguments. What is needed is total boundedness of the relevant hypothesis class at the relevant scale, not full compactness. Compactness implies total boundedness (the bridge theorem above), but many learning-theory bounds work under weaker entropy assumptions on the class.
  3. Continuity arguments. Compactness can upgrade pointwise to uniform statements only with extra hypotheses. Dini's theorem, for example, requires monotone convergence of continuous functions to a continuous limit on a compact space.
  4. Function-approximation theory. The Arzelà-Ascoli theorem characterizes relatively compact families in $C(K)$ via uniform boundedness and equicontinuity. Universal approximation theorems are usually stated on compact domains for clean topology, but Arzelà-Ascoli is a separate compactness principle, not the engine of basic universal approximation.
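Item 1's coercivity argument can be sketched concretely. In this toy example (the loss $L(w) = -w$ and the helper name `g` are hypothetical choices for illustration), $L$ alone has no minimizer on $\mathbb{R}$, but the regularized objective does, with no compact constraint in sight:

```python
# Sketch: coercivity substituting for compactness. L(w) = -w is unbounded
# below on R, but g(w) = L(w) + lam * w**2 is coercive (the quadratic term
# dominates far from the origin) and attains its minimum at w* = 1 / (2 * lam).
lam = 0.1

def g(w):
    return -w + lam * w ** 2

w_star = 1 / (2 * lam)          # stationary point: g'(w) = -1 + 2 * lam * w = 0

# g is minimized at w_star among nearby candidates on either side.
ws = [w_star + dw for dw in (-2.0, -0.5, 0.0, 0.5, 2.0)]
assert min(ws, key=g) == w_star

# Coercivity: far from the origin the quadratic term dominates and g blows up.
assert g(1e6) > g(w_star) and g(-1e6) > g(w_star)
```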

Examples

Example

The closed unit ball in $\ell^2$ is closed and bounded but not compact

Let $\ell^2$ be the space of square-summable real sequences with norm $\|x\|_2 = \sqrt{\sum_n x_n^2}$. The closed unit ball $B = \{x \in \ell^2 : \|x\|_2 \leq 1\}$ is closed (the norm is continuous, so the preimage of $[0,1]$ is closed) and bounded.

Yet $B$ is not compact. The standard basis vectors $e_n$ (a $1$ in position $n$, zeros elsewhere) all lie in $B$, but for $n \neq m$, $\|e_n - e_m\|_2 = \sqrt{2}$. No subsequence of $(e_n)$ is Cauchy, so none converges. Sequential compactness fails, and Heine-Borel is therefore an $\mathbb{R}^n$-specific theorem, not a general metric-space fact.
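The $\sqrt{2}$ separation is a one-line computation. This sketch truncates $\ell^2$ to $\mathbb{R}^d$ (the basis vectors have finite support, so nothing is lost) and checks every pair; the helpers `e` and `dist` are illustrative names.

```python
# Numerical check of the example, truncating ell^2 to R^d: the standard
# basis vectors are unit vectors with pairwise distance sqrt(2), so no
# subsequence of (e_n) is Cauchy and none can converge.
import math

d = 50

def e(n):
    """n-th standard basis vector in R^d (stand-in for ell^2)."""
    return [1.0 if i == n else 0.0 for i in range(d)]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

pairs = [(n, m) for n in range(d) for m in range(n + 1, d)]
assert all(abs(dist(e(n), e(m)) - math.sqrt(2)) < 1e-12 for n, m in pairs)
```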

In ML, this is what changes when you switch from a finite-parameter model to an infinite-dimensional function space: compactness arguments need to be rebuilt around weak topologies, equicontinuity (Arzelà-Ascoli), or explicit covering-number bounds.

Example

Why $\arg\min$ may not be a singleton or even nonempty

Take $f(w) = \sin(w)$ on the closed half-line $[0, \infty)$. The set is closed but not bounded; the minimum value $-1$ is attained at infinitely many points $w = 3\pi/2 + 2\pi k$, so $\arg\min$ is nonempty but not a singleton. Now take $f(w) = e^{-w}$ on the same domain: the infimum is $0$, no minimizer exists, and $\arg\min$ is empty. The same loss on the compact restriction $[0, R]$ has minimizer $w = R$. Compactness is exactly the condition that turns "$\inf$" into "$\min$".
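Both halves of the example can be checked with a few lines (the value $R = 5$ is an arbitrary illustrative choice):

```python
# Sketch of the example. On the compact restriction [0, R], the decaying
# loss exp(-w) turns its inf into a min, attained at the right endpoint R.
import math

R = 5.0
grid = [R * k / 10000 for k in range(10001)]   # finite sample of [0, R]
w_best = min(grid, key=lambda w: math.exp(-w))
assert w_best == R                             # minimizer is the endpoint w = R

# sin(w) on [0, infinity) attains -1 at infinitely many points 3*pi/2 + 2*pi*k;
# spot-check the first few, so argmin is nonempty but not a singleton.
mins = [3 * math.pi / 2 + 2 * math.pi * k for k in range(3)]
assert all(abs(math.sin(w) + 1.0) < 1e-12 for w in mins)
```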

Common Confusions

Watch Out

Closed and bounded is not enough in infinite dimensions

A common mistake is applying Heine-Borel in function spaces. The closed unit ball in any infinite-dimensional normed space is never compact. This is why weak compactness and related notions are needed in functional analysis and measure-theoretic probability.

Watch Out

Compact subsets of R^n are always complete

Every compact metric space is complete: a Cauchy sequence has a convergent subsequence, and a Cauchy sequence with a convergent subsequence converges to that same limit, which lies in the compact set. But completeness alone does not give compactness: $\mathbb{R}$ is complete but not compact.

Exercises

ExerciseCore

Problem

Is the set $\{x \in \mathbb{R}^2 : \|x\|_2 \leq 1\}$ compact? What about $\{x \in \mathbb{R}^2 : \|x\|_2 < 1\}$?

ExerciseAdvanced

Problem

Give an example of a continuous function on a closed (but unbounded) subset of $\mathbb{R}$ that does not attain its infimum. Explain which hypothesis of the extreme value theorem fails.

References

Canonical:

  • Rudin, Principles of Mathematical Analysis (1976), Chapter 2 (sections on compactness)
  • Munkres, Topology (2000), Chapter 3
  • Apostol, Mathematical Analysis (1974), Chapter 3 (compact subsets of R^n)

For ML context:

  • Shalev-Shwartz & Ben-David, Understanding Machine Learning (2014), Chapter 27 (covering numbers and compactness)
  • Folland, Real Analysis (1999), Section 4.4 (compactness in metric spaces and function spaces)
  • Aliprantis & Border, Infinite Dimensional Analysis (2006), Chapters 2-3 (compactness in topological vector spaces)

Last reviewed: April 18, 2026
