
Foundations

Compactness and Heine-Borel

Sequential compactness, the Heine-Borel theorem in finite dimensions, the extreme value theorem, and why compactness is the key assumption in optimization.


Why This Matters

Optimization problems in ML ask: does a minimum exist? Compactness is the standard tool for guaranteeing the existence of minimizers. The extreme value theorem says a continuous function on a compact set attains its minimum. Without compactness, minima may not exist (the infimum is not achieved), and many proof strategies in learning theory break down.


Core Definitions

Definition

Open Cover

An open cover of a set $K \subseteq X$ is a collection of open sets $\{U_\alpha\}_{\alpha \in I}$ such that $K \subseteq \bigcup_{\alpha \in I} U_\alpha$. A subcover is a subcollection that still covers $K$.

Definition

Compactness

A subset $K$ of a metric space is compact if and only if every open cover of $K$ has a finite subcover. Compactness is a topological property: it does not depend on the particular metric, only on the topology it induces.

Definition

Sequential Compactness

A set $K$ is sequentially compact if and only if every sequence in $K$ has a subsequence that converges to a point in $K$. In metric spaces, compactness and sequential compactness are equivalent.
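A minimal numerical sketch of this definition, using a hypothetical oscillating sequence in the compact set $K = [-1, 1]$: the full sequence never converges, but an explicit subsequence does, and its limit stays in $K$.

```python
# Sketch of sequential compactness in K = [-1, 1]. The sequence
# x_n = (-1)^n * n / (n + 1) stays in K and never converges (it keeps
# jumping between values near -1 and near +1), yet the even-indexed
# subsequence x_{2k} = 2k / (2k + 1) converges to 1, which lies in K.

def x(n):
    return (-1) ** n * n / (n + 1)

# The full sequence oscillates: consecutive terms stay about 2 apart.
assert abs(x(10**6) - x(10**6 + 1)) > 1.9

# Every term lies in K = [-1, 1].
assert all(abs(x(n)) <= 1 for n in range(1, 1000))

# The even-indexed subsequence converges to the point 1 in K.
assert abs(x(2 * 10**6) - 1.0) < 1e-6
```

The choice of sequence is arbitrary; any bounded sequence would do, by Bolzano-Weierstrass.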

Definition

Bounded Set

A subset $S$ of a metric space $(X, d)$ is bounded if and only if there exist $x_0 \in X$ and $M > 0$ such that $d(x, x_0) \leq M$ for all $x \in S$. In $\mathbb{R}^n$, this means $S$ is contained in some ball of finite radius.

Main Theorems

Theorem

Heine-Borel Theorem

Statement

A subset $K \subseteq \mathbb{R}^n$ is compact if and only if $K$ is closed and bounded.

Intuition

Bounded means sequences cannot escape to infinity. Closed means limits of convergent subsequences stay in the set. Together, they guarantee every sequence has a convergent subsequence whose limit lies in $K$.

Proof Sketch

Compact implies closed and bounded: if $K$ is not bounded, there is a sequence $(x_n)$ in $K$ with $\|x_n\| \to \infty$, which has no convergent subsequence. If $K$ is not closed, there is a limit point $x \notin K$; a sequence in $K$ converging to $x$ has no subsequence converging to a point of $K$. Closed and bounded implies compact: by Bolzano-Weierstrass, every bounded sequence in $\mathbb{R}^n$ has a convergent subsequence, and since $K$ is closed, the limit is in $K$.

Why It Matters

Heine-Borel makes compactness easy to check in $\mathbb{R}^n$: just verify closed and bounded. This is the standard way to establish that a constrained optimization problem has a solution.

Failure Mode

Heine-Borel fails in infinite-dimensional spaces. The closed unit ball in $\ell^2$ is closed and bounded but not compact (the standard basis vectors $e_1, e_2, \ldots$ have no convergent subsequence). In infinite dimensions, compactness is a much stronger condition than closed-and-bounded.

Theorem

Extreme Value Theorem

Statement

A continuous function $f: K \to \mathbb{R}$ on a nonempty compact set $K$ attains its maximum and minimum: there exist $x_{\min}, x_{\max} \in K$ with

$$f(x_{\min}) \leq f(x) \leq f(x_{\max}) \quad \text{for all } x \in K$$

Intuition

Compactness prevents a minimizing sequence from escaping (either to infinity or to a boundary point outside $K$). Continuity preserves convergence, so the limit of a minimizing sequence is a minimizer.

Proof Sketch

Let $m = \inf_{x \in K} f(x)$. Take a sequence $(x_n)$ with $f(x_n) \to m$. By compactness, a subsequence $x_{n_k} \to x^* \in K$. By continuity, $f(x^*) = \lim f(x_{n_k}) = m$, so the infimum is attained at $x^*$. The same argument handles the supremum.
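The proof sketch can be traced numerically. In this illustrative snippet (the loss $f$ is a made-up toy objective), grid minimizers over $K = [0, 1]$ play the role of the minimizing sequence $(x_n)$; they converge to the minimizer $x^* = 0.3$, where the infimum $m = 0.1$ is attained.

```python
# Numerical sketch of the EVT proof on the compact set K = [0, 1] with the
# continuous toy loss f(x) = (x - 0.3)**2 + 0.1: minimizers over finer and
# finer grids form a minimizing sequence, whose limit x* = 0.3 lies in K
# and attains the infimum m = 0.1.

def f(x):
    return (x - 0.3) ** 2 + 0.1

minimizers = []
for n in [10, 100, 1000, 10000]:
    grid = [k / n for k in range(n + 1)]   # finite sample of K = [0, 1]
    minimizers.append(min(grid, key=f))

assert abs(minimizers[-1] - 0.3) < 1e-3    # minimizing sequence approaches x* = 0.3
assert abs(f(minimizers[-1]) - 0.1) < 1e-6 # and f there approaches inf f = m = 0.1
```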

Why It Matters

This theorem is invoked implicitly whenever you write $\min_{w \in W} L(w)$ for a continuous loss $L$ over a compact parameter set $W$. Without compactness, you can only write $\inf$, and the minimizer may not exist.

Failure Mode

Fails without compactness: $f(x) = e^{-x}$ on $(0, \infty)$ has $\inf f = 0$ but no minimizer.

Fails without continuity: define $f: [0, 1] \to \mathbb{R}$ by $f(x) = x$ for $x \in (0, 1]$ and $f(0) = 1$. The domain $[0, 1]$ is compact, but $f$ is discontinuous at $0$ (the right limit is $0$ while $f(0) = 1$). The infimum is $\inf f = 0$, but no minimizer exists: every $x \in (0, 1]$ gives $f(x) = x > 0$, and $f(0) = 1 \neq 0$. Continuity is the missing hypothesis, not compactness.
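The first failure mode is easy to witness numerically. This sketch evaluates $f(x) = e^{-x}$ along a minimizing sequence marching off to infinity: the values approach the infimum $0$ but never reach it.

```python
# Sketch of the failure mode: f(x) = exp(-x) on the non-compact set (0, inf).
# The infimum is 0, but minimizing sequences escape to infinity, so no
# point of the domain attains it.
import math

def f(x):
    return math.exp(-x)

xs = [10.0 * n for n in range(1, 6)]       # a minimizing sequence: 10, 20, ..., 50
vals = [f(x) for x in xs]

assert all(v > 0 for v in vals)            # f never reaches its infimum 0
assert vals == sorted(vals, reverse=True)  # values strictly decrease toward 0
assert vals[-1] < 1e-20                    # ... and get arbitrarily close to it
```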

Compact ⟺ Complete + Totally Bounded

In any metric space, the following are equivalent for a set $K$:

  1. $K$ is compact.
  2. $K$ is sequentially compact: every sequence in $K$ has a subsequence converging to a point of $K$.
  3. $K$ is complete and totally bounded.

Total boundedness means: for every $\varepsilon > 0$, $K$ can be covered by finitely many open balls of radius $\varepsilon$. The minimum number of such balls is the covering number $N(K, \varepsilon)$. This is the analytic origin of covering-number arguments in learning theory and high-dimensional probability, and it is the route by which compactness enters Rademacher complexity bounds via Dudley's chaining inequality.

This equivalence makes Heine-Borel feel less special. In $\mathbb{R}^n$, "closed" means complete and "bounded" means totally bounded, so closed + bounded is exactly compactness. In an infinite-dimensional normed space, bounded does not imply totally bounded: the closed unit ball in $\ell^2$ is bounded but contains the orthonormal basis $\{e_n\}$ with $\|e_n - e_m\|_2 = \sqrt{2}$ for $n \ne m$, so no finite collection of $\varepsilon$-balls with $\varepsilon < 1/\sqrt{2}$ covers it. This is why closed + bounded no longer implies compact in infinite dimensions.
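Covering numbers can be computed by hand for the simplest compact set. A sketch for $K = [0, 1]$, where $N(K, \varepsilon) = \lceil 1/(2\varepsilon) \rceil$ for closed balls (the helper `covering_number` is an illustrative name, not a standard API):

```python
# Sketch: total boundedness of K = [0, 1] made concrete. A greedy
# left-to-right sweep covers [0, 1] with closed balls (intervals) of
# radius eps; each ball covers a segment of length 2 * eps, so the count
# is ceil(1 / (2 * eps)) -- the covering number N([0, 1], eps).
import math

def covering_number(eps):
    """Number of radius-eps balls a greedy sweep uses to cover [0, 1]."""
    n = 0
    covered = 0.0
    while covered < 1.0:
        n += 1                   # place the next ball just past the covered region
        covered = n * (2 * eps)  # each ball extends coverage by its diameter
    return n

for eps in [0.5, 0.25, 0.1, 0.05]:
    assert covering_number(eps) == math.ceil(1 / (2 * eps))
```

Note the $1/(2\varepsilon)$ growth rate: in $d$ dimensions the analogous count scales like $(1/\varepsilon)^d$, which is exactly the entropy growth that chaining arguments integrate over.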

Why Compactness Matters in ML

Compactness is one clean sufficient condition for several ML-relevant theorems, not the universal assumption. Be precise about what each theorem actually needs.

  1. Existence of minimizers via EVT. A continuous loss on a compact domain attains its minimum (extreme value theorem). A hard parameter constraint $\{w : \|w\| \le R\}$ is closed and bounded in $\mathbb{R}^n$, hence compact, so the existence proof goes through directly. Ordinary $L_2$ regularization $\min_w L(w) + \lambda \|w\|^2$ on an unbounded domain is a different argument: coercivity (the objective tends to $\infty$ as $\|w\| \to \infty$) plus lower semicontinuity (or strict convexity) gives existence of a minimizer over $\mathbb{R}^n$ without literal compactness.
  2. Covering-number arguments. What is needed is total boundedness of the relevant hypothesis class at the relevant scale, not full compactness. Compactness implies total boundedness (the bridge theorem above), but many learning-theory bounds work under weaker entropy assumptions on the class.
  3. Continuity arguments. Compactness can upgrade pointwise to uniform statements only with extra hypotheses. Dini's theorem, for example, requires monotone convergence of continuous functions to a continuous limit on a compact space.
  4. Function-approximation theory. The Arzelà-Ascoli theorem characterizes relatively compact families in $C(K)$ via uniform boundedness and equicontinuity. Universal approximation theorems are usually stated on compact domains for clean topology, but Arzelà-Ascoli is a separate compactness principle, not the engine of basic universal approximation.
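Item 1's coercivity argument can be sketched concretely. In this toy example (the loss $L(w) = -w$ and the helper name `g` are hypothetical choices for illustration), $L$ alone has no minimizer on $\mathbb{R}$, but the regularized objective does, with no compact constraint in sight:

```python
# Sketch: coercivity substituting for compactness. L(w) = -w is unbounded
# below on R, but g(w) = L(w) + lam * w**2 is coercive (the quadratic term
# dominates far from the origin) and attains its minimum at w* = 1 / (2 * lam).
lam = 0.1

def g(w):
    return -w + lam * w ** 2

w_star = 1 / (2 * lam)          # stationary point: g'(w) = -1 + 2 * lam * w = 0

# g is minimized at w_star among nearby candidates on either side.
ws = [w_star + dw for dw in (-2.0, -0.5, 0.0, 0.5, 2.0)]
assert min(ws, key=g) == w_star

# Coercivity: far from the origin the quadratic term dominates and g blows up.
assert g(1e6) > g(w_star) and g(-1e6) > g(w_star)
```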

Examples

Example

The closed unit ball in $\ell^2$ is closed and bounded but not compact

Let $\ell^2$ be the space of square-summable real sequences with norm $\|x\|_2 = \sqrt{\sum_n x_n^2}$. The closed unit ball $B = \{x \in \ell^2 : \|x\|_2 \leq 1\}$ is closed (the norm is continuous, so the preimage of $[0,1]$ is closed) and bounded.

Yet $B$ is not compact. The standard basis vectors $e_n$ (a $1$ in position $n$, zeros elsewhere) all lie in $B$, but for $n \neq m$, $\|e_n - e_m\|_2 = \sqrt{2}$. No subsequence of $(e_n)$ is Cauchy, so none converges. Sequential compactness fails, and Heine-Borel is therefore an $\mathbb{R}^n$-specific theorem, not a general metric-space fact.
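The $\sqrt{2}$ separation is a one-line computation. This sketch truncates $\ell^2$ to $\mathbb{R}^d$ (the basis vectors have finite support, so nothing is lost) and checks every pair; the helpers `e` and `dist` are illustrative names.

```python
# Numerical check of the example, truncating ell^2 to R^d: the standard
# basis vectors are unit vectors with pairwise distance sqrt(2), so no
# subsequence of (e_n) is Cauchy and none can converge.
import math

d = 50

def e(n):
    """n-th standard basis vector in R^d (stand-in for ell^2)."""
    return [1.0 if i == n else 0.0 for i in range(d)]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

pairs = [(n, m) for n in range(d) for m in range(n + 1, d)]
assert all(abs(dist(e(n), e(m)) - math.sqrt(2)) < 1e-12 for n, m in pairs)
```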

In ML, this is what changes when you switch from a finite-parameter model to an infinite-dimensional function space: compactness arguments need to be rebuilt around weak topologies, equicontinuity (Arzelà-Ascoli), or explicit covering-number bounds.

Example

Why $\arg\min$ may not be a singleton or even nonempty

Take $f(w) = \sin(w)$ on the closed half-line $[0, \infty)$. The set is closed but not bounded; the minimum value $-1$ is attained at infinitely many points $w = 3\pi/2 + 2\pi k$, so $\arg\min$ is nonempty but not a singleton. Now take $f(w) = e^{-w}$ on the same domain: the infimum is $0$, no minimizer exists, and $\arg\min$ is empty. The same loss on the compact restriction $[0, R]$ has minimizer $w = R$. Compactness is exactly the condition that turns "$\inf$" into "$\min$".
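Both halves of the example can be checked with a few lines (the value $R = 5$ is an arbitrary illustrative choice):

```python
# Sketch of the example. On the compact restriction [0, R], the decaying
# loss exp(-w) turns its inf into a min, attained at the right endpoint R.
import math

R = 5.0
grid = [R * k / 10000 for k in range(10001)]   # finite sample of [0, R]
w_best = min(grid, key=lambda w: math.exp(-w))
assert w_best == R                             # minimizer is the endpoint w = R

# sin(w) on [0, infinity) attains -1 at infinitely many points 3*pi/2 + 2*pi*k;
# spot-check the first few, so argmin is nonempty but not a singleton.
mins = [3 * math.pi / 2 + 2 * math.pi * k for k in range(3)]
assert all(abs(math.sin(w) + 1.0) < 1e-12 for w in mins)
```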

Common Confusions

Watch Out

Closed and bounded is not enough in infinite dimensions

A common mistake is applying Heine-Borel in function spaces. The closed unit ball in any infinite-dimensional normed space is never compact. This is why weak compactness and related notions are needed in functional analysis and measure-theoretic probability.

Watch Out

Compact subsets of R^n are always complete

Every compact metric space is complete: a Cauchy sequence has a convergent subsequence, and a Cauchy sequence with a convergent subsequence converges to that same limit, which lies in the compact set. But completeness alone does not give compactness: $\mathbb{R}$ is complete but not compact.

Exercises

ExerciseCore

Problem

Is the set $\{x \in \mathbb{R}^2 : \|x\|_2 \leq 1\}$ compact? What about $\{x \in \mathbb{R}^2 : \|x\|_2 < 1\}$?

ExerciseAdvanced

Problem

Give an example of a continuous function on a closed (but unbounded) subset of $\mathbb{R}$ that does not attain its infimum. Explain which hypothesis of the extreme value theorem fails.

References

Canonical:

  • Rudin, Principles of Mathematical Analysis (1976), Chapter 2 (sections on compactness)
  • Munkres, Topology (2000), Chapter 3
  • Apostol, Mathematical Analysis (1974), Chapter 3 (compact subsets of R^n)

For ML context:

  • Shalev-Shwartz & Ben-David, Understanding Machine Learning (2014), Chapter 27 (covering numbers and compactness)
  • Folland, Real Analysis (1999), Section 4.4 (compactness in metric spaces and function spaces)
  • Aliprantis & Border, Infinite Dimensional Analysis (2006), Chapters 2-3 (compactness in topological vector spaces)

Last reviewed: April 18, 2026
