Foundations
Kolmogorov Probability Axioms
The three axioms (non-negativity, normalization, countable additivity) that every probability claim on this site implicitly invokes. Sample space, event sigma-algebra, probability measure, and the immediate consequences.
Prerequisites
Why This Matters
Every probabilistic statement on this site, from a single-parameter Bernoulli likelihood to the convergence guarantee of stochastic gradient descent, rests on three axioms written down by Kolmogorov in 1933. The axioms do not say what probability means; they say what any consistent assignment of probabilities must satisfy. Frequentist long-run frequencies, Bayesian degrees of belief, and classical equally-likely-outcomes interpretations all produce probabilities obeying the same axioms. The interpretation is philosophical; the axioms are mathematical.
Reading this page is what lets every later result be unambiguous. When measure-theoretic probability writes "let $(\Omega, \mathcal{F}, P)$ be a probability space," this page is what that phrase means.
The phrase mixes three new objects: a sample space $\Omega$, an event sigma-algebra $\mathcal{F}$, and a probability measure $P$.
If you want the non-rigorous picture first, the Probability Mechanics Lab shows sample space, events, random variables, and expectation as concrete moving objects. This page is the formal version of that same board.
Quick Version
- $\Omega$, the sample space. The set of all possible full outcomes. It says what could happen at all.
- $\mathcal{F}$, the event sigma-algebra. The collection of measurable events. It says which questions are legal to ask.
- $P$, the probability measure. A mass assignment on events. It says how likely each legal event is.
- Countable additivity. Disjoint pieces add, even across countably many pieces. This is what gives continuity and limit theorems.
The whole setup can be compressed to one line: probability is a measure of total mass 1 living on a sigma-algebra of events.
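The one-line compression can be made concrete. Here is a minimal sketch in Python of a toy probability space (a fair die; $\mathcal{F}$ is implicitly the full power set, which is legal because $\Omega$ is finite; all names are illustrative):

```python
from fractions import Fraction

# Toy probability space for one fair die roll. Omega is finite, so the
# event space can be the full power set; P puts mass 1/6 on each outcome
# and extends to events by summation.
omega = {1, 2, 3, 4, 5, 6}
mass = {w: Fraction(1, 6) for w in omega}

def P(event):
    """Probability of an event, i.e. total mass of a subset of omega."""
    return sum(mass[w] for w in event)

assert P(omega) == 1                   # normalization: total mass 1
assert P(set()) == 0                   # the empty event carries no mass
assert P({2, 4, 6}) == Fraction(1, 2)  # the event "roll is even"
```

Exact rationals (`Fraction`) keep the additivity checks free of floating-point noise.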
The Three Objects
Probability requires three objects, fixed before any random variable is introduced.
Sample Space
The sample space $\Omega$ is a non-empty set whose elements $\omega \in \Omega$ are called outcomes. An outcome represents a complete specification of one possible result of the random experiment. The set $\Omega$ is the entire space of "what could happen."
Event Sigma-Algebra
A collection $\mathcal{F}$ of subsets of $\Omega$ is a sigma-algebra (or event space) when all of the following hold:
- $\Omega \in \mathcal{F}$,
- $A \in \mathcal{F} \implies A^{c} \in \mathcal{F}$ (closed under complements),
- $A_1, A_2, \ldots \in \mathcal{F} \implies \bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$ (closed under countable unions).
Elements of $\mathcal{F}$ are called events. Closure under complements and countable unions automatically gives closure under countable intersections, set differences, and limits.
Probability Measure
A probability measure is a function $P : \mathcal{F} \to [0, 1]$ satisfying the three Kolmogorov axioms below. The triple $(\Omega, \mathcal{F}, P)$ is a probability space.
The reason events live in a sigma-algebra rather than the full power set $2^{\Omega}$ is that not every subset of an uncountable $\Omega$ can be assigned a probability consistently. The Vitali set on $[0, 1]$ has no Lebesgue measure; trying to define $P$ on it gives a contradiction. The sigma-algebra is a collection of subsets on which $P$ can be defined coherently. There can be many such collections (the Borel sigma-algebra is strictly contained in its Lebesgue completion, for instance), and the choice of sigma-algebra is part of the specification of the probability space.
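On a finite $\Omega$, the closure conditions can be checked by brute force, since countable unions reduce to pairwise unions. A sketch (the helper `is_sigma_algebra` is hypothetical, written for this page):

```python
def is_sigma_algebra(omega, F):
    """Brute-force check of the sigma-algebra axioms on a finite omega.

    For finite collections, closure under countable unions reduces to
    closure under pairwise unions.
    """
    F = {frozenset(A) for A in F}
    if frozenset(omega) not in F:
        return False                                      # must contain omega
    if any(frozenset(set(omega) - A) not in F for A in F):
        return False                                      # closed under complements
    if any(A | B not in F for A in F for B in F):
        return False                                      # closed under finite unions
    return True

omega = {1, 2, 3, 4}
coarse = [set(), {1, 2}, {3, 4}, omega]  # a strict sub-sigma-algebra of 2^omega
broken = [set(), {1}, omega]             # missing the complement {2, 3, 4}
assert is_sigma_algebra(omega, coarse)
assert not is_sigma_algebra(omega, broken)
```

The `coarse` collection illustrates why the choice of sigma-algebra is part of the specification: it is a perfectly valid event space that simply cannot ask questions about individual outcomes.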
The Three Axioms
Kolmogorov Axioms of Probability
Statement
$P : \mathcal{F} \to \mathbb{R}$ is a probability measure if and only if it satisfies:
- Non-negativity. $P(A) \ge 0$ for all $A \in \mathcal{F}$.
- Normalization. $P(\Omega) = 1$.
- Countable additivity. For any countable collection $A_1, A_2, \ldots \in \mathcal{F}$ of pairwise disjoint events (so $A_i \cap A_j = \emptyset$ for $i \ne j$),
$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i).$$
Intuition
Axiom 1 rules out negative probability. Axiom 2 fixes the total mass at 1 (otherwise we'd be doing measure theory, not probability theory). Axiom 3 says probability behaves like a mass: putting probability on a countable disjoint union is the same as adding the masses on each piece. The choice of countable (not just finite) additivity is what gives probability its analytic strength: it forces continuity properties used in every limit theorem.
Proof Sketch
This is a definition disguised as a theorem; there is nothing to prove. The content is in the consequences below, all of which follow from these three axioms by elementary set manipulation.
Why It Matters
Every result in probability and statistics derives from these three axioms plus the structure of the chosen $(\Omega, \mathcal{F})$. The axioms are deliberately weak: they make no claim about how to assign probabilities to specific events, only about consistency requirements any such assignment must meet. This is what allows Bayesians, frequentists, and decision theorists to share a mathematical foundation while disagreeing on interpretation.
Failure Mode
A function satisfying only finite additivity (probabilities add across finite disjoint unions) is a finitely additive probability, not a (countably additive) probability measure. Finitely additive probabilities exist on any algebra, but they fail the continuity properties below and break the dominated convergence theorem. Real-valued probability theory uses countable additivity because the analytic payoff (limit theorems, Lebesgue integration of expectations) is enormous. The cost is that not every subset of an uncountable $\Omega$ is an event.
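A concrete finitely-but-not-countably additive object is the natural density on $\mathbb{N}$, defined on sets where $\lim_{n} |A \cap \{1, \ldots, n\}| / n$ exists. The sketch below (Python; the limit is truncated at a finite $n$, so values are approximations) shows why it cannot be countably additive:

```python
def density(indicator, n=100_000):
    """Approximate natural density of {k in N : indicator(k)} by |A ∩ [1, n]| / n."""
    return sum(bool(indicator(k)) for k in range(1, n + 1)) / n

assert density(lambda k: True) == 1.0    # the whole of N has density 1
assert density(lambda k: k == 7) < 1e-4  # every singleton has density ~0
# Finite additivity across disjoint pieces: evens + odds = 1.
assert abs(density(lambda k: k % 2 == 0) + density(lambda k: k % 2 == 1) - 1.0) < 1e-9
# But N is the countable union of its singletons, each of density 0, so
# summing the pieces gives 0, not 1: countable additivity fails.
```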
Immediate Consequences
The next properties follow directly from the three axioms. Every later page uses them without proof.
Basic Identities of Probability Measures
Statement
For any probability measure $P$ on $(\Omega, \mathcal{F})$:
- $P(\emptyset) = 0$.
- $P(\Omega) = 1$.
- If $A \subseteq B$, then $P(A) \le P(B)$.
Intuition
The empty event carries no mass, the whole sample space carries all mass, and larger events cannot have smaller probability than their subsets. These are the first algebraic facts that let probability behave like a coherent size function on events.
Proof Sketch
The empty-set identity is part of the measure structure. Normalization is the probability-specific axiom. Monotonicity follows by writing $B = A \cup (B \setminus A)$ as a disjoint union and using non-negativity.
Why It Matters
This is the smallest reusable bridge from axioms to calculations. Union bounds, Borel-Cantelli, convergence theorems, and every probabilistic diagnostic later on assume these identities without restating them.
Failure Mode
The statement is about events in the chosen sigma-algebra. If a subset of $\Omega$ is not measurable, $P$ is not required to assign it a probability.
The next two claim blocks separate the existing-measure direction from the harder reconstruction theorem. In Lean, a Measure already carries empty-set mass and countable additivity; the verified claims below check that TheoremPath's page-facing statements match those mathlib facts for a probability measure.
Finite Additivity for Probability Measures
Statement
If $A$ and $B$ are disjoint events in the same probability space, then
$$P(A \cup B) = P(A) + P(B).$$
Intuition
Disjoint events occupy non-overlapping pieces of the sample space, so their probability masses add without correction terms.
Proof Sketch
This is the two-set case of countable additivity. In the Lean artifact, the claim is a thin wrapper around mathlib's finite union theorem for disjoint measurable sets.
Why It Matters
Finite additivity is the step behind complement rules, inclusion-exclusion, and the two-event union bound. Most probability calculations start by splitting an event into disjoint pieces and adding their probabilities.
Failure Mode
The disjointness assumption is material. If events overlap, adding $P(A)$ and $P(B)$ counts the intersection twice; inclusion-exclusion supplies the correction.
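Both directions of this failure mode can be seen empirically. A Monte Carlo sketch in Python (simulated fair die; frequencies stand in for probabilities):

```python
import random

random.seed(0)
N = 100_000
rolls = [random.randint(1, 6) for _ in range(N)]

def freq(event):
    """Empirical frequency of an event over the simulated rolls."""
    return sum(r in event for r in rolls) / N

# Disjoint events: the counts add exactly, so the frequencies add too.
assert abs(freq({1, 2} | {5, 6}) - (freq({1, 2}) + freq({5, 6}))) < 1e-12
# Overlapping events: naive addition double-counts the intersection {2}.
low, even = {1, 2}, {2, 4, 6}
assert freq(low | even) < freq(low) + freq(even)
```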
Countable Additivity for Probability Measures
Statement
For pairwise disjoint events $A_1, A_2, \ldots \in \mathcal{F}$,
$$P\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} P(A_n).$$
Intuition
Countable additivity says probability mass has no hidden boundary charge at the limit of a countable disjoint decomposition. The mass of the countable union is exactly the infinite sum of the component masses.
Proof Sketch
This is part of the measure structure in mathlib. The Lean artifact wraps the countable-additivity theorem for Measure under the probability-measure normalization assumption.
Why It Matters
This is the axiom that powers monotone convergence, Borel-Cantelli, laws of large numbers, and probability bounds over countable families of events.
Failure Mode
Finite additivity alone does not imply this statement. Finitely additive probabilities can assign zero mass to each singleton in $\mathbb{N}$ while assigning mass one to the whole set, which violates the displayed countable sum identity.
Complement Rule for Probability Measures
Statement
For any event $A$ in a probability space,
$$P(A^{c}) = 1 - P(A).$$
Intuition
An event and its complement split the sample space into two disjoint pieces. Since the whole sample space has mass one, whatever mass is not on $A$ is on $A^{c}$.
Proof Sketch
Use finite additivity on the disjoint union $\Omega = A \cup A^{c}$: $P(A) + P(A^{c}) = P(\Omega) = 1$. The Lean wrapper uses mathlib's complement measure theorem with the finite-measure condition supplied by the probability measure.
Why It Matters
Complement conversion turns lower-tail bounds into upper-tail bounds and "at least one" events into "none" events. It is a small identity, but it sits inside Borel-Cantelli, hypothesis testing error control, and almost-sure arguments.
Failure Mode
The statement is only about measurable events. For a non-measurable subset $A$, neither $A$ nor $A^{c}$ is part of the probability space.
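On the toy uniform die space the complement rule is a one-line identity. A sketch with exact rationals:

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def P(event):
    """Uniform measure on a fair die: each event gets mass |event| / 6."""
    return Fraction(len(event), 6)

A = frozenset({1, 2})
assert P(omega - A) == 1 - P(A)  # complement rule: 4/6 == 1 - 2/6
assert P(A) + P(omega - A) == 1  # A and A^c split omega disjointly
```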
Finite Union Bound for Probability Measures
Statement
For any finite family of events $A_1, \ldots, A_n$,
$$P\Big(\bigcup_{i=1}^{n} A_i\Big) \le \sum_{i=1}^{n} P(A_i).$$
Intuition
Adding individual probabilities counts overlaps more than once, so the sum is an upper bound on the probability that at least one event occurs.
Proof Sketch
The two-event case follows from inclusion-exclusion by dropping the non-negative intersection term. The finite-family case follows by induction or from measure subadditivity. The Lean wrapper uses mathlib's finite union bound for measures directly.
Why It Matters
This is the probability move behind finite hypothesis-class generalization: bound the failure probability for each hypothesis, then union-bound over the finite class.
Failure Mode
The bound can be loose when events overlap heavily. It is a worst-case one-sided bound, not an independence calculation.
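Because the die space is finite, the two-event union bound can be verified exhaustively over all $2^6 = 64$ events, including its equality condition (a sketch; the enumeration trick only works for small finite spaces):

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))

def P(event):
    return Fraction(len(event), 6)  # uniform measure on a fair die

# All 64 events of the power-set sigma-algebra.
events = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

# The union bound holds for every pair of events ...
assert all(P(A | B) <= P(A) + P(B) for A in events for B in events)
# ... with equality exactly when the intersection has probability zero.
assert all((P(A | B) == P(A) + P(B)) == (P(A & B) == 0)
           for A in events for B in events)
```

The equality check is the two-event exercise at the end of this page, verified by enumeration on one small space.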
Countable Union Bound for Probability Measures
Statement
For events $A_1, A_2, \ldots \in \mathcal{F}$,
$$P\Big(\bigcup_{n=1}^{\infty} A_n\Big) \le \sum_{n=1}^{\infty} P(A_n).$$
Intuition
Countable additivity gives equality for disjoint events. If the events overlap, assigning each point to its first covering event turns the union into a disjoint subfamily whose total mass is no larger than the original sum.
Proof Sketch
This is countable subadditivity of measures. The Lean wrapper calls mathlib's countable union-bound theorem for measures; no disjointness assumption is needed.
Why It Matters
Countable union bounds power first Borel-Cantelli, tail-event estimates, and "bad event at some time" arguments. They are the bridge from probability axioms to concentration and learning-theory sample-complexity bounds.
Failure Mode
If the sum of probabilities is larger than 1, the inequality is still true but often uninformative. Later concentration results matter because they make the right side small enough to use.
Probability of the empty set: $P(\emptyset) = 0$. Proof. Apply axiom 3 to the sequence $\emptyset, \emptyset, \ldots$ to get $P(\emptyset) = \sum_{i=1}^{\infty} P(\emptyset)$, forcing $P(\emptyset) = 0$.
Monotonicity: if $A \subseteq B$ then $P(A) \le P(B)$. Proof. Write $B = A \cup (B \setminus A)$ as a disjoint union, so $P(B) = P(A) + P(B \setminus A) \ge P(A)$ by axiom 1.
A useful corollary: $0 \le P(A) \le 1$ for every event $A$. The codomain $[0, 1]$ in the definition is forced by the axioms, not assumed.
Inclusion-Exclusion
For finite unions of overlapping events, additivity needs a correction.
Inclusion-Exclusion Principle
Statement
For events $A_1, \ldots, A_n$,
$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{k=1}^{n} (-1)^{k+1} \sum_{i_1 < \cdots < i_k} P(A_{i_1} \cap \cdots \cap A_{i_k}).$$
For $n = 2$: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. For $n = 3$: $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.
Intuition
Adding $P(A) + P(B)$ double-counts the overlap, so subtract $P(A \cap B)$. With three sets, the three pairwise overlaps subtract too much from the triple overlap, so add it back. The alternating sign pattern generalizes this bookkeeping to any finite $n$.
Proof Sketch
Induction on $n$. The base case $n = 2$ follows by writing $A \cup B = A \cup (B \setminus A)$ as a disjoint union and using $P(B \setminus A) = P(B) - P(A \cap B)$. Inductive step: apply the two-set case to $\bigcup_{i=1}^{n-1} A_i$ and $A_n$, distribute intersections, and collect terms.
Why It Matters
Inclusion-exclusion is the workhorse for computing union probabilities when you can compute intersections. It appears in derangement counts, the union bound (a one-term truncation), and inclusion-exclusion bounds in combinatorial probability. The Bonferroni inequalities are obtained by truncating the alternating sum after an even or odd number of terms, yielding lower or upper bounds.
Failure Mode
The number of terms grows as $2^{n} - 1$. For large $n$, computing every intersection probability is infeasible, and inclusion-exclusion becomes a theoretical tool rather than a computational one. The union bound is the cheap one-sided alternative used throughout learning theory.
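The alternating sum can be implemented directly and checked against the exact union probability; the $2^{n} - 1$ term count is visible as the loop over all non-empty subfamilies. A sketch on the toy die space:

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations

omega = frozenset(range(1, 7))

def P(event):
    return Fraction(len(event), 6)  # uniform measure on a fair die

def inclusion_exclusion(events):
    """P(union) via the alternating sum over all non-empty subfamilies."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        for combo in combinations(events, r):
            # sign +1 for odd-size subfamilies, -1 for even-size ones
            total += (-1) ** (r + 1) * P(reduce(frozenset.intersection, combo))
    return total

A, B, C = frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 4, 5})
assert inclusion_exclusion([A, B, C]) == P(A | B | C)  # both equal 5/6
```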
Continuity of Probability
Countable additivity is equivalent (given finite additivity and non-negativity) to a continuity property: probabilities respect monotone limits of events.
Continuity of Probability Measures
Statement
A non-negative, normalized, finitely additive set function $P$ is countably additive (and hence a probability measure) if and only if both of the following hold:
Continuity from below. For any increasing sequence $A_1 \subseteq A_2 \subseteq \cdots$ of events with $A = \bigcup_{n=1}^{\infty} A_n$,
$$\lim_{n \to \infty} P(A_n) = P(A).$$
Continuity from above. For any decreasing sequence $A_1 \supseteq A_2 \supseteq \cdots$ of events with $A = \bigcap_{n=1}^{\infty} A_n$,
$$\lim_{n \to \infty} P(A_n) = P(A).$$
Intuition
"Increasing union" means the events grow to fill out their limit; the probabilities should grow to fill out the limit's probability. Without continuity, a sequence of events could grow to include "more and more" of while their probabilities stayed pinned below the union's probability, which would break every limit argument in probability.
Proof Sketch
Decompose the increasing union as a countable disjoint union: $\bigcup_{n} A_n = A_1 \cup (A_2 \setminus A_1) \cup (A_3 \setminus A_2) \cup \cdots$. Apply countable additivity: $P(\bigcup_{n} A_n) = P(A_1) + \sum_{k=2}^{\infty} P(A_k \setminus A_{k-1})$. The partial sums telescope to $P(A_n)$, so the limit equals $\lim_{n} P(A_n)$. Continuity from above follows by passing to complements.
Why It Matters
This is what countable additivity buys you. Every "$\lim_{n} P(A_n) = P(\lim_{n} A_n)$" argument, every interchange of limit and probability, every dominated convergence application for indicator functions, sits on this continuity. The Borel-Cantelli lemmas and the modes of convergence of random variables both rely on it.
Failure Mode
Continuity from above requires the events to be decreasing and at least one of them to have finite measure. For probability measures this is automatic (everything has measure at most 1), but for general measures (like Lebesgue measure on $\mathbb{R}$) the assumption is necessary. Example: the sets $A_n = [n, \infty)$ decrease to $\emptyset$ but each has infinite Lebesgue measure.
The two one-direction consequences below are the exact claim-level split used by the internal verification system. The broader equivalence above remains a source-review target; these smaller claims are the pieces Lean can check without pretending to prove the whole characterization theorem.
Continuity from Below for Probability Measures
Statement
If $A_1 \subseteq A_2 \subseteq \cdots$ is an increasing sequence of events, then the sequence of probabilities tends to the probability of the union:
$$\lim_{n \to \infty} P(A_n) = P\Big(\bigcup_{n=1}^{\infty} A_n\Big).$$
Intuition
As the events grow, no probability mass disappears at the limit. Countable additivity lets the measure see the countably many incremental shells $A_{n+1} \setminus A_n$.
Proof Sketch
This is the monotone-continuity theorem for measures. Write the increasing union as the disjoint union of the first set and the successive differences; countable additivity turns the measure of the limit event into the limit of the partial sums.
Why It Matters
This is the exact form used in stopping-time, convergence, and Borel-Cantelli arguments: replace a growing event by its finite prefixes, then pass to the limit.
Failure Mode
The monotonicity assumption is material. For arbitrary events, $P(A_n)$ can oscillate while the union stays fixed, so there is no reason for the displayed limit to hold.
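Continuity from below can be watched numerically with the geometric measure $P(\{k\}) = 2^{-k}$ on $\mathbb{N}$, where the events $A_n = \{1, \ldots, n\}$ increase to $\mathbb{N}$ (mass 1). A sketch:

```python
def P_prefix(n):
    """P(A_n) for A_n = {1, ..., n} under the geometric measure P({k}) = 2^(-k)."""
    return sum(0.5 ** k for k in range(1, n + 1))  # equals 1 - 2^(-n)

probs = [P_prefix(n) for n in (1, 5, 10, 30)]
assert probs == sorted(probs)           # monotone: P(A_n) increases with n
assert abs(P_prefix(60) - 1.0) < 1e-12  # converges to P(union of A_n) = 1
```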
Continuity from Above for Probability Measures
Statement
If $A_1 \supseteq A_2 \supseteq \cdots$ is a decreasing sequence of events, then the sequence of probabilities tends to the probability of the intersection:
$$\lim_{n \to \infty} P(A_n) = P\Big(\bigcap_{n=1}^{\infty} A_n\Big).$$
Intuition
Decreasing events shave away mass. In a probability space the starting mass is finite, so the mass left after countably many removals is the limit of the remaining masses.
Proof Sketch
Apply continuity from below to the complements $A_n^{c}$, which form an increasing sequence, and use the complement rule. Equivalently, use the measure-continuity-from-above theorem with the finite-measure condition supplied automatically by $P(\Omega) = 1$.
Why It Matters
This is the form behind "eventually always" events, tail intersections, and limit arguments where bad sets shrink as a parameter grows.
Failure Mode
For general infinite measures, one finite-measure assumption is necessary. The sets $[n, \infty)$ in Lebesgue measure decrease to the empty set but all have infinite measure, so the limit statement fails in the extended-real sense.
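The companion numerical picture for continuity from above, with the same geometric measure $P(\{k\}) = 2^{-k}$ on $\mathbb{N}$: the tails $B_n = \{n, n+1, \ldots\}$ decrease to $\emptyset$. A sketch:

```python
def P_tail(n):
    """P(B_n) for B_n = {n, n+1, ...}: exact tail mass 2^(1-n) of the geometric measure."""
    return 0.5 ** (n - 1)

tails = [P_tail(n) for n in (1, 5, 10, 30)]
assert tails == sorted(tails, reverse=True)  # monotone: P(B_n) decreases with n
assert P_tail(60) < 1e-15                    # converges to P(empty set) = 0
```

Swapping the probability measure for "length" breaks this: the tail $[n, \infty)$ always has infinite length, which is exactly the failure mode above.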
Why Countable, Not Finite, Additivity
Finite additivity is the version most people guess when first writing down probability axioms. Why does the standard formulation insist on countable additivity?
Three reasons:
- Limit theorems. Without continuity from below, the law of large numbers cannot be stated as a single statement about a sequence of averages. The law of large numbers and the central limit theorem both produce countable disjoint unions in their proofs.
- Lebesgue integration. The expected value $\mathbb{E}[X] = \int_{\Omega} X \, dP$ is the Lebesgue integral against $P$. Lebesgue's monotone and dominated convergence theorems require countable additivity. Without them, you cannot interchange $\lim$ and $\mathbb{E}$, which is required in nearly every consistency proof.
- Probability of unions of events with shrinking probability. Countable additivity is what guarantees that the probability that at least one of countably many "rare events" occurs is bounded by the sum of their (small) probabilities. This is the engine of the common inequalities used as union bounds throughout learning theory.
The price is the existence of non-measurable sets. On uncountable $\Omega$, not every subset is in $\mathcal{F}$. For all of probability and statistics, this is a fair trade.
Common Confusions
Probability zero is not the same as impossible
For a continuous random variable $X$ with density $f$, $P(X = x) = 0$ for every fixed $x$, yet $X$ takes some value with probability 1. "Probability zero" means the event has measure zero, not that it cannot happen. Symmetrically, "probability one" (almost sure) does not mean "always," only that the exceptional set has measure zero.
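A small simulation makes the point concrete (Python; pseudorandom uniform draws on $[0, 1)$ as a stand-in for a continuous distribution):

```python
import random

random.seed(1)
draws = [random.random() for _ in range(100_000)]

# The fixed point 0.5 is an event of probability zero: none of these draws
# hits it exactly, although it is a perfectly possible value ...
assert all(x != 0.5 for x in draws)
# ... and values arbitrarily close to 0.5 occur all the time.
assert any(abs(x - 0.5) < 1e-3 for x in draws)
```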
Sigma-algebras are not optional bookkeeping
Many introductory treatments hide the sigma-algebra to keep notation light, writing $P(A)$ as if every subset $A \subseteq \Omega$ were an event. This works on discrete or finite $\Omega$, where you can take $\mathcal{F} = 2^{\Omega}$. On $\mathbb{R}$ or $\mathbb{R}^{n}$, the standard choices are the Borel sigma-algebra generated by open sets and its Lebesgue completion. Both exclude pathological sets like the Vitali construction. Pretending $\mathcal{F} = 2^{[0,1]}$ is what causes Banach-Tarski-style paradoxes when you try to define a uniform probability on every subset of $[0, 1]$.
The axioms do not pick an interpretation
The axioms tell you what arithmetic probabilities must obey. They do not tell you whether $P(A)$ means a long-run frequency, a betting rate, or a degree of belief. Frequentists, Bayesians, and subjectivists all use the same Kolmogorov axioms; they differ on what the numbers refer to. The mathematics is consistent across interpretations because the axioms are interpretation-free.
Summary
- A probability space is a triple $(\Omega, \mathcal{F}, P)$: a sample space, an event sigma-algebra, and a probability measure.
- The three axioms are non-negativity, normalization, and countable additivity.
- Immediate consequences: $P(\emptyset) = 0$, the complement rule, monotonicity, finite additivity.
- Inclusion-exclusion handles finite unions of overlapping events; the union bound is its one-sided cheap relative.
- Countable additivity is equivalent to continuity of probability for monotone sequences of events; this continuity is what makes limit theorems possible.
- The axioms are silent on interpretation: frequentist, Bayesian, and classical accounts of probability all satisfy them.
Exercises
Problem
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $A, B \in \mathcal{F}$. Prove that $P(A \cup B) \le P(A) + P(B)$ (the two-event union bound), with equality if and only if $A \cap B$ has probability zero.
Problem
Construct a finitely additive probability $P$ on $\mathbb{N}$ that is not countably additive, by assigning $P(\{n\}) = 0$ for every singleton but $P(\mathbb{N}) = 1$. (Such a $P$ exists, by appeal to the Hahn-Banach theorem or an ultrafilter on $\mathbb{N}$.) Then explain which Kolmogorov axiom this violates and why it cannot be a probability measure in the standard sense.
References
Original:
- Kolmogorov, "Grundbegriffe der Wahrscheinlichkeitsrechnung" (Springer, 1933); English translation "Foundations of the Theory of Probability" (Chelsea, 1956), Chapter 1
Standard graduate texts:
- Billingsley, "Probability and Measure" (3rd edition, Wiley, 1995), Sections 2-3
- Durrett, "Probability: Theory and Examples" (5th edition, Cambridge, 2019), Section 1.1
- Williams, "Probability with Martingales" (Cambridge, 1991), Chapter 1
- Resnick, "A Probability Path" (Birkhauser, 1999), Chapters 1-2
Real analysis perspective:
- Folland, "Real Analysis: Modern Techniques and Their Applications" (2nd edition, Wiley, 1999), Chapter 1
- Rudin, "Real and Complex Analysis" (3rd edition, McGraw-Hill, 1987), Chapter 1
Next Topics
- Measure-theoretic probability: building expectation and Lebesgue integration on top of the axioms
- Joint, marginal, conditional distributions: the working notation that the axioms support
- Common inequalities: Markov, Chebyshev, Jensen, and the union bound
Last reviewed: April 28, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Sets, Functions, and Relations (layer 0A · tier 1)
Derived topics
- Common Inequalities (layer 0A · tier 1)
- Joint, Marginal, and Conditional Distributions (layer 0A · tier 1)
- Random Variables (layer 0A · tier 1)
- Measure-Theoretic Probability (layer 0B · tier 1)
- Time Series Foundations (layer 2 · tier 2)