Foundations
Kolmogorov Probability Axioms
The three axioms (non-negativity, normalization, countable additivity) that every probability claim on this site implicitly invokes. Sample space, event sigma-algebra, probability measure, and the immediate consequences.
Prerequisites
Why This Matters
Every probabilistic statement on this site, from a single-parameter Bernoulli likelihood to the convergence guarantee of stochastic gradient descent, rests on three axioms written down by Kolmogorov in 1933. The axioms do not say what probability means; they say what any consistent assignment of probabilities must satisfy. Frequentist long-run frequencies, Bayesian degrees of belief, and classical equally-likely-outcomes interpretations all produce probabilities obeying the same axioms. The interpretation is philosophical; the axioms are mathematical.
Reading this page is what lets every later result be unambiguous. When measure-theoretic probability writes "let $(\Omega, \mathcal{F}, P)$ be a probability space," this page is what that phrase means.
The phrase mixes three new objects: a sample space $\Omega$, an event sigma-algebra $\mathcal{F}$, and a probability measure $P$.
If you want the non-rigorous picture first, the Probability Mechanics Lab shows sample space, events, random variables, and expectation as concrete moving objects. This page is the formal version of that same board.
Quick Version
- $\Omega$, the sample space. The set of all possible full outcomes. It says what could happen at all.
- $\mathcal{F}$, the event sigma-algebra. The collection of measurable events. It says which questions are legal to ask.
- $P$, the probability measure. A mass assignment on events. It says how likely each legal event is.
- Countable additivity. Disjoint pieces add, even across countably many pieces. This is what gives continuity and limit theorems.
The whole setup can be compressed to one line: probability is a measure of total mass 1 living on a sigma-algebra of events.
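The one-line compression can be made concrete. Here is a minimal sketch in Python of a toy probability space (a fair die; $\mathcal{F}$ is implicitly the full power set, which is legal because $\Omega$ is finite; all names are illustrative):

```python
from fractions import Fraction

# Toy probability space for one fair die roll. Omega is finite, so the
# event space can be the full power set; P puts mass 1/6 on each outcome
# and extends to events by summation.
omega = {1, 2, 3, 4, 5, 6}
mass = {w: Fraction(1, 6) for w in omega}

def P(event):
    """Probability of an event, i.e. total mass of a subset of omega."""
    return sum(mass[w] for w in event)

assert P(omega) == 1                   # normalization: total mass 1
assert P(set()) == 0                   # the empty event carries no mass
assert P({2, 4, 6}) == Fraction(1, 2)  # the event "roll is even"
```

Exact rationals (`Fraction`) keep the additivity checks free of floating-point noise.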
The Three Objects
Probability requires three objects, fixed before any random variable is introduced.
Sample Space
The sample space $\Omega$ is a non-empty set whose elements $\omega \in \Omega$ are called outcomes. An outcome represents a complete specification of one possible result of the random experiment. The set $\Omega$ is the entire space of "what could happen."
Event Sigma-Algebra
A collection $\mathcal{F}$ of subsets of $\Omega$ is a sigma-algebra (or event space) when all of the following hold:
- $\Omega \in \mathcal{F}$,
- $A \in \mathcal{F} \implies A^{c} \in \mathcal{F}$ (closed under complements),
- $A_1, A_2, \ldots \in \mathcal{F} \implies \bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$ (closed under countable unions).
Elements of $\mathcal{F}$ are called events. Closure under complements and countable unions automatically gives closure under countable intersections, set differences, and limits.
Probability Measure
A probability measure is a function $P : \mathcal{F} \to [0, 1]$ satisfying the three Kolmogorov axioms below. The triple $(\Omega, \mathcal{F}, P)$ is a probability space.
The reason events live in a sigma-algebra rather than the full power set $2^{\Omega}$ is that not every subset of an uncountable $\Omega$ can be assigned a probability consistently. The Vitali set on $[0, 1]$ has no Lebesgue measure; trying to define $P$ on it gives a contradiction. The sigma-algebra is a collection of subsets on which $P$ can be defined coherently. There can be many such collections (the Borel sigma-algebra is strictly contained in its Lebesgue completion, for instance), and the choice of sigma-algebra is part of the specification of the probability space.
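On a finite $\Omega$, the closure conditions can be checked by brute force, since countable unions reduce to pairwise unions. A sketch (the helper `is_sigma_algebra` is hypothetical, written for this page):

```python
def is_sigma_algebra(omega, F):
    """Brute-force check of the sigma-algebra axioms on a finite omega.

    For finite collections, closure under countable unions reduces to
    closure under pairwise unions.
    """
    F = {frozenset(A) for A in F}
    if frozenset(omega) not in F:
        return False                                      # must contain omega
    if any(frozenset(set(omega) - A) not in F for A in F):
        return False                                      # closed under complements
    if any(A | B not in F for A in F for B in F):
        return False                                      # closed under finite unions
    return True

omega = {1, 2, 3, 4}
coarse = [set(), {1, 2}, {3, 4}, omega]  # a strict sub-sigma-algebra of 2^omega
broken = [set(), {1}, omega]             # missing the complement {2, 3, 4}
assert is_sigma_algebra(omega, coarse)
assert not is_sigma_algebra(omega, broken)
```

The `coarse` collection illustrates why the choice of sigma-algebra is part of the specification: it is a perfectly valid event space that simply cannot ask questions about individual outcomes.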
The Three Axioms
Kolmogorov Axioms of Probability
Statement
$P : \mathcal{F} \to \mathbb{R}$ is a probability measure if and only if it satisfies:
- Non-negativity. $P(A) \ge 0$ for all $A \in \mathcal{F}$.
- Normalization. $P(\Omega) = 1$.
- Countable additivity. For any countable collection $A_1, A_2, \ldots \in \mathcal{F}$ of pairwise disjoint events (so $A_i \cap A_j = \emptyset$ for $i \ne j$),
$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i).$$
Intuition
Axiom 1 rules out negative probability. Axiom 2 fixes the total mass at 1 (otherwise we'd be doing measure theory, not probability theory). Axiom 3 says probability behaves like a mass: putting probability on a countable disjoint union is the same as adding the masses on each piece. The choice of countable (not just finite) additivity is what gives probability its analytic strength: it forces continuity properties used in every limit theorem.
Proof Sketch
This is a definition disguised as a theorem; there is nothing to prove. The content is in the consequences below, all of which follow from these three axioms by elementary set manipulation.
Why It Matters
Every result in probability and statistics derives from these three axioms plus the structure of the chosen $(\Omega, \mathcal{F})$. The axioms are deliberately weak: they make no claim about how to assign probabilities to specific events, only about consistency requirements any such assignment must meet. This is what allows Bayesians, frequentists, and decision theorists to share a mathematical foundation while disagreeing on interpretation.
Failure Mode
A function satisfying only finite additivity (probabilities add across finite disjoint unions) is a finitely additive probability, not a (countably additive) probability measure. Finitely additive probabilities exist on any algebra, but they fail the continuity properties below and break the dominated convergence theorem. Real-valued probability theory uses countable additivity because the analytic payoff (limit theorems, Lebesgue integration of expectations) is enormous. The cost is that not every subset of an uncountable $\Omega$ is an event.
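A concrete finitely-but-not-countably additive object is the natural density on $\mathbb{N}$, defined on sets where $\lim_{n} |A \cap \{1, \ldots, n\}| / n$ exists. The sketch below (Python; the limit is truncated at a finite $n$, so values are approximations) shows why it cannot be countably additive:

```python
def density(indicator, n=100_000):
    """Approximate natural density of {k in N : indicator(k)} by |A ∩ [1, n]| / n."""
    return sum(bool(indicator(k)) for k in range(1, n + 1)) / n

assert density(lambda k: True) == 1.0    # the whole of N has density 1
assert density(lambda k: k == 7) < 1e-4  # every singleton has density ~0
# Finite additivity across disjoint pieces: evens + odds = 1.
assert abs(density(lambda k: k % 2 == 0) + density(lambda k: k % 2 == 1) - 1.0) < 1e-9
# But N is the countable union of its singletons, each of density 0, so
# summing the pieces gives 0, not 1: countable additivity fails.
```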
Immediate Consequences
The next properties follow directly from the three axioms. Every later page uses them without proof.
Basic Identities of Probability Measures
Statement
For any probability measure $P$ on $(\Omega, \mathcal{F})$:
- $P(\emptyset) = 0$.
- $P(\Omega) = 1$.
- If $A \subseteq B$, then $P(A) \le P(B)$.
Intuition
The empty event carries no mass, the whole sample space carries all mass, and larger events cannot have smaller probability than their subsets. These are the first algebraic facts that let probability behave like a coherent size function on events.
Proof Sketch
The empty-set identity is part of the measure structure. Normalization is the probability-specific axiom. Monotonicity follows by writing $B = A \cup (B \setminus A)$ as a disjoint union and using non-negativity.
Why It Matters
This is the smallest reusable bridge from axioms to calculations. Union bounds, Borel-Cantelli, convergence theorems, and every probabilistic diagnostic later on assume these identities without restating them.
Failure Mode
The statement is about events in the chosen sigma-algebra. If a subset of $\Omega$ is not measurable, $P$ is not required to assign it a probability.
The next two claim blocks separate the existing-measure direction from the harder reconstruction theorem. In Lean, a Measure already carries empty-set mass and countable additivity; the verified claims below check that TheoremPath's page-facing statements match those mathlib facts for a probability measure.
Finite Additivity for Probability Measures
Statement
If $A$ and $B$ are disjoint events in the same probability space, then
$$P(A \cup B) = P(A) + P(B).$$
Intuition
Disjoint events occupy non-overlapping pieces of the sample space, so their probability masses add without correction terms.
Proof Sketch
This is the two-set case of countable additivity. In the Lean artifact, the claim is a thin wrapper around mathlib's finite union theorem for disjoint measurable sets.
Why It Matters
Finite additivity is the step behind complement rules, inclusion-exclusion, and the two-event union bound. Most probability calculations start by splitting an event into disjoint pieces and adding their probabilities.
Failure Mode
The disjointness assumption is material. If events overlap, adding $P(A)$ and $P(B)$ counts the intersection twice; inclusion-exclusion supplies the correction.
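Both directions of this failure mode can be seen empirically. A Monte Carlo sketch in Python (simulated fair die; frequencies stand in for probabilities):

```python
import random

random.seed(0)
N = 100_000
rolls = [random.randint(1, 6) for _ in range(N)]

def freq(event):
    """Empirical frequency of an event over the simulated rolls."""
    return sum(r in event for r in rolls) / N

# Disjoint events: the counts add exactly, so the frequencies add too.
assert abs(freq({1, 2} | {5, 6}) - (freq({1, 2}) + freq({5, 6}))) < 1e-12
# Overlapping events: naive addition double-counts the intersection {2}.
low, even = {1, 2}, {2, 4, 6}
assert freq(low | even) < freq(low) + freq(even)
```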
Countable Additivity for Probability Measures
Statement
For pairwise disjoint events $A_1, A_2, \ldots \in \mathcal{F}$,
$$P\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} P(A_n).$$
Intuition
Countable additivity says probability mass has no hidden boundary charge at the limit of a countable disjoint decomposition. The mass of the countable union is exactly the infinite sum of the component masses.
Proof Sketch
This is part of the measure structure in mathlib. The Lean artifact wraps the countable-additivity theorem for Measure under the probability-measure normalization assumption.
Why It Matters
This is the axiom that powers monotone convergence, Borel-Cantelli, laws of large numbers, and probability bounds over countable families of events.
Failure Mode
Finite additivity alone does not imply this statement. Finitely additive probabilities can assign zero mass to each singleton in $\mathbb{N}$ while assigning mass one to the whole set, which violates the displayed countable sum identity.
Complement Rule for Probability Measures
Statement
For any event $A$ in a probability space,
$$P(A^{c}) = 1 - P(A).$$
Intuition
An event and its complement split the sample space into two disjoint pieces. Since the whole sample space has mass one, whatever mass is not on $A$ is on $A^{c}$.
Proof Sketch
Use finite additivity on the disjoint union $\Omega = A \cup A^{c}$: $P(A) + P(A^{c}) = P(\Omega) = 1$. The Lean wrapper uses mathlib's complement measure theorem with the finite-measure condition supplied by the probability measure.
Why It Matters
Complement conversion turns lower-tail bounds into upper-tail bounds and "at least one" events into "none" events. It is a small identity, but it sits inside Borel-Cantelli, hypothesis testing error control, and almost-sure arguments.
Failure Mode
The statement is only about measurable events. For a non-measurable subset $A$, neither $A$ nor $A^{c}$ is part of the probability space.
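On the toy uniform die space the complement rule is a one-line identity. A sketch with exact rationals:

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def P(event):
    """Uniform measure on a fair die: each event gets mass |event| / 6."""
    return Fraction(len(event), 6)

A = frozenset({1, 2})
assert P(omega - A) == 1 - P(A)  # complement rule: 4/6 == 1 - 2/6
assert P(A) + P(omega - A) == 1  # A and A^c split omega disjointly
```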
Finite Union Bound for Probability Measures
Statement
For any finite family of events $A_1, \ldots, A_n$,
$$P\Big(\bigcup_{i=1}^{n} A_i\Big) \le \sum_{i=1}^{n} P(A_i).$$
Intuition
Adding individual probabilities counts overlaps more than once, so the sum is an upper bound on the probability that at least one event occurs.
Proof Sketch
The two-event case follows from inclusion-exclusion by dropping the non-negative intersection term. The finite-family case follows by induction or from measure subadditivity. The Lean wrapper uses mathlib's finite union bound for measures directly.
Why It Matters
This is the probability move behind finite hypothesis-class generalization: bound the failure probability for each hypothesis, then union-bound over the finite class.
Failure Mode
The bound can be loose when events overlap heavily. It is a worst-case one-sided bound, not an independence calculation.
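Because the die space is finite, the two-event union bound can be verified exhaustively over all $2^6 = 64$ events, including its equality condition (a sketch; the enumeration trick only works for small finite spaces):

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))

def P(event):
    return Fraction(len(event), 6)  # uniform measure on a fair die

# All 64 events of the power-set sigma-algebra.
events = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

# The union bound holds for every pair of events ...
assert all(P(A | B) <= P(A) + P(B) for A in events for B in events)
# ... with equality exactly when the intersection has probability zero.
assert all((P(A | B) == P(A) + P(B)) == (P(A & B) == 0)
           for A in events for B in events)
```

The equality check is the two-event exercise at the end of this page, verified by enumeration on one small space.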
Countable Union Bound for Probability Measures
Statement
For events $A_1, A_2, \ldots \in \mathcal{F}$,
$$P\Big(\bigcup_{n=1}^{\infty} A_n\Big) \le \sum_{n=1}^{\infty} P(A_n).$$
Intuition
Countable additivity gives equality for disjoint events. If the events overlap, assigning each point to its first covering event turns the union into a disjoint subfamily whose total mass is no larger than the original sum.
Proof Sketch
This is countable subadditivity of measures. The Lean wrapper calls mathlib's countable union-bound theorem for measures; no disjointness assumption is needed.
Why It Matters
Countable union bounds power first Borel-Cantelli, tail-event estimates, and "bad event at some time" arguments. They are the bridge from probability axioms to concentration and learning-theory sample-complexity bounds.
Failure Mode
If the sum of probabilities is larger than 1, the inequality is still true but often uninformative. Later concentration results matter because they make the right side small enough to use.
Probability of the empty set: $P(\emptyset) = 0$. Proof. Apply axiom 3 to the sequence $\emptyset, \emptyset, \ldots$ to get $P(\emptyset) = \sum_{i=1}^{\infty} P(\emptyset)$, forcing $P(\emptyset) = 0$.
Monotonicity: if $A \subseteq B$ then $P(A) \le P(B)$. Proof. Write $B = A \cup (B \setminus A)$ as a disjoint union, so $P(B) = P(A) + P(B \setminus A) \ge P(A)$ by axiom 1.
A useful corollary: $0 \le P(A) \le 1$ for every event $A$. The codomain $[0, 1]$ in the definition is forced by the axioms, not assumed.
Inclusion-Exclusion
For finite unions of overlapping events, additivity needs a correction.
Inclusion-Exclusion Principle
Statement
For events $A_1, \ldots, A_n$,
$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{k=1}^{n} (-1)^{k+1} \sum_{i_1 < \cdots < i_k} P(A_{i_1} \cap \cdots \cap A_{i_k}).$$
For $n = 2$: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. For $n = 3$: $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.
Intuition
Adding $P(A) + P(B)$ double-counts the overlap, so subtract $P(A \cap B)$. With three sets, the three pairwise overlaps subtract too much from the triple overlap, so add it back. The alternating sign pattern generalizes this bookkeeping to any finite $n$.
Proof Sketch
Induction on $n$. The base case $n = 2$ follows by writing $A \cup B = A \cup (B \setminus A)$ as a disjoint union and using $P(B \setminus A) = P(B) - P(A \cap B)$. Inductive step: apply the two-set case to $\bigcup_{i=1}^{n-1} A_i$ and $A_n$, distribute intersections, and collect terms.
Why It Matters
Inclusion-exclusion is the workhorse for computing union probabilities when you can compute intersections. It appears in derangement counts, the union bound (a one-term truncation), and inclusion-exclusion bounds in combinatorial probability. The Bonferroni inequalities are obtained by truncating the alternating sum after an even or odd number of terms, yielding lower or upper bounds.
Failure Mode
The number of terms grows as $2^{n} - 1$. For large $n$, computing every intersection probability is infeasible, and inclusion-exclusion becomes a theoretical tool rather than a computational one. The union bound is the cheap one-sided alternative used throughout learning theory.
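The alternating sum can be implemented directly and checked against the exact union probability; the $2^{n} - 1$ term count is visible as the loop over all non-empty subfamilies. A sketch on the toy die space:

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations

omega = frozenset(range(1, 7))

def P(event):
    return Fraction(len(event), 6)  # uniform measure on a fair die

def inclusion_exclusion(events):
    """P(union) via the alternating sum over all non-empty subfamilies."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        for combo in combinations(events, r):
            # sign +1 for odd-size subfamilies, -1 for even-size ones
            total += (-1) ** (r + 1) * P(reduce(frozenset.intersection, combo))
    return total

A, B, C = frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 4, 5})
assert inclusion_exclusion([A, B, C]) == P(A | B | C)  # both equal 5/6
```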
Continuity of Probability
Countable additivity is equivalent (given finite additivity and non-negativity) to a continuity property: probabilities respect monotone limits of events.
Continuity of Probability Measures
Statement
A non-negative, normalized, finitely additive set function $P$ is countably additive (and hence a probability measure) if and only if both of the following hold:
Continuity from below. For any increasing sequence $A_1 \subseteq A_2 \subseteq \cdots$ of events with $A = \bigcup_{n=1}^{\infty} A_n$,
$$\lim_{n \to \infty} P(A_n) = P(A).$$
Continuity from above. For any decreasing sequence $A_1 \supseteq A_2 \supseteq \cdots$ of events with $A = \bigcap_{n=1}^{\infty} A_n$,
$$\lim_{n \to \infty} P(A_n) = P(A).$$
Intuition
"Increasing union" means the events grow to fill out their limit; the probabilities should grow to fill out the limit's probability. Without continuity, a sequence of events could grow to include "more and more" of while their probabilities stayed pinned below the union's probability, which would break every limit argument in probability.
Proof Sketch
Decompose the increasing union as a countable disjoint union: $\bigcup_{n} A_n = A_1 \cup (A_2 \setminus A_1) \cup (A_3 \setminus A_2) \cup \cdots$. Apply countable additivity: $P(\bigcup_{n} A_n) = P(A_1) + \sum_{k=2}^{\infty} P(A_k \setminus A_{k-1})$. The partial sums telescope to $P(A_n)$, so the limit equals $\lim_{n} P(A_n)$. Continuity from above follows by passing to complements.
Why It Matters
This is what countable additivity buys you. Every "$\lim_{n} P(A_n) = P(\lim_{n} A_n)$" argument, every interchange of limit and probability, every dominated convergence application for indicator functions, sits on this continuity. The Borel-Cantelli lemmas and the modes of convergence of random variables both rely on it.
Failure Mode
Continuity from above requires the events to be decreasing and at least one of them to have finite measure. For probability measures this is automatic (everything has measure at most 1), but for general measures (like Lebesgue measure on $\mathbb{R}$) the assumption is necessary. Example: the sets $A_n = [n, \infty)$ decrease to $\emptyset$ but each has infinite Lebesgue measure.
The two one-direction consequences below are the exact claim-level split used by the internal verification system. The broader equivalence above remains a source-review target; these smaller claims are the pieces Lean can check without pretending to prove the whole characterization theorem.
Continuity from Below for Probability Measures
Statement
If $A_1 \subseteq A_2 \subseteq \cdots$ is an increasing sequence of events, then the sequence of probabilities tends to the probability of the union:
$$\lim_{n \to \infty} P(A_n) = P\Big(\bigcup_{n=1}^{\infty} A_n\Big).$$
Intuition
As the events grow, no probability mass disappears at the limit. Countable additivity lets the measure see the countably many incremental shells $A_{n+1} \setminus A_n$.
Proof Sketch
This is the monotone-continuity theorem for measures. Write the increasing union as the disjoint union of the first set and the successive differences; countable additivity turns the measure of the limit event into the limit of the partial sums.
Why It Matters
This is the exact form used in stopping-time, convergence, and Borel-Cantelli arguments: replace a growing event by its finite prefixes, then pass to the limit.
Failure Mode
The monotonicity assumption is material. For arbitrary events, $P(A_n)$ can oscillate while the union stays fixed, so there is no reason for the displayed limit to hold.
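Continuity from below can be watched numerically with the geometric measure $P(\{k\}) = 2^{-k}$ on $\mathbb{N}$, where the events $A_n = \{1, \ldots, n\}$ increase to $\mathbb{N}$ (mass 1). A sketch:

```python
def P_prefix(n):
    """P(A_n) for A_n = {1, ..., n} under the geometric measure P({k}) = 2^(-k)."""
    return sum(0.5 ** k for k in range(1, n + 1))  # equals 1 - 2^(-n)

probs = [P_prefix(n) for n in (1, 5, 10, 30)]
assert probs == sorted(probs)           # monotone: P(A_n) increases with n
assert abs(P_prefix(60) - 1.0) < 1e-12  # converges to P(union of A_n) = 1
```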
Continuity from Above for Probability Measures
Statement
If $A_1 \supseteq A_2 \supseteq \cdots$ is a decreasing sequence of events, then the sequence of probabilities tends to the probability of the intersection:
$$\lim_{n \to \infty} P(A_n) = P\Big(\bigcap_{n=1}^{\infty} A_n\Big).$$
Intuition
Decreasing events shave away mass. In a probability space the starting mass is finite, so the mass left after countably many removals is the limit of the remaining masses.
Proof Sketch
Apply continuity from below to the complements $A_n^{c}$, which form an increasing sequence, and use the complement rule. Equivalently, use the measure-continuity-from-above theorem with the finite-measure condition supplied automatically by $P(\Omega) = 1$.
Why It Matters
This is the form behind "eventually always" events, tail intersections, and limit arguments where bad sets shrink as a parameter grows.
Failure Mode
For general infinite measures, one finite-measure assumption is necessary. The sets $[n, \infty)$ in Lebesgue measure decrease to the empty set but all have infinite measure, so the limit statement fails in the extended-real sense.
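The companion numerical picture for continuity from above, with the same geometric measure $P(\{k\}) = 2^{-k}$ on $\mathbb{N}$: the tails $B_n = \{n, n+1, \ldots\}$ decrease to $\emptyset$. A sketch:

```python
def P_tail(n):
    """P(B_n) for B_n = {n, n+1, ...}: exact tail mass 2^(1-n) of the geometric measure."""
    return 0.5 ** (n - 1)

tails = [P_tail(n) for n in (1, 5, 10, 30)]
assert tails == sorted(tails, reverse=True)  # monotone: P(B_n) decreases with n
assert P_tail(60) < 1e-15                    # converges to P(empty set) = 0
```

Swapping the probability measure for "length" breaks this: the tail $[n, \infty)$ always has infinite length, which is exactly the failure mode above.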
Why Countable, Not Finite, Additivity
Finite additivity is the version most people guess when first writing down probability axioms. Why does the standard formulation insist on countable additivity?
Three reasons:
- Limit theorems. Without continuity from below, the law of large numbers cannot be stated as a single statement about a sequence of averages. The law of large numbers and the central limit theorem both produce countable disjoint unions in their proofs.
- Lebesgue integration. The expected value $\mathbb{E}[X] = \int_{\Omega} X \, dP$ is the Lebesgue integral against $P$. Lebesgue's monotone and dominated convergence theorems require countable additivity. Without them, you cannot interchange $\lim$ and $\mathbb{E}$, which is required in nearly every consistency proof.
- Probability of unions of events with shrinking probability. Countable additivity is what guarantees that the probability that at least one of countably many "rare events" occurs is bounded by the sum of their (small) probabilities. This is the engine of the common inequalities used as union bounds throughout learning theory.
The price is the existence of non-measurable sets. On uncountable $\Omega$, not every subset is in $\mathcal{F}$. For all of probability and statistics, this is a fair trade.
Common Confusions
Probability zero is not the same as impossible
For a continuous random variable $X$ with density $f$, $P(X = x) = 0$ for every fixed $x$, yet $X$ takes some value with probability 1. "Probability zero" means the event has measure zero, not that it cannot happen. Symmetrically, "probability one" (almost sure) does not mean "always," only that the exceptional set has measure zero.
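A small simulation makes the point concrete (Python; pseudorandom uniform draws on $[0, 1)$ as a stand-in for a continuous distribution):

```python
import random

random.seed(1)
draws = [random.random() for _ in range(100_000)]

# The fixed point 0.5 is an event of probability zero: none of these draws
# hits it exactly, although it is a perfectly possible value ...
assert all(x != 0.5 for x in draws)
# ... and values arbitrarily close to 0.5 occur all the time.
assert any(abs(x - 0.5) < 1e-3 for x in draws)
```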
Sigma-algebras are not optional bookkeeping
Many introductory treatments hide the sigma-algebra to keep notation light, writing $P(A)$ as if every subset $A \subseteq \Omega$ were an event. This works on discrete or finite $\Omega$, where you can take $\mathcal{F} = 2^{\Omega}$. On $\mathbb{R}$ or $\mathbb{R}^{n}$, the standard choices are the Borel sigma-algebra generated by open sets and its Lebesgue completion. Both exclude pathological sets like the Vitali construction. Pretending $\mathcal{F} = 2^{[0,1]}$ is what causes Banach-Tarski-style paradoxes when you try to define a uniform probability on every subset of $[0, 1]$.
The axioms do not pick an interpretation
The axioms tell you what arithmetic probabilities must obey. They do not tell you whether $P(A)$ means a long-run frequency, a betting rate, or a degree of belief. Frequentists, Bayesians, and subjectivists all use the same Kolmogorov axioms; they differ on what the numbers refer to. The mathematics is consistent across interpretations because the axioms are interpretation-free.
Summary
- A probability space is a triple $(\Omega, \mathcal{F}, P)$: a sample space, an event sigma-algebra, and a probability measure.
- The three axioms are non-negativity, normalization, and countable additivity.
- Immediate consequences: $P(\emptyset) = 0$, the complement rule, monotonicity, finite additivity.
- Inclusion-exclusion handles finite unions of overlapping events; the union bound is its one-sided cheap relative.
- Countable additivity is equivalent to continuity of probability for monotone sequences of events; this continuity is what makes limit theorems possible.
- The axioms are silent on interpretation: frequentist, Bayesian, and classical accounts of probability all satisfy them.
Exercises
Problem
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $A, B \in \mathcal{F}$. Prove that $P(A \cup B) \le P(A) + P(B)$ (the two-event union bound), with equality if and only if $A \cap B$ has probability zero.
Problem
Construct a finitely additive probability $P$ on $\mathbb{N}$ that is not countably additive, by assigning $P(\{n\}) = 0$ for every singleton but $P(\mathbb{N}) = 1$. (Such a $P$ exists, by appeal to the Hahn-Banach theorem or an ultrafilter on $\mathbb{N}$.) Then explain which Kolmogorov axiom this violates and why it cannot be a probability measure in the standard sense.
References
Original:
- Kolmogorov, "Grundbegriffe der Wahrscheinlichkeitsrechnung" (Springer, 1933); English translation "Foundations of the Theory of Probability" (Chelsea, 1956), Chapter 1
Standard graduate texts:
- Billingsley, "Probability and Measure" (3rd edition, Wiley, 1995), Sections 2-3
- Durrett, "Probability: Theory and Examples" (5th edition, Cambridge, 2019), Section 1.1
- Williams, "Probability with Martingales" (Cambridge, 1991), Chapter 1
- Resnick, "A Probability Path" (Birkhauser, 1999), Chapters 1-2
Real analysis perspective:
- Folland, "Real Analysis: Modern Techniques and Their Applications" (2nd edition, Wiley, 1999), Chapter 1
- Rudin, "Real and Complex Analysis" (3rd edition, McGraw-Hill, 1987), Chapter 1
Next Topics
- Measure-theoretic probability: building expectation and Lebesgue integration on top of the axioms
- Joint, marginal, conditional distributions: the working notation that the axioms support
- Common inequalities: Markov, Chebyshev, Jensen, and the union bound
Last reviewed: April 28, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Sets, Functions, and Relations (layer 0A · tier 1)
Derived topics
- Common Inequalities (layer 0A · tier 1)
- Joint, Marginal, and Conditional Distributions (layer 0A · tier 1)
- Random Variables (layer 0A · tier 1)
- Measure-Theoretic Probability (layer 0B · tier 1)
- Time Series Foundations (layer 2 · tier 2)