Foundations
Positive Semidefinite Matrices
PSD matrices: equivalent characterizations, Cholesky decomposition, Schur complement, and Loewner ordering. Covariance matrices are PSD. Hessians of convex functions are PSD. These facts connect linear algebra to optimization and statistics.
Why This Matters
Positive semidefiniteness is the matrix analogue of non-negativity for scalars. It appears in three central places in ML:
Covariance matrices are always PSD. If $\Sigma = \mathbb{E}\!\left[(X - \mu)(X - \mu)^\top\right]$ is a covariance matrix, then $v^\top \Sigma v = \operatorname{Var}(v^\top X) \ge 0$ for any vector $v$. The notation $\succeq 0$ ("PSD") and $(\cdot)^\top$ (transpose) recur on every line.
Hessians of convex functions are PSD. If $f$ is twice differentiable on an open convex set $\mathcal{C} \subseteq \mathbb{R}^n$, then $f$ is convex on $\mathcal{C}$ if and only if $\nabla^2 f(x) \succeq 0$ for all $x \in \mathcal{C}$. The convex-domain hypothesis is essential: convexity is a statement about line segments staying inside the domain, so the Hessian condition alone on a non-convex set does not imply convexity. This connects PSD matrices to the theory of convex optimization.
Kernel matrices (Gram matrices) are PSD by construction. The entire theory of kernel methods rests on this fact.
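To make the covariance and kernel claims concrete, here is a minimal numpy sketch (random data chosen purely for illustration) that builds both kinds of matrix and confirms their spectra are non-negative up to floating-point roundoff:

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariance of data is a Gram matrix of the centered columns,
# hence PSD by construction.
X = rng.normal(size=(200, 4))               # 200 samples, 4 features
Sigma = np.cov(X, rowvar=False)             # 4x4 covariance matrix

# An RBF (Gaussian) kernel matrix is likewise PSD by construction.
pts = rng.normal(size=(50, 2))
sq_dists = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dists)                 # 50x50 Gram/kernel matrix

for name, M in [("covariance", Sigma), ("kernel", K)]:
    # eigvalsh is the symmetric eigensolver; minima may be ~ -1e-15
    print(name, "min eigenvalue:", np.linalg.eigvalsh(M).min())
```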
The key habit here is to stop treating "PSD" as a mysterious adjective and start translating it immediately into one of three concrete statements: every quadratic form is non-negative, every eigenvalue is non-negative, or the matrix can be factored as a Gram object like $A = B^\top B$. Strong intuition comes from moving fluidly between those three views rather than memorizing a single definition in isolation.
That fluency matters because different ML arguments naturally live in different languages. Optimization proofs usually speak in Hessians and curvature, statistics speaks in covariance and Fisher information, and kernel methods speak in Gram matrices. PSD is the bridge connecting all three.
Five Equivalent Characterizations and When to Use Each
The PSD Equivalence Theorem (below) gives five different ways to define the same property. Each is useful in a different context. The table is the practical decision aid:
| Characterization | What you check | When to reach for it | Cost |
|---|---|---|---|
| Quadratic form | $x^\top A x \ge 0$ for all $x$ | Theoretical proofs; checking that a constructed matrix is PSD by definition | Symbolic |
| Eigenvalues | Spectrum is non-negative | Numerical certificates; small matrices; when you already have the eigendecomposition | $O(n^3)$ |
| Gram factorization | $A = B^\top B$ for some $B$ | Sampling from Gaussians; constructing kernels; proving PSD by exhibiting $B$ | $O(n^2 r)$, where $r$ is the rank of $B$ |
| Cholesky | $A = LL^\top$ exists with positive diagonal (PD case) | Solving linear systems; log-determinant computation; sampling | $n^3/3$ flops |
| Principal minors all $\ge 0$ | Sign check on submatrix determinants | Symbolic / parametric proofs on small matrices; rarely the right tool numerically | Exponential in $n$ |
Pick by context. Optimization proofs usually live in eigenvalues and quadratic forms. Bayesian sampling lives in Cholesky and Gram. Symbolic algebra sometimes wants principal minors. Numerical software almost always uses Cholesky internally because it is the cheapest direct test for positive-definiteness on a real machine. Standard ML fluency means moving between the first three views without friction.
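As a concrete illustration of moving between the first three views, here is a minimal numpy sketch (the helper names are mine, not a standard API). It builds a PSD matrix via the Gram construction, then certifies it with both the spectral test and the Cholesky test:

```python
import numpy as np

def is_psd_eig(A, tol=1e-10):
    """Spectral test: all eigenvalues of the symmetric part >= -tol."""
    return np.linalg.eigvalsh((A + A.T) / 2).min() >= -tol

def is_psd_chol(A, jitter=1e-12):
    """Cholesky test: np.linalg.cholesky succeeds iff the (slightly
    jittered, to tolerate PSD-but-singular inputs) matrix is PD."""
    try:
        np.linalg.cholesky(A + jitter * np.eye(len(A)))
        return True
    except np.linalg.LinAlgError:
        return False

def gram(B):
    """Factorization view: B^T B is PSD for any B, by construction."""
    return B.T @ B

rng = np.random.default_rng(0)
A = gram(rng.normal(size=(3, 5)))      # 5x5 PSD matrix with rank <= 3
print(is_psd_eig(A), is_psd_chol(A))   # True True
```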
Quick Version
- Quadratic-form test. $A \succeq 0$ means $x^\top A x \ge 0$ for every vector $x$.
- Spectral test. For symmetric matrices, PSD is equivalent to all eigenvalues being non-negative.
- Factorization view. PSD matrices can be written as $A = B^\top B$; PD matrices also admit a clean Cholesky factorization $A = LL^\top$.
- Why ML cares. Covariance matrices, Hessians of convex objectives, Gram matrices, and Fisher information all live in the PSD world.
Core Definitions
Positive Semidefinite Matrix
A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive semidefinite (PSD), written $A \succeq 0$, if and only if:

$$x^\top A x \ge 0 \quad \text{for all } x \in \mathbb{R}^n.$$

The matrix is positive definite (PD), written $A \succ 0$, if and only if the inequality is strict for all $x \neq 0$.
Loewner Ordering
For symmetric matrices $A$ and $B$, the Loewner ordering is:

$$A \succeq B \iff A - B \succeq 0.$$

This is a partial order on symmetric matrices. It is not a total order: most pairs of matrices are incomparable.
Main Theorems
PSD Equivalence Theorem
Statement
The following are equivalent for a symmetric matrix $A \in \mathbb{R}^{n \times n}$:
- $A \succeq 0$ (i.e., $x^\top A x \ge 0$ for all $x$)
- All eigenvalues of $A$ are non-negative
- $A = B^\top B$ for some matrix $B$
- All principal minors of $A$ are non-negative
- There exists a lower triangular $L$ with non-negative diagonal such that $A = LL^\top$ (Cholesky decomposition; $L$ has strictly positive diagonal and is unique if $A \succ 0$)
Intuition
PSD means the quadratic form is a bowl that never dips below zero. Eigenvalues are the curvatures along the principal axes. Non-negative curvature in every direction means non-negative eigenvalues.
Proof Sketch
(3) $\Rightarrow$ (1): If $A = B^\top B$ with $x$ arbitrary, then $x^\top A x = \|Bx\|^2 \ge 0$. (2) $\Rightarrow$ (3): By the spectral theorem, $A = Q \Lambda Q^\top$ where $\Lambda$ has non-negative entries. Set $B = \Lambda^{1/2} Q^\top$. (1) $\Rightarrow$ (2): If $Av = \lambda v$ with $v \neq 0$, then $\lambda \|v\|^2 = v^\top A v \ge 0$.
Why It Matters
Different characterizations are useful in different contexts. Checking eigenvalues is $O(n^3)$. Checking the quadratic form definition is useful for proofs. The factorization is used in sampling from Gaussians: if $z \sim \mathcal{N}(0, I)$ and $\Sigma = LL^\top$, then $\mu + Lz \sim \mathcal{N}(\mu, \Sigma)$.
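A minimal sketch of that sampling recipe, with a mean and covariance chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])          # symmetric positive definite

L = np.linalg.cholesky(Sigma)           # Sigma = L @ L.T
z = rng.standard_normal((2, 100_000))   # columns are draws from N(0, I)
samples = mu[:, None] + L @ z           # each column ~ N(mu, Sigma)

print(np.cov(samples))                  # ~ Sigma, up to sampling error
```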
Failure Mode
Symmetry is required. A non-symmetric matrix can satisfy $x^\top A x \ge 0$ for all $x$ without having real eigenvalues: the rotation $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ has $x^\top A x = 0$ for every $x$, yet its eigenvalues are $\pm i$. The standard convention in ML is that PSD refers to symmetric (or Hermitian) matrices only.
The Sylvester-criterion subtlety: for positive definiteness ($A \succ 0$) it is enough that all leading principal minors are positive (a clean, easy-to-check criterion). For positive semidefiniteness ($A \succeq 0$) leading principal minors being non-negative is not enough; you need every principal minor (not just the leading ones) to be non-negative. A standard counterexample is $A = \begin{pmatrix} 0 & 0 \\ 0 & -1 \end{pmatrix}$: the leading principal minor of size 1 is 0 (non-negative) and the determinant is 0 (also non-negative), yet the eigenvalue $-1$ shows $A \not\succeq 0$. Adding the bottom-right principal minor $A_{22} = -1$ to the check would have caught it.
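The counterexample is easy to verify numerically; a short sketch:

```python
import numpy as np

A = np.array([[0.0, 0.0],
              [0.0, -1.0]])

# Leading principal minors: determinants of the top-left 1x1 and 2x2
# blocks. Both are zero, so both pass the non-negativity check.
print([np.linalg.det(A[:k, :k]) for k in (1, 2)])   # [0.0, -0.0]

# The spectrum tells the truth: A is not PSD.
print(np.linalg.eigvalsh(A))                        # [-1.  0.]

# The non-leading 1x1 principal minor A[1, 1] = -1 catches it.
```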
Cholesky Decomposition
Every positive definite matrix $A$ has a unique factorization $A = LL^\top$ where $L$ is lower triangular with positive diagonal entries. This is the Cholesky decomposition.
Computational cost: $n^3/3$ flops, which is half the cost of a general LU decomposition. It is the preferred method for solving linear systems $Ax = b$ when $A$ is known to be PD: factor $A = LL^\top$ once, then solve $Ly = b$ and $L^\top x = y$ by forward- and back-substitution.
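Here is a short sketch of the factor-once, solve-cheaply pattern using SciPy's cho_factor / cho_solve (the matrix is made PD by construction):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
A = B.T @ B + 4 * np.eye(4)     # PD: Gram matrix plus a positive shift
b = rng.normal(size=4)

c, low = cho_factor(A)          # factor once (~ n^3/3 flops)
x = cho_solve((c, low), b)      # two triangular solves (~ n^2 each)

print(np.allclose(A @ x, b))    # True
```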
For PSD (not PD) matrices, a Cholesky factorization still exists but may have zeros on the diagonal and is no longer unique.
Schur Complement
Schur Complement
Given a symmetric block matrix:

$$M = \begin{pmatrix} A & B \\ B^\top & C \end{pmatrix}$$

with $A$ and $C$ symmetric and $A$ invertible, the Schur complement of $A$ in $M$ is:

$$M/A = C - B^\top A^{-1} B.$$
The key fact: if $A \succ 0$, then $M \succeq 0$ if and only if $M/A \succeq 0$. Block-convention warning: under the alternative layout $M = \begin{pmatrix} A & B^\top \\ B & C \end{pmatrix}$ (with $B$ in the lower-left block) the Schur complement of $A$ becomes $C - B A^{-1} B^\top$. Symmetry of $M/A$ and the role of the Loewner order both depend on this convention.
The Schur complement appears in conditional distributions: if $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \sim \mathcal{N}\!\left(\mu, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right)$, then the conditional covariance of $x_1$ given $x_2$ is the Schur complement of the $\Sigma_{22}$-block: $\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$.
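The correspondence is easy to spot-check numerically. In the sketch below the block values are arbitrary; it assembles the joint covariance, takes the Schur complement of the $\Sigma_{22}$ block, and confirms the result is itself a valid (PSD) covariance:

```python
import numpy as np

# Blocks of the joint covariance of (x1, x2); values are arbitrary.
S11 = np.array([[2.0, 0.3],
                [0.3, 1.0]])
S12 = np.array([[0.5],
                [0.2]])
S22 = np.array([[1.5]])

Sigma = np.block([[S11, S12],
                  [S12.T, S22]])

# Schur complement of the S22 block = conditional covariance of x1 | x2.
cond_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)
print(cond_cov)
print(np.linalg.eigvalsh(cond_cov))   # non-negative: still a covariance
```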
Properties of PSD Matrices
- Sum: if $A \succeq 0$ and $B \succeq 0$, then $A + B \succeq 0$
- Scaling: if $A \succeq 0$ and $c \ge 0$, then $cA \succeq 0$
- Congruence: if $A \succeq 0$, then $M^\top A M \succeq 0$ for any matrix $M$
- Trace: if $A \succeq 0$, then $\operatorname{tr}(A) \ge 0$ (sum of eigenvalues)
- Hadamard product: if $A \succeq 0$ and $B \succeq 0$, then $A \circ B \succeq 0$ (Schur product theorem)
The set of PSD matrices forms a convex cone: it is closed under addition and non-negative scalar multiplication.
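A quick numerical spot-check of these closure properties (random matrices, so this is evidence rather than proof):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_psd(n):
    B = rng.normal(size=(n, n))
    return B.T @ B                       # Gram construction: PSD

def min_eig(M):
    return np.linalg.eigvalsh(M).min()

A, B = rand_psd(4), rand_psd(4)
M = rng.normal(size=(4, 4))

print(min_eig(A + B) >= -1e-10)          # sum stays PSD
print(min_eig(2.5 * A) >= -1e-10)        # non-negative scaling stays PSD
print(min_eig(M.T @ A @ M) >= -1e-10)    # congruence stays PSD
print(min_eig(A * B) >= -1e-10)          # Hadamard product (Schur theorem)
```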
Common Confusions
PSD is about the symmetric part
For a non-symmetric matrix $A$, the quadratic form satisfies $x^\top A x = x^\top \left(\tfrac{A + A^\top}{2}\right) x$. So the sign of the quadratic form depends only on the symmetric part $\tfrac{A + A^\top}{2}$. When someone says "the Hessian is PSD," this is only meaningful because Hessians of twice-differentiable functions are symmetric.
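A two-line numerical check of the symmetric-part identity on a random non-symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))       # generically non-symmetric
S = (A + A.T) / 2                 # symmetric part of A

x = rng.normal(size=3)
print(np.isclose(x @ A @ x, x @ S @ x))   # True: the skew part cancels
```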
Positive entries do not imply PSD
The matrix $\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ has all positive entries but eigenvalues $3$ and $-1$, so it is not PSD. Conversely, the identity matrix is PSD despite having zeros off the diagonal.
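Both claims take a few lines to check:

```python
import numpy as np

P = np.array([[1.0, 2.0],
              [2.0, 1.0]])            # all entries positive
print(np.linalg.eigvalsh(P))          # [-1.  3.]: not PSD

I = np.eye(2)                         # zeros off the diagonal
print(np.linalg.eigvalsh(I))          # [1.  1.]: PSD (in fact PD)
```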
Exercises
Problem
Prove that every covariance matrix is PSD. That is, if $\Sigma = \mathbb{E}\!\left[(X - \mu)(X - \mu)^\top\right]$, show $v^\top \Sigma v \ge 0$ for all $v \in \mathbb{R}^n$.
Problem
Let $A$ be positive definite. Prove that $A^{-1}$ exists and is positive definite.
References
Canonical:
- Horn & Johnson, Matrix Analysis (2013), Chapter 7
- Strang, Linear Algebra and Its Applications (2006), Section 6.5
- Bhatia, Positive Definite Matrices (2007), Chapters 1-2 (characterizations and Loewner ordering)
Current:
- Boyd & Vandenberghe, Convex Optimization (2004), Section 2.3 (PSD cone)
- Golub & Van Loan, Matrix Computations (2013), Chapter 4 (Cholesky decomposition)
- Deisenroth, Faisal, Ong, Mathematics for Machine Learning (2020), Section 4.3 (positive definite matrices and Cholesky)
Last reviewed: April 26, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Eigenvalues and Eigenvectors (layer 0A · tier 1)
Derived topics
- Fisher Information: Curvature, KL Geometry, and the Natural Gradient (layer 0B · tier 1)
- The Multivariate Normal Distribution (layer 0B · tier 1)
- Convex Optimization Basics (layer 1 · tier 1)
- Principal Component Analysis (layer 1 · tier 1)
- 3D Gaussian Splatting (layer 4 · tier 3)