
Foundations

Positive Semidefinite Matrices

PSD matrices: equivalent characterizations, Cholesky decomposition, Schur complement, and Loewner ordering. Covariance matrices are PSD. Hessians of convex functions are PSD. These facts connect linear algebra to optimization and statistics.

Core · Tier 1 · Stable · Supporting · ~25 min

Why This Matters

Positive semidefiniteness is the matrix analogue of non-negativity for scalars. It appears in three central places in ML:

Covariance matrices are always PSD. If \Sigma is the covariance matrix of a random vector Z, then x^T \Sigma x = \text{Var}(x^T Z) \geq 0 for any vector x. The notation \succeq ("PSD") and ^T (transpose) recur on every line.

Hessians of convex functions are PSD. If f is twice differentiable on an open convex set C, then f is convex on C if and only if \nabla^2 f(x) \succeq 0 for all x \in C. The convex-domain hypothesis is essential: convexity is a statement about line segments staying inside the domain, so the Hessian condition alone on a non-convex set does not imply convexity. This connects PSD matrices to the theory of convex optimization.

Kernel matrices (Gram matrices) are PSD by construction. The entire theory of kernel methods rests on this fact.
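
A quick numeric sanity check of the first and third claims, as a minimal NumPy sketch (the dimensions, sample counts, and the RBF kernel are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample covariance of 200 draws of a 5-dimensional random vector.
# Any covariance matrix is PSD by construction.
X = rng.normal(size=(200, 5))
Sigma_hat = np.cov(X, rowvar=False)
print(np.linalg.eigvalsh(Sigma_hat).min())  # >= 0 (up to floating-point noise)

# Gram matrix of an RBF kernel on 50 random points: also PSD by construction.
pts = rng.normal(size=(50, 3))
sq_dists = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dists)
print(np.linalg.eigvalsh(K).min())          # >= 0 (up to floating-point noise)
```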

The key habit here is to stop treating "PSD" as a mysterious adjective and start translating it immediately into one of three concrete statements: every quadratic form x^\top A x is nonnegative, every eigenvalue is nonnegative, or the matrix can be factored as a Gram object like B^\top B. Strong intuition comes from moving fluidly between those three views rather than memorizing a single definition in isolation.

That fluency matters because different ML arguments naturally live in different languages. Optimization proofs usually speak in Hessians and curvature, statistics speaks in covariance and Fisher information, and kernel methods speak in Gram matrices. PSD is the bridge connecting all three.

Five Equivalent Characterizations and When to Use Each

The PSD Equivalence Theorem (below) gives five different ways to define the same property. Each is useful in a different context. The table below is a practical decision aid:

| Characterization | What you check | When to reach for it | Cost |
| --- | --- | --- | --- |
| Quadratic form x^\top A x \ge 0 | Algebraic identity for all x | Theoretical proofs; checking that a constructed matrix is PSD by definition | Symbolic |
| Eigenvalues \lambda_i \ge 0 | Spectrum is non-negative | Numerical certificates; small matrices; when you already have the eigendecomposition | O(n^3) |
| Gram factorization A = B^\top B | A factors through some B | Sampling from Gaussians; constructing kernels; proving PSD by exhibiting B | O(n^2 r), where r is the rank of B |
| Cholesky A = LL^\top | L exists with positive diagonal (PD case) | Solving linear systems; log-determinant computation; sampling | O(n^3/3) |
| Principal minors all \ge 0 | Sign check on 2^n - 1 submatrix determinants | Symbolic / parametric proofs over a single small example; rarely the right tool numerically | Exponential in n |

Pick by context. Optimization proofs usually live in eigenvalues and quadratic forms. Bayesian sampling lives in Cholesky and Gram factorizations. Symbolic algebra sometimes wants principal minors. Numerical software almost always uses Cholesky internally because it is the cheapest direct test for positive definiteness on a real machine. Fluency in ML means moving between the first three views without friction.
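
To make the decision concrete, here is a minimal sketch of the three numerical routes (the helper names are mine, and the test matrices are arbitrary):

```python
import numpy as np

def is_psd_eigen(A, tol=1e-10):
    """Spectral test: every eigenvalue of the symmetric matrix A is >= -tol."""
    return np.linalg.eigvalsh(A).min() >= -tol

def is_pd_cholesky(A):
    """Cholesky test (PD only): np.linalg.cholesky succeeds exactly when A
    is numerically positive definite; this is the cheap direct test."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 6))
A = B @ B.T                        # Gram construction: PSD (here PD almost surely)

print(is_psd_eigen(A), is_pd_cholesky(A))   # True True
print(is_pd_cholesky(A - 10 * np.eye(4)))   # False: the shift makes it indefinite here

# Quadratic-form spot check on random probes (evidence, not a proof):
xs = rng.normal(size=(1000, 4))
print((np.einsum('ij,jk,ik->i', xs, A, xs) >= 0).all())  # True
```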

Quick Version

  • Quadratic-form test. A \succeq 0 means x^\top A x \ge 0 for every vector x.
  • Spectral test. For symmetric matrices, PSD is equivalent to all eigenvalues being non-negative.
  • Factorization view. PSD matrices can be written as A = B^\top B; PD matrices also admit a clean Cholesky factorization A = LL^\top.
  • Why ML cares. Covariance matrices, Hessians of convex objectives, Gram matrices, and Fisher information all live in the PSD world.

Core Definitions

Definition

Positive Semidefinite Matrix

A symmetric matrix A \in \mathbb{R}^{n \times n} is positive semidefinite (PSD) if and only if:

x^T A x \geq 0 \quad \text{for all } x \in \mathbb{R}^n

The matrix is positive definite (PD), written A \succ 0, if and only if the inequality is strict for all x \neq 0.

Definition

Loewner Ordering

For symmetric matrices A and B, the Loewner ordering is:

A \succeq B \iff A - B \succeq 0 \iff x^T(A - B)x \geq 0 \text{ for all } x

This is a partial order on symmetric matrices. It is not a total order: most pairs of matrices are incomparable.
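
A tiny numeric illustration, with matrices chosen purely for this example: if A - B has eigenvalues of both signs, neither A \succeq B nor B \succeq A holds.

```python
import numpy as np

A = np.diag([2.0, 1.0])
B = np.diag([1.0, 2.0])
# Mixed-sign eigenvalues of A - B: the pair is incomparable in the Loewner order.
print(np.linalg.eigvalsh(A - B))  # [-1.  1.]
```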

Main Theorems

Theorem

PSD Equivalence Theorem

Statement

The following are equivalent for a symmetric matrix A \in \mathbb{R}^{n \times n}:

  1. A \succeq 0 (i.e., x^T A x \geq 0 for all x)
  2. All eigenvalues of A are non-negative
  3. A = B^T B for some matrix B
  4. All principal minors of A are non-negative
  5. A = LL^T for some lower triangular L with non-negative diagonal (the Cholesky decomposition; when A \succ 0, L has strictly positive diagonal and is unique)

Intuition

PSD means the quadratic form x^T A x is a bowl that never dips below zero. Eigenvalues are the curvatures along the principal axes. Non-negative curvature in every direction means non-negative eigenvalues.

Proof Sketch

(1 \Rightarrow 2): If Av = \lambda v with \|v\| = 1, then \lambda = v^T A v \geq 0. (2 \Rightarrow 3): By the spectral theorem, A = Q \Lambda Q^T where \Lambda has non-negative entries. Set B = \Lambda^{1/2} Q^T. (3 \Rightarrow 1): x^T A x = x^T B^T B x = \|Bx\|^2 \geq 0.
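
The (2 \Rightarrow 3) step is constructive, which makes it easy to verify numerically; a minimal sketch (the test matrix and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
C = rng.normal(size=(5, 5))
A = C @ C.T                                  # an arbitrary PSD test matrix

lam, Q = np.linalg.eigh(A)                   # spectral theorem: A = Q diag(lam) Q^T
B = np.diag(np.sqrt(np.clip(lam, 0, None))) @ Q.T   # B = Lambda^{1/2} Q^T

print(np.allclose(B.T @ B, A))               # True: A = B^T B, as constructed
```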

Why It Matters

Different characterizations are useful in different contexts. Checking eigenvalues is O(n^3). Checking the quadratic form definition is useful for proofs. The factorization A = B^T B is used in sampling from Gaussians: if Z \sim N(0, I) and \Sigma = LL^T, then LZ \sim N(0, \Sigma).
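
A minimal sketch of that sampling recipe (the covariance matrix is an arbitrary PD example):

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])        # an arbitrary PD covariance
L = np.linalg.cholesky(Sigma)         # Sigma = L L^T

Z = rng.normal(size=(100_000, 2))     # rows are draws of Z ~ N(0, I)
X = Z @ L.T                           # rows are draws of L Z ~ N(0, Sigma)

print(np.cov(X, rowvar=False))        # approximately Sigma
```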

Failure Mode

Symmetry is required. A non-symmetric matrix can satisfy x^T A x \geq 0 for all x without having real eigenvalues. The standard convention in ML is that PSD refers to symmetric (or Hermitian) matrices only.

The Sylvester-criterion subtlety: for positive definiteness (A \succ 0) it is enough that all leading principal minors are positive, a clean, easy-to-check criterion. For positive semidefiniteness (A \succeq 0), leading principal minors being non-negative is not enough; you need every principal minor (not just the leading ones) to be non-negative. A standard counterexample is A = \mathrm{diag}(0, -1): the leading principal minor of size 1 is 0 (non-negative) and the determinant is 0 (also non-negative), yet the eigenvalue -1 shows A \not\succeq 0. Adding the bottom-right principal minor, -1, to the check would have caught it.
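
A brute-force sweep over all principal minors makes the counterexample concrete; the helper below is illustrative only and, as the table above warns, exponential in n:

```python
import numpy as np
from itertools import combinations

def principal_minors(A):
    """All 2^n - 1 principal minors: det of A restricted to each non-empty
    index subset S (same subset for rows and columns)."""
    n = A.shape[0]
    return {S: np.linalg.det(A[np.ix_(S, S)])
            for r in range(1, n + 1) for S in combinations(range(n), r)}

A = np.diag([0.0, -1.0])
print(principal_minors(A))
# Leading minors (0,) -> 0 and (0, 1) -> 0 are non-negative, but the
# non-leading minor (1,) -> -1 exposes that A is not PSD.
```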

Cholesky Decomposition

Every positive definite matrix A has a unique factorization A = LL^T where L is lower triangular with positive diagonal entries. This is the Cholesky decomposition.

Computational cost: O(n^3/3), which is half the cost of a general LU decomposition. It is the preferred method for solving linear systems Ax = b when A is known to be PD: factor once, then solve Ly = b by forward substitution and L^T x = y by back-substitution.
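
In SciPy this factor-once, solve-twice pattern is exposed as cho_factor / cho_solve; a minimal sketch with an arbitrary PD matrix and right-hand side:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(3)
C = rng.normal(size=(6, 6))
A = C @ C.T + 1e-3 * np.eye(6)    # PD by construction (jitter guards against rank loss)
b = rng.normal(size=6)

c, low = cho_factor(A)            # one O(n^3/3) factorization
x = cho_solve((c, low), b)        # two O(n^2) triangular solves
print(np.allclose(A @ x, b))      # True
```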

For PSD (but not PD) matrices, a Cholesky factorization still exists, but L may have zeros on the diagonal and need not be unique.

Schur Complement

Definition

Schur Complement

Given a symmetric block matrix:

M = \begin{pmatrix} A & B \\ B^T & C \end{pmatrix}

with A and C symmetric and A invertible, the Schur complement of A in M is:

S = C - B^T A^{-1} B

The key fact: if A \succ 0, then M \succeq 0 if and only if S \succeq 0. Block-convention warning: under the alternative layout M = \begin{pmatrix} A & B^\top \\ B & C \end{pmatrix} (with B in the lower-left block), the Schur complement of A becomes S = C - B A^{-1} B^\top. The formula for S depends on which convention is in force, so check where the transpose sits before applying the PSD criterion or a Loewner-order argument.

The Schur complement appears in conditional distributions: if (X, Y) \sim N(0, M), then the conditional covariance of Y given X is the Schur complement of the X-block.
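
A numeric sketch of both facts (block sizes and the construction of M are arbitrary). It leans on the standard block-inverse identity that the (Y, Y) block of M^{-1} equals S^{-1}, which is the precision-matrix form of the conditional-covariance statement:

```python
import numpy as np

rng = np.random.default_rng(4)
G = rng.normal(size=(5, 5))
M = G @ G.T + 0.1 * np.eye(5)     # PD block matrix, split as 3 (X) + 2 (Y)
A, B, C = M[:3, :3], M[:3, 3:], M[3:, 3:]

S = C - B.T @ np.linalg.inv(A) @ B          # Schur complement of the X-block

# (Y, Y) block of the precision matrix equals S^{-1}: S is the
# conditional covariance of Y given X for (X, Y) ~ N(0, M).
print(np.allclose(np.linalg.inv(M)[3:, 3:], np.linalg.inv(S)))   # True

# PSD criterion: with A > 0, M is PSD iff S is PSD (here both are PD).
print(np.linalg.eigvalsh(S).min() > 0, np.linalg.eigvalsh(M).min() > 0)
```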

Properties of PSD Matrices

  • Sum: if A \succeq 0 and B \succeq 0, then A + B \succeq 0
  • Scaling: if A \succeq 0 and \alpha \geq 0, then \alpha A \succeq 0
  • Congruence: if A \succeq 0, then BAB^T \succeq 0 for any matrix B
  • Trace: if A \succeq 0, then \text{tr}(A) \geq 0 (sum of eigenvalues)
  • Hadamard product: if A \succeq 0 and B \succeq 0, then A \circ B \succeq 0 (Schur product theorem)

The set of PSD matrices forms a convex cone: it is closed under addition and non-negative scalar multiplication.
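
Each closure property is easy to spot-check numerically; a sketch over random PSD matrices (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

def random_psd(n):
    C = rng.normal(size=(n, n))
    return C @ C.T                      # Gram construction: PSD

A, B = random_psd(4), random_psd(4)
R = rng.normal(size=(3, 4))             # arbitrary matrix for the congruence test
min_eig = lambda X: np.linalg.eigvalsh(X).min()

print(min_eig(A + B) >= -1e-10)         # sum stays PSD
print(min_eig(2.5 * A) >= -1e-10)       # non-negative scaling stays PSD
print(min_eig(R @ A @ R.T) >= -1e-10)   # congruence R A R^T stays PSD
print(min_eig(A * B) >= -1e-10)         # Hadamard product stays PSD (Schur product theorem)
```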

Common Confusions

Watch Out

PSD is about the symmetric part

For a non-symmetric matrix M, the quadratic form x^T M x = x^T \frac{M + M^T}{2} x. So the sign of the quadratic form depends only on the symmetric part (M + M^T)/2. When someone says "the Hessian is PSD," this is only meaningful because Hessians of twice-differentiable functions are symmetric.
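
A two-line numeric check of that identity, with an arbitrary non-symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.normal(size=(4, 4))                 # deliberately non-symmetric
x = rng.normal(size=4)
sym = (M + M.T) / 2
print(np.isclose(x @ M @ x, x @ sym @ x))   # True: only the symmetric part matters
```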

Watch Out

Positive entries do not imply PSD

The matrix \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} has all positive entries but eigenvalues 3 and -1, so it is not PSD. Conversely, the identity matrix is PSD despite having zeros off the diagonal.
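
The eigenvalue computation behind the first example:

```python
import numpy as np

# All entries positive, yet one eigenvalue is negative: not PSD.
print(np.linalg.eigvalsh(np.array([[1.0, 2.0],
                                   [2.0, 1.0]])))   # [-1.  3.]
```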

Exercises

ExerciseCore

Problem

Prove that every covariance matrix is PSD. That is, if \Sigma = \mathbb{E}[(X - \mu)(X - \mu)^T], show \Sigma \succeq 0.

ExerciseAdvanced

Problem

Let A \succ 0 be n \times n positive definite. Prove that A^{-1} \succ 0.

References

Canonical:

  • Horn & Johnson, Matrix Analysis (2013), Chapter 7
  • Strang, Linear Algebra and Its Applications (2006), Section 6.5
  • Bhatia, Positive Definite Matrices (2007), Chapters 1-2 (characterizations and Loewner ordering)

Current:

  • Boyd & Vandenberghe, Convex Optimization (2004), Section 2.3 (PSD cone)
  • Golub & Van Loan, Matrix Computations (2013), Chapter 4 (Cholesky decomposition)
  • Deisenroth, Faisal, Ong, Mathematics for Machine Learning (2020), Section 4.3 (positive definite matrices and Cholesky)

Last reviewed: April 26, 2026
