Statistical Estimation
Cramér-Rao Bound: Information Inequality, Achievability, and Sharper Variants
The fundamental information inequality for unbiased estimation. Coverage of the scalar and multivariate Cramér-Rao bounds, the chain rule for biased estimators, achievability in exponential families, the Bhattacharyya higher-order bounds, the Hammersley-Chapman-Robbins bound (no regularity required), the van Trees Bayesian inequality, and efficient information with nuisance parameters.
Why This Matters
The Cramér-Rao bound is the universal floor on the variance of any unbiased estimator. For an i.i.d. sample of size $n$ from $p_\theta$,
$$\operatorname{Var}_\theta(\hat\theta) \ \ge\ \frac{1}{n\,I(\theta)},$$
where $I(\theta)$ is the Fisher information per observation. The bound has three roles in modern statistics. First, it certifies efficiency: the maximum likelihood estimator attains this bound asymptotically, which is why MLE is the default. Second, it grades estimator design: any concrete unbiased estimator can be compared to $1/(n I(\theta))$ to see how much room remains. Third, it links estimation to information geometry: $I(\theta)$ is the Riemannian metric on the statistical manifold, and the bound is the metric-induced lower bound on the squared geodesic length of any unbiased displacement.
The bound is one Cauchy-Schwarz inequality applied to the score function. The proof is short, the implications are deep, and the failure modes (biased estimators, non-regular families, nuisance parameters, finite-sample sharpness) generate a family of refinements: Bhattacharyya higher-order bounds, the Hammersley-Chapman-Robbins bound (no regularity required), the van Trees Bayesian inequality, the multivariate Loewner ordering bound, and the BKRW (Bickel-Klaassen-Ritov-Wellner) semiparametric efficient information.
Mental Model
Think of the score $s_\theta(x) = \partial_\theta \log p_\theta(x)$ as a direction in the tangent space at $\theta$. The expected score is zero (the first Bartlett identity); the variance of the score is the Fisher information $I(\theta)$ (the second Bartlett identity). Any unbiased estimator satisfies $\mathbb{E}_\theta[\hat\theta] = \theta$, so differentiating both sides and exchanging differentiation with integration gives $\operatorname{Cov}_\theta(\hat\theta, S_n) = 1$, where $S_n = \sum_{i=1}^n s_\theta(X_i)$ is the total score. Cauchy-Schwarz on this covariance and on $\operatorname{Var}_\theta(S_n) = n I(\theta)$ closes the bound:
$$1 = \operatorname{Cov}_\theta(\hat\theta, S_n)^2 \ \le\ \operatorname{Var}_\theta(\hat\theta)\,\operatorname{Var}_\theta(S_n) = \operatorname{Var}_\theta(\hat\theta)\, n I(\theta) \quad\Longrightarrow\quad \operatorname{Var}_\theta(\hat\theta) \ge \frac{1}{n I(\theta)}.$$
Equality holds iff $\hat\theta - \theta$ is collinear with the score, which forces an exponential-family structure: only one-parameter exponential families (in their canonical sufficient statistic) admit a finite-sample efficient unbiased estimator. Outside that special case, efficiency is asymptotic, not finite-sample.
The bound generalizes in several directions: the multivariate version replaces scalars with positive-definite matrices in the Loewner order; the biased-estimator version multiplies $1/(nI(\theta))$ by the squared derivative of the estimator's mean function; the Hammersley-Chapman-Robbins bound replaces the score with a finite difference and removes the regularity requirement; the van Trees inequality averages over a prior; the efficient-information version projects out nuisance directions. All of these are one Cauchy-Schwarz inequality applied to a slightly different inner-product structure.
Formal Setup
Let $X_1, \dots, X_n$ be i.i.d. from $p_\theta$ with $\theta \in \Theta \subseteq \mathbb{R}^d$. Let $\ell_\theta(x) = \log p_\theta(x)$ denote the log-likelihood and $s_\theta(x) = \nabla_\theta \ell_\theta(x)$ the score.
Score function
The score function is the gradient of the log-density:
$$s_\theta(x) = \nabla_\theta \log p_\theta(x).$$
Under the regularity conditions below (the support of $p_\theta$ does not depend on $\theta$ and differentiation passes through the integral), $\mathbb{E}_\theta[s_\theta(X)] = 0$ for all $\theta$. This is the first Bartlett identity.
Fisher information
The Fisher information matrix is the covariance of the score:
$$I(\theta) = \operatorname{Cov}_\theta\big(s_\theta(X)\big) = \mathbb{E}_\theta\big[s_\theta(X)\,s_\theta(X)^\top\big].$$
Under twice-differentiability and regularity, $I(\theta) = -\mathbb{E}_\theta\big[\nabla_\theta^2 \log p_\theta(X)\big]$ (the second Bartlett identity; see maximum likelihood estimation for the proof). For an i.i.d. sample of size $n$, the total Fisher information is $n I(\theta)$.
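To make the two identities concrete, here is a minimal Monte Carlo check, assuming the Poisson$(\lambda)$ family as an illustrative choice (any Cramér-regular family would do): the simulated score has mean near zero, and its variance matches both the negative expected Hessian and the analytic $I(\lambda) = 1/\lambda$.

```python
import numpy as np

# Sketch: verify the Bartlett identities by Monte Carlo for Poisson(lam).
# score(x; lam) = x/lam - 1,  Hessian of log p = -x/lam^2,  I(lam) = 1/lam.
rng = np.random.default_rng(0)
lam = 3.0
x = rng.poisson(lam, size=1_000_000)

score = x / lam - 1.0
hessian = -x / lam**2

print("E[score]        ~", score.mean())        # ~ 0    (first Bartlett identity)
print("Var[score]      ~", score.var())         # ~ 1/lam
print("-E[Hessian]     ~", -hessian.mean())     # ~ 1/lam (second Bartlett identity)
print("analytic I(lam) =", 1.0 / lam)
```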
Regularity conditions
A parametric family $\{p_\theta\}$ is Cramér-regular at $\theta_0$ if and only if (1) the support of $p_\theta$ does not depend on $\theta$ in a neighborhood of $\theta_0$, (2) $\log p_\theta(x)$ is twice continuously differentiable in $\theta$ for almost every $x$, (3) differentiation of integrals against $p_\theta$ with respect to $\theta$ can be performed under the integral sign (e.g. by dominated convergence), and (4) the Fisher information $I(\theta_0)$ is finite and positive definite. The Cramér-Rao bound assumes these conditions.
Efficient estimator
An unbiased estimator $\hat\theta_n$ of a scalar parameter $\theta$ is finite-sample efficient if and only if it attains the Cramér-Rao lower bound with equality at every $n$ and every $\theta$: $\operatorname{Var}_\theta(\hat\theta_n) = 1/(n I(\theta))$. An estimator is asymptotically efficient if and only if $\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} N\big(0, I(\theta)^{-1}\big)$. The MLE is asymptotically efficient under standard regularity (see asymptotic statistics); finite-sample efficiency is much rarer and characterizes one-parameter exponential families.
The Cramér-Rao Bound
Cramér-Rao Lower Bound (Scalar)
Statement
For any unbiased estimator $\hat\theta$ of $\theta$,
$$\operatorname{Var}_\theta(\hat\theta) \ \ge\ \frac{1}{n\,I(\theta)}.$$
Equality at a fixed $\theta$ holds if and only if there exists a function $a(\theta)$ such that
$$\hat\theta(X_{1:n}) - \theta \;=\; a(\theta)\sum_{i=1}^n s_\theta(X_i) \quad \text{almost surely}.$$
Equality at every $\theta$ in an open neighborhood characterizes one-parameter exponential families.
Intuition
The score is the direction in which the log-likelihood rises fastest. An unbiased estimator must be sensitive to the parameter in the same direction (since perturbing $\theta$ shifts $\mathbb{E}_\theta[\hat\theta]$ by exactly the same amount). Cauchy-Schwarz says this collinearity has a cost: the unbiased estimator's variance must absorb at least the inverse of the score's variance. High Fisher information means the score is large in magnitude and tightly concentrated, so the data is informative and small variance is achievable.
Proof Sketch
Apply Cauchy-Schwarz to $\operatorname{Cov}_\theta(\hat\theta, S_n)$ with $S_n = \sum_{i=1}^n s_\theta(X_i)$ the total score:
$$\operatorname{Cov}_\theta(\hat\theta, S_n)^2 \ \le\ \operatorname{Var}_\theta(\hat\theta)\,\operatorname{Var}_\theta(S_n).$$
Step 1. Differentiate $\mathbb{E}_\theta[\hat\theta] = \theta$ in $\theta$ and exchange differentiation with integration (valid by Cramér regularity):
$$1 \;=\; \frac{\partial}{\partial\theta}\,\mathbb{E}_\theta[\hat\theta] \;=\; \int \hat\theta(x)\,\partial_\theta p_\theta(x)\,dx \;=\; \mathbb{E}_\theta[\hat\theta\,S_n] \;=\; \operatorname{Cov}_\theta(\hat\theta, S_n)$$
(the last equality because $\mathbb{E}_\theta[S_n] = 0$). Here $p_\theta$ is the joint density of the sample, whose score is the i.i.d. sum $S_n$ by linearity of the logarithm.
Step 2. $\operatorname{Var}_\theta(S_n) = n I(\theta)$ since the per-sample scores are i.i.d. with variance $I(\theta)$.
Step 3. Substituting into Cauchy-Schwarz: $1 \le \operatorname{Var}_\theta(\hat\theta)\, n I(\theta)$, i.e. $\operatorname{Var}_\theta(\hat\theta) \ge 1/(n I(\theta))$.
Equality in Cauchy-Schwarz requires $\hat\theta - \theta = a(\theta)\,S_n$ a.s. for some non-random $a(\theta)$, which translates to the collinearity condition stated.
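A minimal simulation of the three steps, assuming the normal-mean model $X_i \sim N(\theta, \sigma^2)$ with $\hat\theta = \bar X$ (an illustrative choice): the covariance with the total score is about 1, the total-score variance is about $nI(\theta)$, and the product $\operatorname{Var}(\bar X)\cdot nI(\theta)$ is about 1 because the sample mean attains the bound here.

```python
import numpy as np

# Sketch: check the three proof steps by Monte Carlo for X_i ~ N(theta, sigma^2),
# with theta-hat = sample mean and total score S_n = sum_i (X_i - theta)/sigma^2.
rng = np.random.default_rng(1)
theta, sigma, n, reps = 2.0, 1.5, 20, 200_000

x = rng.normal(theta, sigma, size=(reps, n))
theta_hat = x.mean(axis=1)
total_score = ((x - theta) / sigma**2).sum(axis=1)

cov = np.cov(theta_hat, total_score)[0, 1]
print("Cov(theta_hat, S_n) ~", cov)                                   # Step 1: ~ 1
print("Var(S_n)            ~", total_score.var(), "vs n*I =", n / sigma**2)  # Step 2
print("Var(theta_hat)*n*I  ~", theta_hat.var() * n / sigma**2)        # Step 3: ~ 1 (bound attained)
```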
Why It Matters
This is the central inequality of classical estimation theory. Combined with the asymptotic efficiency of the MLE (see asymptotic statistics), it identifies $1/(n I(\theta))$ as the universal asymptotic floor for unbiased estimation under regularity. Every alternative estimator has to be benchmarked against it: if its asymptotic variance exceeds $1/(n I(\theta))$, the MLE wins; if it equals it, both are asymptotically efficient. The bound also identifies the parameters that are hard to estimate (small $I(\theta)$) and the ones that are easy (large $I(\theta)$).
Failure Mode
Biased estimators escape the bound. The James-Stein estimator beats the sample mean in MSE for dimension $d \ge 3$; ridge regression, LASSO, and posterior mean estimators all introduce bias in exchange for variance reduction. The Cramér-Rao bound governs only the unbiased class.
Non-regular families. For $U(0, \theta)$ the support depends on $\theta$, the score does not exist almost everywhere, and the bound does not apply. The MLE $\hat\theta = X_{(n)}$ has variance $O(\theta^2/n^2)$, which is $o(1/n)$, faster than the parametric rate. The bound is silent here; use the Hammersley-Chapman-Robbins bound instead.
Bound not attained at finite $n$. Outside one-parameter exponential families, no unbiased estimator attains the bound at every $\theta$. For example, estimating $\sigma^2$ from $N(\mu, \sigma^2)$: the unbiased estimator $s^2 = \frac{1}{n-1}\sum_i (X_i - \bar X)^2$ has variance $2\sigma^4/(n-1)$, strictly larger than the bound $2\sigma^4/n$ at finite $n$, with the gap shrinking at rate $O(1/n^2)$.
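A short numerical check of the last failure mode, assuming the $N(\mu,\sigma^2)$ setup just described: the Monte Carlo variance of $s^2$ lands near $2\sigma^4/(n-1)$, strictly above the Cramér-Rao floor $2\sigma^4/n$.

```python
import numpy as np

# Sketch: the unbiased s^2 misses the Cramér-Rao floor 2*sigma^4/n at finite n.
rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.0, 2.0, 10, 500_000

x = rng.normal(mu, sigma, size=(reps, n))
s2 = x.var(axis=1, ddof=1)                      # unbiased estimator of sigma^2

print("Var(s^2)         ~", s2.var())
print("2*sigma^4/(n-1)  =", 2 * sigma**4 / (n - 1))   # actual variance
print("2*sigma^4/n      =", 2 * sigma**4 / n)         # Cramér-Rao floor (not attained)
```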
Multivariate Form
Multivariate Cramér-Rao Bound
Statement
For any unbiased estimator $\hat\theta$ of $\theta \in \mathbb{R}^d$,
$$\operatorname{Cov}_\theta(\hat\theta) \ \succeq\ \frac{1}{n}\,I(\theta)^{-1}$$
in the Loewner (positive semidefinite) ordering: $\operatorname{Cov}_\theta(\hat\theta) - \frac{1}{n} I(\theta)^{-1}$ is positive semidefinite. Equivalently, for every direction $a \in \mathbb{R}^d$,
$$\operatorname{Var}_\theta(a^\top \hat\theta) \ \ge\ \frac{1}{n}\,a^\top I(\theta)^{-1} a.$$
Intuition
The matrix bound is the simultaneous lower bound on the variance of every linear combination of the parameter estimates. Parameters that are jointly poorly identified (small eigenvalues of $I(\theta)$ in their direction) have correspondingly large minimum-variance bounds. The inverse Fisher matrix is the covariance template that the optimal unbiased estimator must equal or exceed in every direction.
Proof Sketch
Apply the matrix Cauchy-Schwarz inequality to the total score vector $S_n = \sum_{i=1}^n s_\theta(X_i)$ and the estimator $\hat\theta$. Differentiating $\mathbb{E}_\theta[\hat\theta] = \theta$ component-wise gives $\operatorname{Cov}_\theta(\hat\theta, S_n) = I_d$ (the identity matrix). The total score has covariance $n I(\theta)$. Matrix Cauchy-Schwarz then gives $\operatorname{Cov}_\theta(\hat\theta) \succeq I_d\,(n I(\theta))^{-1} I_d = \frac{1}{n} I(\theta)^{-1}$. Specializing to a fixed direction $a$, $\operatorname{Var}_\theta(a^\top\hat\theta) \ge \frac{1}{n}\,a^\top I(\theta)^{-1} a$. Note this is not $\frac{1}{n\,a^\top I(\theta)\,a}$: the matrix inverse is taken before the contraction with $a$, not after. The two coincide only when $a$ is an eigenvector of $I(\theta)$; in general $a^\top I(\theta)^{-1} a$ is larger, by the Schur-complement penalty for the orthogonal nuisance directions.
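A small linear-algebra illustration of the final remark, using the $N(\mu,\sigma^2)$ Fisher matrix (computed in the examples below) as an assumed input: for a direction $a$ that is not an eigenvector of $I(\theta)$, the correct bound $a^\top I^{-1}a/n$ strictly exceeds the naive $1/(n\,a^\top I a)$.

```python
import numpy as np

# Sketch: directional Cramér-Rao bound a^T I^{-1} a / n vs the (incorrect) 1/(n a^T I a),
# using the N(mu, sigma^2) Fisher matrix I = diag(1/sigma^2, 1/(2 sigma^4)) as an example.
sigma2, n = 1.0, 50
I = np.diag([1.0 / sigma2, 1.0 / (2.0 * sigma2**2)])

a = np.array([1.0, 1.0]) / np.sqrt(2.0)           # not an eigenvector of I
correct = a @ np.linalg.inv(I) @ a / n            # matrix inverse, then contract
naive = 1.0 / (n * (a @ I @ a))                   # contract, then invert (too small)

print("correct bound a^T I^{-1} a / n =", correct)
print("naive 1/(n a^T I a)            =", naive)
assert correct >= naive                           # strict unless a is an eigenvector of I
```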
Why It Matters
Real models almost always have multiple parameters. The matrix bound governs how trade-offs between parameters propagate. A model whose Fisher matrix has a small eigenvalue is weakly identified in the corresponding direction: any unbiased estimator must accept large variance in that direction. This connects directly to ill-conditioned regression and to the natural gradient (which preconditions descent directions by $I(\theta)^{-1}$ to remove this distortion).
Failure Mode
The bound applies to unbiased estimators only. The multivariate biased-estimator version (next theorem) replaces $\frac{1}{n} I(\theta)^{-1}$ with $\frac{1}{n}\,\nabla_\theta m(\theta)\, I(\theta)^{-1}\, \nabla_\theta m(\theta)^\top$, where $\nabla_\theta m(\theta)$ is the Jacobian of the estimator's mean $m(\theta) = \mathbb{E}_\theta[\hat\theta]$. When $I(\theta)$ is singular (rank-deficient, i.e. the model is non-identifiable in some direction), the bound is vacuous in that direction; use a generalized inverse and restrict to identifiable contrasts.
Biased Estimators: The Chain-Rule Bound
Cramér-Rao Bound for Biased Estimators
Statement
For any (possibly biased) estimator $\hat\theta$ with mean $m(\theta) = \mathbb{E}_\theta[\hat\theta]$,
$$\operatorname{Cov}_\theta(\hat\theta) \ \succeq\ \frac{1}{n}\,\nabla_\theta m(\theta)\, I(\theta)^{-1}\, \nabla_\theta m(\theta)^\top,$$
where $\nabla_\theta m(\theta)$ is the Jacobian of $m(\theta)$. In the scalar case this reduces to
$$\operatorname{Var}_\theta(\hat\theta) \ \ge\ \frac{m'(\theta)^2}{n\,I(\theta)} \ =\ \frac{\big(1 + b'(\theta)\big)^2}{n\,I(\theta)}, \qquad b(\theta) = m(\theta) - \theta.$$
Intuition
A biased estimator's mean shifts at rate $m'(\theta)$ as $\theta$ changes (not at rate $1$ as for unbiased estimators). The Cauchy-Schwarz argument now gives a covariance of $m'(\theta)$ between $\hat\theta$ and the score, so the squared-covariance side of Cauchy-Schwarz becomes $m'(\theta)^2$ instead of $1$. The bound scales accordingly.
Proof Sketch
Repeat the scalar proof with $m(\theta)$ replacing $\theta$. Differentiating $\mathbb{E}_\theta[\hat\theta] = m(\theta)$ gives $\operatorname{Cov}_\theta(\hat\theta, S_n) = m'(\theta)$ by the same exchange-of-differentiation step. Cauchy-Schwarz then gives $\operatorname{Var}_\theta(\hat\theta) \ge m'(\theta)^2 / (n I(\theta))$.
Why It Matters
This is the bound that practitioners actually need. Most useful estimators are biased: ridge, LASSO, posterior means, shrinkage, kernel-smoothed estimates. The chain-rule form gives the variance lower bound for any such estimator, parameterized by how its mean depends on $\theta$. It also justifies the delta method: an unbiased estimator of $g(\theta)$ has variance lower-bounded by $g'(\theta)^2/(n I(\theta))$, which matches the asymptotic variance of $g(\hat\theta_{\mathrm{MLE}})$ exactly under standard regularity.
Failure Mode
The chain-rule bound does not directly bound MSE, only variance. To bound MSE, add the squared bias: $\mathrm{MSE}_\theta(\hat\theta) = \operatorname{Var}_\theta(\hat\theta) + b(\theta)^2 \ge \frac{(1 + b'(\theta))^2}{n I(\theta)} + b(\theta)^2$. The James-Stein estimator beats the sample mean precisely because its bias is small while its variance reduction is large; the Cramér-Rao bound in either form does not preclude this.
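A minimal sketch of the chain-rule bound in action, assuming the shrinkage estimator $\hat\theta_c = c\bar X$ for $N(\theta,\sigma^2)$ data (an illustrative choice, not a recommended estimator): its mean is $m(\theta) = c\theta$, its variance equals the chain-rule bound $c^2\sigma^2/n$ exactly, it sits below the unbiased floor $\sigma^2/n$, and its MSE beats the sample mean's when $|\theta|$ is small.

```python
import numpy as np

# Sketch: the shrinkage estimator c * Xbar (biased for theta) under X_i ~ N(theta, sigma^2).
# Its mean is m(theta) = c*theta, so the chain-rule bound is m'(theta)^2 / (n I) = c^2 sigma^2 / n.
theta, sigma, n, c = 0.3, 1.0, 25, 0.8
I = 1.0 / sigma**2

var_shrunk = c**2 * sigma**2 / n              # actual variance of c * Xbar
chain_bound = c**2 / (n * I)                  # chain-rule Cramér-Rao bound (attained here)
unbiased_floor = 1.0 / (n * I)                # variance floor for the unbiased class

mse_shrunk = var_shrunk + ((c - 1.0) * theta)**2   # variance + bias^2
mse_mean = unbiased_floor                          # MSE of the unbiased sample mean

print("Var(c*Xbar) =", var_shrunk, " chain-rule bound =", chain_bound)
print("unbiased floor sigma^2/n =", unbiased_floor)
print("MSE shrunk =", mse_shrunk, " MSE sample mean =", mse_mean)  # shrunk wins for small |theta|
```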
Sharper Bounds: Bhattacharyya and Hammersley-Chapman-Robbins
The Cramér-Rao bound uses only the first derivative of the log-likelihood. Two sharper bounds use higher-order information or no derivatives at all.
Bhattacharyya Higher-Order Bound
Statement
Let $T_k = \dfrac{1}{p_\theta(X_{1:n})}\,\dfrac{\partial^k p_\theta(X_{1:n})}{\partial\theta^k}$ for $k = 1, \dots, m$. Define the Gram matrix $B(\theta)$ with entries $B_{jk}(\theta) = \mathbb{E}_\theta[T_j T_k]$ (the Bhattacharyya information matrix). Then for any unbiased $\hat\theta$,
$$\operatorname{Var}_\theta(\hat\theta) \ \ge\ c^\top B(\theta)^{-1} c,$$
where $c = (1, 0, \dots, 0)^\top$ (since $\frac{\partial^k}{\partial\theta^k}\mathbb{E}_\theta[\hat\theta] = \frac{\partial^k}{\partial\theta^k}\theta$ equals $1$ for $k = 1$ and $0$ for $k \ge 2$). The case $m = 1$ recovers the standard Cramér-Rao bound; $m \ge 2$ gives strictly tighter bounds when the higher-order information matrix is non-degenerate.
Intuition
The standard Cramér-Rao bound projects $\hat\theta - \theta$ onto the first score alone. The Bhattacharyya bound projects it onto the larger linear span of higher-order log-likelihood derivatives, capturing more of the unbiased estimator's structure. The bound is sharper because the projection onto a larger subspace has equal-or-larger norm.
Proof Sketch
For each $k$, differentiate $\mathbb{E}_\theta[\hat\theta] = \theta$ exactly $k$ times. The first derivative gives $\mathbb{E}_\theta[\hat\theta\,T_1] = 1$; higher-order derivatives give $\mathbb{E}_\theta[\hat\theta\,T_k] = 0$ for $k \ge 2$ (since $\mathbb{E}_\theta[\hat\theta] = \theta$ is linear in $\theta$). Apply Cauchy-Schwarz to the projection of $\hat\theta - \theta$ onto $\operatorname{span}\{T_1, \dots, T_m\}$ using the Gram inverse $B(\theta)^{-1}$.
Why It Matters
Bhattacharyya bounds quantify how much sharpness is left on the table by the standard Cramér-Rao bound. In some non-exponential models, the gap is large at finite $n$. The bound also generalizes naturally to the multivariate case (replace $c$ by an identity-block selection matrix).
Failure Mode
For one-parameter exponential families, all higher-order Bhattacharyya bounds collapse to the Cramér-Rao bound (the higher-order derivatives are linearly dependent on the first), so no sharpening is possible. Bhattacharyya helps in non-exponential families where the bound is not attained by the Cramér-Rao argument.
Hammersley-Chapman-Robbins Bound (no regularity)
Statement
For any unbiased estimator $\hat\theta$ of $\theta$ and any $h \ne 0$ such that $p_{\theta+h}$ and $p_\theta$ are mutually absolutely continuous,
$$\operatorname{Var}_\theta(\hat\theta) \ \ge\ \sup_{h \ne 0}\ \frac{h^2}{\chi^2\!\big(p_{\theta+h}^{\otimes n}\,\big\|\,p_\theta^{\otimes n}\big)},$$
where $\chi^2(q \,\|\, p) = \int \big(\tfrac{q}{p} - 1\big)^2\,p$ is the chi-squared divergence. As $h \to 0$ in a Cramér-regular family, the bound recovers the Cramér-Rao bound (since $\chi^2(p_{\theta+h} \,\|\, p_\theta) = h^2 I(\theta) + o(h^2)$).
Intuition
Replace the score (a derivative) with a finite difference. The bound asks: how much can the parameter shift by $h$ before the data distribution becomes too distinguishable? An unbiased estimator that tracks $\theta$ exactly must also track shifts to $\theta + h$, so its variance must be at least $h^2$ times a precision factor measured by chi-squared distance. The supremum over $h$ tightens the bound; as $h \to 0$ it recovers the local Cramér-Rao bound, while finite $h$ can give a sharper bound at finite $n$.
Proof Sketch
Define the likelihood ratio $L_h = p_{\theta+h}^{\otimes n} / p_\theta^{\otimes n}$. Since $\hat\theta$ is unbiased, $\mathbb{E}_\theta\big[\hat\theta\,(L_h - 1)\big] = \mathbb{E}_{\theta+h}[\hat\theta] - \mathbb{E}_\theta[\hat\theta] = h$. Apply Cauchy-Schwarz:
$$h^2 \;=\; \operatorname{Cov}_\theta\big(\hat\theta,\,L_h - 1\big)^2 \;\le\; \operatorname{Var}_\theta(\hat\theta)\;\mathbb{E}_\theta\big[(L_h - 1)^2\big] \;=\; \operatorname{Var}_\theta(\hat\theta)\;\chi^2\!\big(p_{\theta+h}^{\otimes n}\,\big\|\,p_\theta^{\otimes n}\big).$$
Rearrange and take the supremum over $h$.
Why It Matters
This is the bound to use when Cramér regularity fails. For $U(0, \theta)$, the chi-squared divergence between $p_{\theta+h}$ and $p_\theta$ is finite for $-\theta < h < 0$ (and infinite for $h > 0$, where the supports do not nest), and computing the supremum gives a non-trivial lower bound on the variance of any unbiased estimator. The HCR bound also handles discrete parameters, irregular boundaries, and any setting where the score is undefined or unbounded.
Failure Mode
The chi-squared divergence can be infinite (e.g. when supports are disjoint), in which case the bound for that $h$ is zero (vacuous). Choose $h$ small enough that $\chi^2(p_{\theta+h} \,\|\, p_\theta)$ is finite. For complicated parametric families, the HCR bound is hard to compute analytically; the Cramér-Rao bound is preferred when regularity holds.
The Bayesian Extension: van Trees Inequality
van Trees (Bayesian Cramér-Rao) Inequality
Statement
For any estimator $\hat\theta$ of $\theta$ (no unbiasedness required),
$$\mathbb{E}_\pi\Big[\mathbb{E}_\theta\big[(\hat\theta - \theta)^2\big]\Big] \ \ge\ \frac{1}{n\,\mathbb{E}_\pi[I(\theta)] + I(\pi)},$$
where $I(\pi) = \int \frac{\pi'(\theta)^2}{\pi(\theta)}\,d\theta$ is the Fisher information of the prior $\pi$ with respect to the location family.
Intuition
The Bayesian Cramér-Rao bound combines two information sources: the data information $n\,\mathbb{E}_\pi[I(\theta)]$ (averaged over the prior) and the prior information $I(\pi)$. As the prior becomes flat (informationless), $I(\pi) \to 0$ and the bound recovers a Bayes-averaged Cramér-Rao bound. As $n \to \infty$ the data dominates, the prior information becomes negligible, and the asymptotic Bayes risk matches the asymptotic frequentist Cramér-Rao bound.
Proof Sketch
Apply Cauchy-Schwarz to the joint score $\partial_\theta \log\big(p_\theta(X_{1:n})\,\pi(\theta)\big)$, which has variance $n\,\mathbb{E}_\pi[I(\theta)] + I(\pi)$ under the joint distribution of $(X_{1:n}, \theta)$. The covariance between $\hat\theta - \theta$ and this joint score equals $1$ by an integration-by-parts argument that uses the prior vanishing at the boundary. Cauchy-Schwarz then gives the bound.
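A concrete check in the conjugate Gaussian case, an assumed example in which the inequality happens to be tight: for $X_i \sim N(\theta, 1)$ and prior $\theta \sim N(0, \tau^2)$, the van Trees bound $1/(n + 1/\tau^2)$ coincides with the Bayes risk of the posterior mean.

```python
import numpy as np

# Sketch: van Trees bound vs simulated Bayes risk for X_i ~ N(theta, 1), theta ~ N(0, tau^2).
# E_pi[I(theta)] = 1, I(pi) = 1/tau^2, posterior mean = n*Xbar / (n + 1/tau^2).
rng = np.random.default_rng(3)
n, tau2, reps = 10, 0.5, 400_000

theta = rng.normal(0.0, np.sqrt(tau2), size=reps)
xbar = rng.normal(theta, 1.0 / np.sqrt(n))          # Xbar | theta ~ N(theta, 1/n)
post_mean = n * xbar / (n + 1.0 / tau2)

bayes_risk = np.mean((post_mean - theta)**2)
van_trees = 1.0 / (n + 1.0 / tau2)

print("simulated Bayes risk ~", bayes_risk)
print("van Trees bound      =", van_trees)           # equal here: the bound is tight
```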
Why It Matters
The van Trees inequality is the standard tool for proving minimax lower bounds. Combined with a careful choice of prior (often supported on a small ball around the parameter of interest), it yields explicit lower bounds on the worst-case Bayes risk, which dualizes to a lower bound on the minimax risk. This is one of the two main routes to minimax lower bounds, the other being Le Cam's two-point method.
Failure Mode
The bound requires the prior to vanish at the boundary (so the integration by parts has no boundary term). Improper priors (uniform on the whole real line) violate this; a careful localization argument is needed. For multidimensional $\theta$, the matrix version replaces the scalar denominator with the matrix sum $n\,\mathbb{E}_\pi[I(\theta)] + I(\pi)$ in the appropriate Loewner sense.
Achievability: When Does an Efficient Unbiased Estimator Exist?
The Cramér-Rao bound is achieved by an unbiased estimator at every $\theta$ if and only if the family is a one-parameter exponential family with $\theta$ the mean of the canonical sufficient statistic (the mean-value parameterization), and $\hat\theta$ is the sample mean of that statistic. The proof: equality in Cauchy-Schwarz requires $\hat\theta - \theta = a(\theta)\,\partial_\theta \log p_\theta(X_{1:n})$ for some function $a(\theta)$, which by integrating in $\theta$ recovers the exponential-family form, with $\hat\theta = \frac{1}{n}\sum_{i=1}^n T(X_i)$ and the Cramér-Rao bound met with equality.
For other families, the best possible unbiased estimator is the uniformly minimum variance unbiased estimator (UMVUE), which is found by Rao-Blackwellizing any unbiased estimator with respect to a complete sufficient statistic (Rao-Blackwellization). The UMVUE is unique a.s. when a complete sufficient statistic exists (Lehmann-Scheffé). It need not attain the Cramér-Rao bound at finite $n$; the gap is non-negative and shrinks at rate $O(1/n^2)$ in regular models.
In multi-parameter settings with nuisance parameters, the relevant lower bound is the efficient information $\tilde I_{\psi\psi} = I_{\psi\psi} - I_{\psi\lambda} I_{\lambda\lambda}^{-1} I_{\lambda\psi}$ for the parameter of interest $\psi$ in the presence of the nuisance $\lambda$ (Bickel-Klaassen-Ritov-Wellner, Efficient and Adaptive Estimation, 1993). The MLE for $\psi$ is asymptotically efficient with respect to $\tilde I_{\psi\psi}^{-1}$, not $I_{\psi\psi}^{-1}$. This is a strict information loss whenever $I_{\psi\lambda} \ne 0$.
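A small numerical sketch of the efficient-information point, using a hypothetical $2\times 2$ Fisher matrix (the numbers are illustrative, not from any particular model): the Schur complement $\tilde I_{\psi\psi}$ is smaller than $I_{\psi\psi}$ whenever the cross-information is nonzero, so the nuisance-adjusted variance floor is strictly larger, and it agrees with $[I(\theta)^{-1}]_{\psi\psi}$.

```python
import numpy as np

# Sketch: efficient information as a Schur complement, for a hypothetical Fisher matrix
# partitioned as [[I_pp, I_pl], [I_lp, I_ll]] with psi the interest, lambda the nuisance.
I = np.array([[2.0, 0.8],
              [0.8, 1.0]])
I_pp, I_pl, I_lp, I_ll = I[0, 0], I[0, 1], I[1, 0], I[1, 1]

I_eff = I_pp - I_pl * (1.0 / I_ll) * I_lp        # Schur complement (efficient information)

n = 100
print("marginal I_psi,psi    =", I_pp)            # too optimistic when cross-term != 0
print("efficient I~_psi,psi  =", I_eff)           # < I_psi,psi
print("naive floor 1/(n I)   =", 1.0 / (n * I_pp))
print("true floor 1/(n I~)   =", 1.0 / (n * I_eff))
print("check [I^{-1}]_psi,psi =", np.linalg.inv(I)[0, 0], "= 1/I~ =", 1.0 / I_eff)
```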
Canonical Examples
Normal mean with known variance
Let $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ with $\sigma^2$ known. The score is $s_\mu(x) = (x - \mu)/\sigma^2$, so $I(\mu) = 1/\sigma^2$. The Cramér-Rao bound gives $\operatorname{Var}(\hat\mu) \ge \sigma^2/n$. The sample mean $\bar X$ has $\operatorname{Var}(\bar X) = \sigma^2/n$, attaining the bound exactly. This is a one-parameter exponential family with $T(x) = x$, so finite-sample efficiency is expected.
Normal mean and variance jointly: Loewner ordering
Let $X_i \sim N(\mu, \sigma^2)$ with both unknown. The score vector is $s_{(\mu,\sigma^2)}(x) = \Big(\frac{x-\mu}{\sigma^2},\ \frac{(x-\mu)^2 - \sigma^2}{2\sigma^4}\Big)$. Direct computation gives the Fisher information matrix
$$I(\mu, \sigma^2) = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 1/(2\sigma^4) \end{pmatrix}.$$
The matrix Cramér-Rao bound is $\operatorname{Cov}(\hat\mu, \hat\sigma^2) \succeq \frac{1}{n}\operatorname{diag}(\sigma^2,\ 2\sigma^4)$. The unbiased estimators are $\bar X$ for $\mu$ (variance $\sigma^2/n$, attains the bound) and $s^2 = \frac{1}{n-1}\sum_i (X_i - \bar X)^2$ for $\sigma^2$ (variance $2\sigma^4/(n-1)$, strictly above the bound at finite $n$, with the gap closing at rate $O(1/n^2)$).
Poisson rate
Let $X_i \sim \operatorname{Poisson}(\lambda)$. The score is $s_\lambda(x) = x/\lambda - 1$, so $I(\lambda) = 1/\lambda$. The Cramér-Rao bound is $\operatorname{Var}(\hat\lambda) \ge \lambda/n$, and the sample mean has $\operatorname{Var}(\bar X) = \lambda/n$. Attained, since this is a one-parameter exponential family with $T(x) = x$.
Cauchy location: efficient unbiased estimator does not exist at finite n
Let $X_i \sim \operatorname{Cauchy}(\theta, 1)$ with density $p_\theta(x) = \frac{1}{\pi\big(1 + (x - \theta)^2\big)}$. The score is $s_\theta(x) = \frac{2(x - \theta)}{1 + (x - \theta)^2}$ and $I(\theta) = 1/2$. The Cramér-Rao bound is $\operatorname{Var}(\hat\theta) \ge 2/n$. No closed-form unbiased estimator achieves this bound at finite $n$ (the Cauchy is not an exponential family). The MLE achieves it asymptotically: $\sqrt{n}\,(\hat\theta_{\mathrm{MLE}} - \theta) \xrightarrow{d} N(0, 2)$. Note the sample mean has infinite variance for Cauchy data and is not even a consistent estimator; only the MLE (or a robust estimator like the sample median) works.
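A short simulation contrasting the Cramér-Rao floor $2/n$ with the sample median for Cauchy$(\theta, 1)$ data (the median is an assumed stand-in for a full MLE fit): the median is consistent with asymptotic variance $\pi^2/(4n) \approx 2.47/n$, above the floor; the sample mean is omitted because its variance is infinite.

```python
import numpy as np

# Sketch: Cauchy location. Cramér-Rao floor is 2/n; the sample median is consistent
# but not efficient (asymptotic variance pi^2 / (4n) ~ 2.47/n).
rng = np.random.default_rng(4)
theta, n, reps = 1.0, 200, 100_000

x = theta + rng.standard_cauchy(size=(reps, n))
med = np.median(x, axis=1)

print("n * Var(median) ~", n * med.var())      # ~ pi^2/4 ~ 2.47
print("n * CR floor    =", 2.0)                 # 1/I(theta) with I = 1/2
print("pi^2/4          =", np.pi**2 / 4)
```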
Uniform U(0, theta): Cramér-Rao does not apply, HCR does
Let $X_i \sim U(0, \theta)$. The support depends on $\theta$, so Cramér regularity fails. The Cramér-Rao bound is silent. Use the Hammersley-Chapman-Robbins bound: for $-\theta < h < 0$, $\chi^2(p_{\theta+h} \,\|\, p_\theta) = \frac{-h}{\theta + h}$ for one observation, so for an i.i.d. sample of size $n$ the chi-squared divergence is $\big(\frac{\theta}{\theta + h}\big)^n - 1$. Optimizing over $h$ gives a bound of order $\theta^2/n^2$, matching the order of the actual variance of the MLE $\hat\theta = X_{(n)}$.
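A sketch of the Hammersley-Chapman-Robbins computation for $U(0, \theta)$, using the $\chi^2$ expression above: optimize $h^2 / \big[(\theta/(\theta+h))^n - 1\big]$ over a grid of $h \in (-\theta, 0)$ and compare to the exact variance of $X_{(n)}$ and of its unbiased rescaling $\frac{n+1}{n} X_{(n)}$.

```python
import numpy as np

# Sketch: HCR lower bound for U(0, theta) with n i.i.d. observations.
# chi^2(p_{theta+h}^n || p_theta^n) = (theta/(theta+h))^n - 1 for -theta < h < 0.
theta, n = 1.0, 20

h = np.linspace(-0.5 * theta, -1e-4, 10_000)        # grid over admissible shifts
chi2 = (theta / (theta + h))**n - 1.0
hcr = np.max(h**2 / chi2)                            # sup over the grid

var_mle = n * theta**2 / ((n + 1)**2 * (n + 2))      # exact variance of X_(n) (biased)
var_unbiased = theta**2 / (n * (n + 2))              # variance of the unbiased (n+1)/n * X_(n)

print("HCR bound             ~", hcr)                # O(theta^2 / n^2)
print("Var(X_(n)) exact      =", var_mle)
print("Var((n+1)/n * X_(n))  =", var_unbiased)       # >= HCR, same 1/n^2 order
```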
Linear regression beta hat is efficient
Let $y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I_n)$ and $X$ fixed and full-rank. The Fisher information for $\beta$ (with $\sigma^2$ known) is $I(\beta) = X^\top X / \sigma^2$. The Cramér-Rao bound is $\operatorname{Cov}(\hat\beta) \succeq \sigma^2 (X^\top X)^{-1}$. The OLS estimator $\hat\beta = (X^\top X)^{-1} X^\top y$ has covariance exactly $\sigma^2 (X^\top X)^{-1}$, attaining the bound. This is the Gauss-Markov theorem in its Cramér-Rao form: OLS is efficient under Gaussian noise (and BLUE under any noise with finite variance).
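A compact simulation of the regression example (the design matrix, coefficients, and noise level are arbitrary illustrative choices): the empirical covariance of the OLS estimator across replications matches $\sigma^2 (X^\top X)^{-1}$.

```python
import numpy as np

# Sketch: OLS attains the Cramér-Rao bound sigma^2 (X^T X)^{-1} under Gaussian noise.
rng = np.random.default_rng(5)
n, p, sigma, reps = 50, 3, 0.7, 50_000

X = rng.normal(size=(n, p))                        # fixed design, full rank (a.s.)
beta = np.array([1.0, -2.0, 0.5])
bound = sigma**2 * np.linalg.inv(X.T @ X)

eps = rng.normal(0.0, sigma, size=(reps, n))
y = X @ beta + eps                                 # reps independent datasets, shape (reps, n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y.T).T   # OLS for each replication, shape (reps, p)

emp_cov = np.cov(beta_hat, rowvar=False)
print("empirical Cov(beta_hat):\n", np.round(emp_cov, 5))
print("CR bound sigma^2 (X^T X)^{-1}:\n", np.round(bound, 5))
```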
Common Confusions
Cramér-Rao does not bound MSE for biased estimators
The bound governs variance, not MSE. MSE = variance + bias². A biased estimator can have lower MSE than the unbiased floor $1/(n I(\theta))$ when the bias is small and the variance is much smaller. James-Stein, ridge regression, LASSO, posterior means, and shrinkage estimators all do this systematically. The bound is vacuous as a statement about MSE for the biased class.
Asymptotic efficiency is weaker than finite-sample efficiency
The MLE is asymptotically efficient under standard regularity: $\sqrt{n}\,(\hat\theta_{\mathrm{MLE}} - \theta) \xrightarrow{d} N\big(0, I(\theta)^{-1}\big)$. This says the limiting variance equals the Cramér-Rao bound. It does not say the MLE attains the bound at any finite $n$ (it generally does not, except in one-parameter exponential families). Finite-sample variance is typically $\frac{1}{n I(\theta)} + O(1/n^2)$, with the second-order term computable via the Bartlett correction or an Edgeworth expansion.
The bound applies to unbiased estimators of theta, not of g(theta)
The Cramér-Rao bound for the parameter $\theta$ itself is $1/(n I(\theta))$. For an unbiased estimator of a transformed parameter $g(\theta)$, the relevant bound is $g'(\theta)^2 / (n I(\theta))$, which is the chain-rule version. Conflating the two is a common error: e.g. estimating $\sigma$ vs $\sigma^2$ requires applying the chain rule with $g(\sigma^2) = \sqrt{\sigma^2}$ and $g'(\sigma^2) = 1/(2\sigma)$.
The bound is local at theta, not uniform in theta
The Cramér-Rao bound is a function of $\theta$: $\operatorname{Var}_\theta(\hat\theta) \ge 1/(n I(\theta))$ pointwise. An estimator can attain the bound at one value of $\theta$ and exceed it at others; uniform attainability characterizes one-parameter exponential families. For minimax-optimal inference (worst-case over $\theta$), use the van Trees inequality or Le Cam's method instead.
With nuisance parameters, the right denominator is efficient information
For a parameter of interest $\psi$ with a nuisance parameter $\lambda$, the asymptotic variance of any regular estimator of $\psi$ is bounded below by $\tilde I_{\psi\psi}^{-1}$, where $\tilde I_{\psi\psi} = I_{\psi\psi} - I_{\psi\lambda} I_{\lambda\lambda}^{-1} I_{\lambda\psi}$ is the Schur complement (the efficient information). Using $I_{\psi\psi}^{-1}$ (the inverse of the marginal Fisher information) is wrong whenever $I_{\psi\lambda} \ne 0$; it under-states the achievable variance. The MLE for $\psi$ is asymptotically efficient with respect to $\tilde I_{\psi\psi}^{-1}$.
Singular Fisher matrix means the model is non-identifiable in some direction
A singular Fisher information matrix has zero eigenvalues, indicating directions in parameter space along which the likelihood is locally flat. No data set, however large, can distinguish parameters along these directions; the Cramér-Rao bound is infinite (vacuous) in those directions. Either reparameterize to identify only the estimable contrasts, or impose regularization (a prior, a sparsity constraint) to pin down the model.
Exercises
Problem
Compute the Fisher information for the Bernoulli$(p)$ family and state the Cramér-Rao bound for estimating $p$ from $n$ i.i.d. observations. Verify that the sample mean $\bar X$ is efficient.
Problem
Let $X_1, \dots, X_n$ be i.i.d. from a one-parameter exponential family in canonical form: $p_\eta(x) = h(x)\exp\{\eta\,T(x) - A(\eta)\}$. Let $\hat\mu = \frac{1}{n}\sum_{i=1}^n T(X_i)$ estimate $\mu(\eta) = A'(\eta)$.
(a) Show $\hat\mu$ is unbiased for $A'(\eta)$. (b) Compute the Fisher information $I(\eta) = A''(\eta)$ for one observation. (c) Show $\operatorname{Var}_\eta(\hat\mu) = A''(\eta)/n$ and that this attains the chain-rule Cramér-Rao bound for estimating $g(\eta) = A'(\eta)$.
Problem
Show that the Cramér-Rao bound for estimating $\sigma^2$ from $X_1, \dots, X_n \sim N(0, \sigma^2)$ is $2\sigma^4/n$. The unbiased estimator $\hat\sigma^2 = \frac{1}{n}\sum_i X_i^2$ has variance $2\sigma^4/n$, exactly attaining the bound. Now consider estimating $\sigma$ (not $\sigma^2$). Compute the chain-rule Cramér-Rao bound for $\sigma$, and show that the unbiased estimator $c_n\sqrt{\frac{1}{n}\sum_i X_i^2}$ (with $c_n$ chosen for unbiasedness) has variance strictly larger than this bound at finite $n$.
Problem
Consider the location family $p_\theta(x) = f(x - \theta)$ for some symmetric, smooth density $f$. Show that the Fisher information is $I(\theta) = \int \frac{f'(x)^2}{f(x)}\,dx$, independent of $\theta$. Compute it for $f$ = Gaussian, Laplace, and Cauchy. Comment on which has the highest Fisher information per observation.
Problem
Use the van Trees inequality to derive a lower bound on the minimax risk for estimating $\theta$ from $X_1, \dots, X_n$ i.i.d. from a Cramér-regular family with Fisher information $I(\theta)$. Choose a smooth prior supported on a small ball around any fixed $\theta_0$, apply van Trees, then take the supremum over $\theta_0$.
References
Canonical:
- Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury. Chapter 7.3 (Cramér-Rao inequality and efficiency).
- Lehmann, E. L., & Casella, G. (1998). Theory of Point Estimation (2nd ed.). Springer. Chapter 2 (unbiasedness, Cramér-Rao, UMVUE), Chapter 6 (multivariate Cramér-Rao).
- Schervish, M. J. (1995). Theory of Statistics. Springer. Section 2.3 (information inequalities, Bhattacharyya and HCR bounds).
- Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press. Original treatment of the inequality (Section 32.3).
- Rao, C. R. (1945). "Information and the accuracy attainable in the estimation of statistical parameters." Bulletin of the Calcutta Mathematical Society, 37, 81-91. The other half of the eponymous bound.
Current:
- van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press. Chapter 8 (efficient estimators, asymptotic Cramér-Rao bound), Chapter 25 (semiparametric efficient information).
- Bickel, P. J., Klaassen, C. A. J., Ritov, Y., & Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press. Chapter 3 (efficient information with nuisance parameters; the modern reference for $\tilde I_{\psi\psi}$).
- Keener, R. W. (2010). Theoretical Statistics. Springer. Chapter 3 (unbiased estimation and efficiency), Chapter 4 (UMVUE via Lehmann-Scheffé).
- Cover, T. M., & Thomas, J. A. (2006). Elements of Information Theory (2nd ed.). Wiley. Chapter 11.10 (Fisher information and the Cramér-Rao bound, KL connection).
- Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer. Chapter 2 (van Trees inequality and Le Cam's two-point method for minimax lower bounds).
Critique and refinements:
- Hammersley, J. M. (1950). "On estimating restricted parameters." Journal of the Royal Statistical Society B, 12, 192-240. The HCR bound.
- Chapman, D. G., & Robbins, H. (1951). "Minimum variance estimation without regularity assumptions." Annals of Mathematical Statistics, 22, 581-586. The other half of the eponymous bound.
- Bhattacharyya, A. (1946). "On some analogues of the amount of information and their use in statistical estimation." Sankhyā, 8, 1-14. Higher-order information bounds.
- van Trees, H. L. (1968). Detection, Estimation, and Modulation Theory, Part I. Wiley. The Bayesian Cramér-Rao inequality (Chapter 2).
- Stein, C. (1956). "Inadmissibility of the usual estimator for the mean of a multivariate normal distribution." Proceedings of the Third Berkeley Symposium, 1, 197-206. Shows that biased shrinkage estimators beat the Cramér-Rao-efficient sample mean in MSE for . See shrinkage estimation.
- Brown, L. D., & Gajek, L. (1990). "Information inequalities for the Bayes risk." Annals of Statistics, 18, 1578-1594. Sharper Bayesian information inequalities than van Trees in some settings.
Next Topics
- Asymptotic statistics: the MLE attains the Cramér-Rao bound asymptotically; LAN families, contiguity, the convolution theorem.
- Maximum likelihood estimation: where the asymptotic normal limit comes from.
- Minimax lower bounds: van Trees and Le Cam techniques, the right framework when no unbiased efficient estimator exists.
- Shrinkage estimation: James-Stein: how bias buys you variance and beats the Cramér-Rao-efficient sample mean.
- Rao-Blackwellization: the constructive route to UMVUE when the bound is not attained.
Last reviewed: April 19, 2026
Required prerequisites
- Fisher Information: Curvature, KL Geometry, and the Natural Gradient (layer 0B, tier 1)
- Maximum Likelihood Estimation: Theory, Information Identity, and Asymptotic Efficiency (layer 0B, tier 1)
- KL Divergence (layer 1, tier 1)