Statistical Estimation
Asymptotic Statistics: M-Estimators, Delta Method, LAN
The large-sample toolbox for statistical inference: continuous mapping theorem, Slutsky, the delta method, M- and Z-estimator consistency and asymptotic normality, MLE as a special M-estimator, local asymptotic normality (Le Cam), the asymptotic equivalence of Wald / score / likelihood-ratio tests, and influence-function representations. These results justify essentially every confidence interval, standard error, and p-value in applied statistics, and they are the language of modern semiparametric theory.
Prerequisites
Why This Matters
Three asymptotically equivalent ways to measure how far the data carry the parameter from H₀ on the same quadratic log-likelihood.
Under LAN, all three statistics equal Δₙᵀ I⁻¹ Δₙ + op(1) and converge to χ²ₖ; they differ in finite samples and under misspecification.
Almost every confidence interval, standard error, and p-value in applied statistics relies on asymptotic theory. When you report θ̂ₙ ± 1.96·SE, you are invoking asymptotic normality of θ̂ₙ. When you compare a likelihood-ratio statistic to a χ² table, you are invoking Wilks' theorem. When you compute a sandwich standard error after a misspecified model fit, you are invoking M-estimator asymptotic normality with the model-robust variance formula. When you bootstrap, you are using a finite-sample proxy for an asymptotic distribution.
The page that follows develops the toolbox in the order it actually gets used. First: the four "plumbing" results (continuous mapping, Slutsky, delta method, joint convergence) that turn one CLT into a thousand applied results. Second: the M- and Z-estimator framework (van der Vaart Ch 5) that subsumes MLE, GMM, OLS, and quantile regression as one theorem. Third: Le Cam's local asymptotic normality, which says all regular parametric problems look like Gaussian shift experiments at the 1/√n scale and which underwrites the asymptotic minimax lower bound. Fourth: the Wald, score, and likelihood-ratio tests, which are asymptotically equivalent under the null but differ in finite samples and under misspecification.
The synthesis that anchors stats-PhD intuition: every regular estimator has an influence function that determines its asymptotic variance, and the MLE achieves the unique influence function with minimum variance (the efficient influence function), which equals the inverse Fisher information. Everything else is variation around this center.
Mental Model
Think of asymptotic statistics as three stacked layers.
- Convergence calculus. Convergence in distribution and in probability are partial-order-like relations on sequences of random variables. The continuous mapping theorem pushes them through continuous functions, Slutsky lets you absorb constants, the delta method pushes them through smooth deterministic maps. These are the algebraic moves you make to manipulate limits.
- Estimator asymptotics. Most estimators of interest are defined as minimizers (M-estimators) or zeros (Z-estimators) of a sample criterion: θ̂ₙ = arg minθ (1/n) Σᵢ mθ(Xᵢ) or (1/n) Σᵢ ψθ(Xᵢ) = 0. Under regularity conditions they are √n-consistent and asymptotically normal with variance you can write down in closed form. MLE is the special case mθ = −ℓθ, with variance I(θ₀)⁻¹.
- Local geometry. Around the truth, the statistical experiment looks Gaussian at the 1/√n scale. Local asymptotic normality (Le Cam) makes this precise via the contiguity of nearby measures and the convergence of likelihood ratios to a normal density. The asymptotic minimax bound, the Wald/score/LRT equivalence, and semiparametric efficiency theory all sit on this layer.
Notation
Throughout: ⇝ denotes convergence in distribution, →p convergence in probability, →a.s. almost-sure convergence. Xₙ = Op(1) means Xₙ is bounded in probability (supₙ P(|Xₙ| > M) → 0 as M → ∞). Xₙ = op(1) means Xₙ →p 0. We write ℓθ(x) = log pθ(x) for a single log-likelihood and Lₙ(θ) = Σᵢ ℓθ(Xᵢ) for the sample log-likelihood. The score is ℓ̇θ(x) = ∂ℓθ(x)/∂θ, with sample average (1/n) Σᵢ ℓ̇θ(Xᵢ). Fisher information is I(θ) = Eθ[ℓ̇θ ℓ̇θᵀ] = −Eθ[ℓ̈θ], with the equality holding under regularity (the information identity).
The Convergence Calculus
Continuous Mapping Theorem (CMT)
Statement
Let Xₙ ⇝ X in a metric space and let g be measurable and continuous at every point of a set C with P(X ∈ C) = 1. Then
g(Xₙ) ⇝ g(X).
The same holds with ⇝ replaced by →p or →a.s.
Intuition
Continuous functions preserve convergence. The "continuous on a set of P-measure one" condition allows g to be discontinuous at points the limit never visits, which matters in practice (e.g., g(x) = 1/x when X is positive almost surely).
Proof Sketch
Use the Portmanteau theorem: Xₙ ⇝ X iff E f(Xₙ) → E f(X) for every bounded continuous f. For g continuous on a set C with P(X ∈ C) = 1, the composition f ∘ g is bounded and continuous at every point of C, and its discontinuity points have measure zero under the limit. Apply Portmanteau to f ∘ g to conclude. See van der Vaart Asymptotic Statistics Theorem 2.3.
Why It Matters
CMT is the most-used convergence result. Every "transform an estimator through a smooth function" argument starts here. Slutsky and the delta method are corollaries.
Failure Mode
Discontinuity of g at a point where the limit puts positive mass breaks the conclusion. Example: let Xₙ = −1/n → 0 a.s., X = 0, and g(x) = 1{x ≥ 0}. Then Xₙ ⇝ X but g(Xₙ) = 0 for every n while g(X) = 1. The discontinuity at zero coincides with the limit's support.
Slutsky's Theorem
Statement
If Xₙ ⇝ X and Yₙ →p c for a deterministic constant c, then jointly (Xₙ, Yₙ) ⇝ (X, c), and therefore for any continuous g,
g(Xₙ, Yₙ) ⇝ g(X, c).
The standard corollaries: Xₙ + Yₙ ⇝ X + c, Yₙ Xₙ ⇝ cX, and Xₙ/Yₙ ⇝ X/c when c ≠ 0.
Intuition
A sequence converging in probability to a constant behaves like that constant in the limit. Marginal convergence Xₙ ⇝ X and Yₙ ⇝ Y does not imply joint convergence (Xₙ, Yₙ) ⇝ (X, Y) in general; Slutsky works precisely because the limit of Yₙ is degenerate (a point mass), so there is only one admissible coupling.
Proof Sketch
Yₙ →p c implies (Xₙ, Yₙ) − (Xₙ, c) →p 0, so (Xₙ, Yₙ) has the same limit as (Xₙ, c), which converges jointly to (X, c). Apply CMT to the continuous map g. See van der Vaart Lemma 2.8.
Why It Matters
Slutsky is the glue that holds asymptotic arguments together. Every time you replace σ with σ̂ₙ in a t-statistic and claim the limit is still standard normal, you are using Slutsky.
Failure Mode
Slutsky fails if Yₙ has a non-degenerate limit. Example: if Xₙ = Z₁ and Yₙ = Z₂ are independent standard normals, then Xₙ ⇝ N(0, 1) and Yₙ ⇝ N(0, 1), but Xₙ + Yₙ has variance 2, not the variance 4 that "treating Yₙ as Xₙ" would suggest. The fix is joint convergence and CMT applied to the joint limit.
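A minimal Monte Carlo sketch of Slutsky in action (illustrative, not from the original): the studentized mean with the random plug-in estimate σ̂ₙ behaves like a standard normal, because σ̂ₙ →p σ.

```python
import math
import random
import statistics

# Sketch: T_n = sqrt(n) * (Xbar - mu) / sigma_hat is approximately N(0, 1)
# even though sigma_hat is random -- Slutsky lets us absorb the plug-in.
random.seed(0)

def t_stat(n, mu=2.0, sigma=3.0):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)  # random, but converges in probability to sigma
    return math.sqrt(n) * (xbar - mu) / s

draws = [t_stat(400) for _ in range(2000)]
# Empirical mean ~ 0 and sd ~ 1, matching the N(0, 1) Slutsky limit.
print(round(statistics.fmean(draws), 2), round(statistics.stdev(draws), 2))
```

With Yₙ = σ̂ₙ replaced by an independent non-degenerate random variable, the same ratio would not have a standard normal limit, which is the failure mode above.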
Delta Method (univariate and multivariate)
Statement
Univariate. If √n(Xₙ − μ) ⇝ N(0, σ²) and g is differentiable at μ with g′(μ) ≠ 0, then
√n(g(Xₙ) − g(μ)) ⇝ N(0, g′(μ)² σ²).
Multivariate. If √n(Xₙ − μ) ⇝ N(0, Σ) and g: ℝᵈ → ℝᵐ has Jacobian G at μ, then
√n(g(Xₙ) − g(μ)) ⇝ N(0, G Σ Gᵀ).
Second-order. If additionally g′(μ) = 0 and g″(μ) ≠ 0, then
n(g(Xₙ) − g(μ)) ⇝ ½ g″(μ) σ² χ²₁.
The convergence rate changes from √n to n and the limit is no longer normal.
Intuition
A smooth function of an approximately normal random vector is itself approximately normal, with covariance pushed forward through the Jacobian. The second-order version covers the degenerate case where the gradient vanishes (e.g., g(x) = x² at μ = 0 has this structure).
Proof Sketch
Univariate: Taylor expand g(Xₙ) = g(μ) + g′(μ)(Xₙ − μ) + op(|Xₙ − μ|), where Xₙ − μ = Op(1/√n). Multiply by √n: √n(g(Xₙ) − g(μ)) = g′(μ) √n(Xₙ − μ) + op(1). Apply Slutsky to absorb the remainder.
Multivariate: same argument with the gradient replaced by the Jacobian G. Push-forward of a Gaussian through a linear map is Gaussian with covariance G Σ Gᵀ.
Second-order: when g′(μ) = 0 the Taylor expansion gives g(Xₙ) − g(μ) = ½ g″(μ)(Xₙ − μ)² + op((Xₙ − μ)²). Multiplying by n and using √n(Xₙ − μ) ⇝ N(0, σ²) gives n(g(Xₙ) − g(μ)) ⇝ ½ g″(μ) σ² χ²₁.
Why It Matters
The delta method is the workhorse of applied statistics. Variance of log θ̂ₙ, of θ̂ₙ², of ratios and odds — of any smooth functional of moments. It is also the basis for variance-stabilizing transformations (the Anscombe and Freeman-Tukey families come from solving an ODE in g).
Failure Mode
Three failure modes. (1) g′(μ) = 0: use the second-order version. (2) g not differentiable at μ: e.g., g(x) = |x| at μ = 0 gives a folded-normal limit, not a normal. (3) Xₙ converges at a non-√n rate: the delta method must use that rate, not √n, in the expansion.
Tangent linearization of g around μ: an input spread σ/√N pushes to an output spread |g′(μ)|·σ/√N
Blue ribbon on the x-axis: input spread ±σ/√N. Green ribbon on the y-axis: output spread ±|g′(μ)|·σ/√N, the delta-method prediction. The orange dashed tangent line is the linearization the method uses; the gray curve is the true g. The approximation tightens as σ/√N shrinks.
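A minimal numeric check of the delta-method variance prediction (illustrative, not from the original): for g(x) = log x applied to the mean of Exp(1) samples (μ = 1, σ² = 1), the predicted variance of g(X̄ₙ) is g′(μ)² σ²/n = 1/n.

```python
import math
import random
import statistics

# Sketch: Monte Carlo variance of log(Xbar) vs the delta-method prediction
# g'(mu)^2 * sigma^2 / n = 1/n for Exp(rate=1) data.
random.seed(1)
n, reps = 500, 4000
vals = []
for _ in range(reps):
    xbar = statistics.fmean(random.expovariate(1.0) for _ in range(n))
    vals.append(math.log(xbar))
mc_var = statistics.variance(vals)
predicted = 1.0 / n
print(mc_var, predicted)  # the two should agree to within Monte Carlo error
```

Shrinking σ/√n (larger n) tightens the agreement, exactly as the tangent-line picture above suggests.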
M- and Z-Estimators: The Unified Framework
Most estimators of interest in statistics arise as either M-estimators (arg-min of a sample criterion) or Z-estimators (zero of a sample estimating equation). Under regularity these are equivalent (set ψθ = ∇θ mθ), but the Z-formulation is more convenient for asymptotic analysis because it avoids second-order arg-min arguments.
M-estimator and Z-estimator
Given iid data X₁, …, Xₙ from P and a parameter space Θ ⊆ ℝᵏ, an M-estimator is
θ̂ₙ = arg minθ∈Θ Mₙ(θ), Mₙ(θ) = (1/n) Σᵢ mθ(Xᵢ).
A Z-estimator is any solution θ̂ₙ of
Ψₙ(θ) = (1/n) Σᵢ ψθ(Xᵢ) = 0,
where ψθ is the estimating function. The population analogue is θ₀ solving Ψ(θ₀) = E ψθ₀(X) = 0, or equivalently θ₀ = arg minθ E mθ(X).
Examples:
- MLE. mθ(x) = −ℓθ(x), ψθ(x) = ℓ̇θ(x) (the score).
- OLS. mθ(x, y) = (y − xᵀθ)², ψθ(x, y) = x(y − xᵀθ).
- GMM (just-identified). ψθ(x) = g(x, θ), k moment conditions evaluated at a k-dimensional parameter.
- Quantile regression. ψθ(x, y) = x(τ − 1{y ≤ xᵀθ}) for the τ-th quantile.
- Huber's robust location. ψθ(x) = ψH(x − θ), where ψH(u) = max(−k, min(k, u)) is the Huber influence function (linear inside [−k, k], capped outside).
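A minimal sketch of the last example (illustrative, not from the original): Huber's location estimator solved as a Z-estimator, Σᵢ ψH(Xᵢ − θ) = 0, via a simple fixed-point iteration (which converges here because the update map is a contraction for this ψ).

```python
import random
import statistics

# Sketch: Huber's robust location as a Z-estimator.
def huber_psi(u, k):
    return max(-k, min(k, u))  # linear inside [-k, k], capped outside

def huber_location(xs, k=1.345, iters=100):
    theta = statistics.median(xs)  # robust starting point
    for _ in range(iters):
        # Fixed-point step: move theta by the mean estimating-function value.
        step = sum(huber_psi(x - theta, k) for x in xs) / len(xs)
        theta += step
        if abs(step) < 1e-10:
            break
    return theta

random.seed(2)
xs = [random.gauss(5.0, 1.0) for _ in range(300)] + [50.0] * 5  # 5 gross outliers
print(huber_location(xs), statistics.fmean(xs))  # Huber stays near 5; mean is dragged up
```

The capped ψ bounds each observation's influence at k, which is exactly the influence-function story formalized later on this page.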
M-Estimator Asymptotic Normality
Statement
Under the stated regularity conditions, the Z-estimator θ̂ₙ solving Ψₙ(θ̂ₙ) = 0 satisfies
√n(θ̂ₙ − θ₀) ⇝ N(0, A⁻¹ B (A⁻¹)ᵀ),
where A = E[∇θ ψθ(X)] at θ = θ₀ and B = E[ψθ₀(X) ψθ₀(X)ᵀ].
The variance is the sandwich form A⁻¹ B (A⁻¹)ᵀ, with A⁻¹ the "bread" and B the "meat." When the model is correctly specified and ψ is the score (MLE case), the information identity gives B = −A = I(θ₀), the sandwich collapses to I(θ₀)⁻¹, and the variance equals the inverse Fisher information.
Intuition
Expand the estimating equation around θ₀:
0 = Ψₙ(θ̂ₙ) ≈ Ψₙ(θ₀) + A (θ̂ₙ − θ₀).
Solve for θ̂ₙ − θ₀, multiply by √n:
√n(θ̂ₙ − θ₀) ≈ −A⁻¹ √n Ψₙ(θ₀).
The right side is a CLT applied to mean-zero iid vectors with covariance B, scaled by the deterministic matrix −A⁻¹. The push-forward gives the sandwich variance.
Proof Sketch
Write the estimating equation with a mean-value-theorem step (or use a stochastic equicontinuity argument when ψ is non-smooth). The linearization
0 = Ψₙ(θ̂ₙ) = Ψₙ(θ₀) + ∇Ψₙ(θ̄ₙ)(θ̂ₙ − θ₀),
with θ̄ₙ between θ̂ₙ and θ₀, plus a uniform LLN on ∇θ ψθ, gives ∇Ψₙ(θ̄ₙ) →p A. The CLT gives √n Ψₙ(θ₀) ⇝ N(0, B). Apply Slutsky and CMT to invert A and conclude. See van der Vaart Asymptotic Statistics Theorem 5.21.
A more refined version (Theorem 5.41, "Z-estimators with non-smooth ψ") replaces pointwise differentiability with Hellinger differentiability of the model and stochastic equicontinuity of the empirical-process term. This handles quantile regression and other ψ with discontinuities.
Why It Matters
This is the most general asymptotic-normality theorem in regular parametric inference. MLE asymptotic normality is the special case where the model is correctly specified and ψ is the score. OLS, GMM, Huber's robust location, and quantile regression all follow as special cases with their own A and B. The sandwich variance is also exactly what you should report when you suspect model misspecification: it remains valid when the information identity B = −A fails, while the naive "model-based" variance does not.
Failure Mode
Three failure modes worth naming. (1) Lack of identification: if A is singular, the asymptotic variance is undefined and the estimator may converge at a slower rate (cube-root asymptotics for maximum-score estimators, n^(1/3) rate). (2) Boundary parameter: if θ₀ is on the boundary of Θ, the limit is not normal but a half-normal or projection of a normal onto a cone (Self & Liang 1987, JASA). (3) Misspecification: θ₀ defines the "least-false" parameter, but the sandwich variance still applies; the naive Fisher-information-based variance does not.
MLE as a Special Case
Asymptotic Normality of MLE
Statement
Under the regularity conditions above, the maximum likelihood estimator satisfies
√n(θ̂ₙ − θ₀) ⇝ N(0, I(θ₀)⁻¹),
where I(θ₀) = E[ℓ̇θ₀ ℓ̇θ₀ᵀ] = −E[ℓ̈θ₀] is the Fisher information matrix, with the second equality holding by the information identity.
Intuition
The MLE is asymptotically normal centered at the truth with covariance I(θ₀)⁻¹/n. This is the smallest variance any regular estimator can achieve (Cramér-Rao lower bound), so the MLE is asymptotically efficient. The result follows from the M-estimator theorem with ψθ = ℓ̇θ, A = −I(θ₀), B = I(θ₀), and the sandwich collapsing to I(θ₀)⁻¹.
Proof Sketch
Apply the M-estimator theorem with ψθ = ℓ̇θ. The information identity (which holds under regularity, by differentiating ∫ pθ dμ = 1 twice and exchanging integration and differentiation) gives E[ℓ̈θ₀] = −I(θ₀), so the sandwich variance reduces to I(θ₀)⁻¹.
Why It Matters
Justifies the standard practice of reporting MLE point estimates with standard errors derived from the inverse observed information. It also underwrites likelihood-based inference: confidence intervals from inverting the Wald, score, or LRT statistic; profile-likelihood intervals for nuisance-parameter problems; and the asymptotic chi-squared distribution of twice the maximized log-likelihood ratio in nested-model testing.
Failure Mode
Regularity fails at boundary parameters (e.g., a variance component at zero), non-identified models (mixture models with unknown component count), models where the support depends on the parameter (e.g., Uniform(0, θ), where θ̂ₙ converges at rate n to an exponential limit), and singular information (I(θ₀) not full rank, e.g., at a point of non-identification in mixture models).
Le Cam Theory: Local Asymptotic Normality
Local Asymptotic Normality (Le Cam)
Statement
Under DQM at θ₀, the log-likelihood ratio log Πᵢ p_{θ₀+h/√n}(Xᵢ)/p_{θ₀}(Xᵢ) admits the LAN expansion
hᵀ Δₙ − ½ hᵀ I(θ₀) h + op(1),
where the central sequence Δₙ = (1/√n) Σᵢ ℓ̇θ₀(Xᵢ) satisfies Δₙ ⇝ N(0, I(θ₀)) under θ₀.
Le Cam's three lemmas then give:
- The n-fold product measures under θ₀ + h/√n and under θ₀ are mutually contiguous.
- Convergence in distribution under θ₀ transfers to θ₀ + h/√n via the change-of-measure formula.
- The asymptotic experiment is the Gaussian shift experiment N(h, I(θ₀)⁻¹): every regular procedure in the original problem has a corresponding procedure in the Gaussian experiment with the same asymptotic risk.
Intuition
At the local scale θ₀ + h/√n around the truth, every regular parametric problem reduces to a Gaussian shift problem. The sufficient statistic is the central sequence Δₙ (the normalized score), the parameter is the local direction h, and the Fisher information sets the signal-to-noise ratio. This is the deepest structural result in classical parametric statistics: it says the MLE is asymptotically optimal not because of clever proofs but because it is the natural estimator in the limiting Gaussian experiment.
Differentiability in quadratic mean (DQM) is weaker than pointwise differentiability of θ ↦ log pθ(x): it requires only that the square-root density √pθ be differentiable in θ as a map into L²(μ). This handles models with bounded support, non-smooth densities, and many other "almost regular" cases where the textbook regularity conditions fail.
Proof Sketch
Hellinger / DQM expansion: under DQM at θ₀ with score ℓ̇θ₀,
√(p_{θ₀+h/√n}/p_{θ₀}) = 1 + (1/(2√n)) hᵀ ℓ̇θ₀ + rₙ,
with E[rₙ²] = o(1/n). Square, sum over i, take logs, and Taylor expand log(1 + x). The linear term aggregates to hᵀΔₙ; the quadratic term aggregates by a LLN-type argument to −½ hᵀ I(θ₀) h. The remainder is op(1) uniformly in h on bounded sets. See van der Vaart Asymptotic Statistics Theorem 7.2 and Le Cam-Yang Asymptotics in Statistics Chapter 6.
Le Cam's three lemmas follow from the LAN expansion via standard arguments. Mutual contiguity is immediate from the boundedness in probability of the log-likelihood ratio; the change-of-measure formula comes from the definition of contiguity; the Gaussian shift representation follows from the joint convergence of (Δₙ, log-likelihood ratio).
Why It Matters
LAN underwrites the asymptotic minimax lower bound: no regular estimator can have asymptotic variance smaller than I(θ₀)⁻¹. It also gives the asymptotic equivalence of Wald, score, and LRT tests under the null and under contiguous alternatives, and it is the foundation of semiparametric efficiency theory (Bickel-Klaassen-Ritov-Wellner 1993): in semiparametric models the analogous expansion identifies the efficient score as the projection of the parametric score onto the orthogonal complement of the nuisance tangent space.
Failure Mode
LAN fails for non-regular models. Canonical examples include parameter-dependent support (e.g., Uniform(0, θ), where the MLE converges at rate n rather than √n and the limit is exponential), boundary parameter problems (parameter on the edge of Θ, where the limit is a censored normal), change-point models (where the rate is n and the limit is a compound Poisson functional), mixture models at non-identified configurations, and unidentified parameters generally. Note that the standard Cauchy location family is regular: its Fisher information equals 1/2 and LAN holds, so the location MLE is regular and asymptotically normal at the √n rate. The Cauchy sample mean is pathological because the Cauchy has no first moment, but likelihood-based location inference is fine. Many machine-learning "models" with high-dimensional parameters and unclear identification also fall outside LAN. The corresponding theory is LAMN (locally asymptotically mixed normal) and LAQ (locally asymptotically quadratic), with weaker optimality conclusions.
The Wald, Score, and Likelihood-Ratio Tests
Asymptotic Equivalence of Wald, Score, and LRT
Statement
Under the null H₀: θ = θ₀ and LAN, define the three test statistics:
- Wald. Wₙ = n (θ̂ₙ − θ₀)ᵀ Î(θ̂ₙ) (θ̂ₙ − θ₀).
- Score (Rao / Lagrange multiplier). Sₙ = Δₙᵀ I(θ₀)⁻¹ Δₙ, with Δₙ = (1/√n) Σᵢ ℓ̇θ₀(Xᵢ).
- Likelihood ratio. Rₙ = 2 (Lₙ(θ̂ₙ) − Lₙ(θ₀)).
Then under H₀, all three converge to the same chi-squared limit:
Wₙ, Sₙ, Rₙ ⇝ χ²ₖ.
Moreover, Wₙ − Sₙ = op(1) and Rₙ − Sₙ = op(1), so the three tests are asymptotically equivalent under H₀ and under contiguous alternatives θ₀ + h/√n (where the common limit becomes a non-central χ²ₖ with non-centrality hᵀ I(θ₀) h).
Intuition
The three tests measure "distance from H₀" in three different but asymptotically equivalent ways. Wald measures distance in the parameter space (how far θ̂ₙ is from θ₀, weighted by the information). Score measures the slope of the log-likelihood at θ₀ (is the gradient near zero, or far?). LRT measures the depth of the log-likelihood (how much does the maximum exceed the null value?). All three reduce to the same quadratic form in the central sequence under LAN, and the chi-squared limit is the distribution of the squared length of a standard normal k-vector.
Proof Sketch
Substitute the LAN expansion into each statistic and Taylor-expand to second order around θ₀. The MLE satisfies √n(θ̂ₙ − θ₀) = I(θ₀)⁻¹ Δₙ + op(1), so
Wₙ = Δₙᵀ I(θ₀)⁻¹ Δₙ + op(1),
which matches Sₙ exactly. For the LRT, expand Lₙ around θ̂ₙ to second order; the cross-term contributions cancel and the remainder is also op(1). Since Δₙ ⇝ N(0, I(θ₀)) under H₀, the quadratic form has a χ²ₖ limit. See van der Vaart Asymptotic Statistics Theorem 16.7.
Why It Matters
Justifies reporting any of the three test statistics in practice with the same chi-squared reference distribution. The choice among them is a finite-sample concern: Wald requires the MLE and an estimate of I(θ₀); score does not require fitting the alternative model and is preferred in some computational settings; LRT is invariant under parameter transformations and tends to have better small-sample behavior in nested-model comparisons. The non-central chi-squared limit under contiguous alternatives gives the local power calculation: power → P(χ²ₖ(λ) > χ²ₖ,1−α), a function of the non-centrality λ = hᵀ I(θ₀) h.
Failure Mode
The three tests can disagree substantially in finite samples, particularly when n is small or the model is misspecified. Wald is parameterization-dependent (Wald CIs for θ and for g(θ) are not invariant under reparameterization, while LRT CIs are). Under misspecification only the robust score test with a sandwich denominator preserves a valid chi-squared limit; the naive Wald and LRT statistics use the wrong variance. Under boundary conditions the limit is a mixture of chi-squared distributions, not a single χ²ₖ (Self & Liang 1987).
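A minimal sketch computing all three statistics on the same data (illustrative, not from the original): testing H₀: λ = 1 in a Poisson model, where λ̂ₙ = X̄ₙ, I(λ) = 1/λ, and the score is U(λ) = Σᵢ Xᵢ/λ − n. On null data with large n the three values nearly coincide, as the op(1) equivalence predicts.

```python
import math
import random

# Sketch: Wald, score, and LRT for H0: lambda = 1 in a Poisson model.
random.seed(7)

def poisson1():
    # Knuth's method for Poisson(1) draws
    L, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

n = 20000
xs = [poisson1() for _ in range(n)]
total = sum(xs)
lam_hat = total / n  # MLE = sample mean

wald = n * (lam_hat - 1.0) ** 2 / lam_hat        # n (lam_hat - 1)^2 * I(lam_hat)
score = (total - n) ** 2 / n                     # U(1)^2 / (n I(1)), U(1) = total - n
lrt = 2.0 * (total * math.log(lam_hat) - n * lam_hat + n)  # 2 (L(lam_hat) - L(1))
print(round(wald, 3), round(score, 3), round(lrt, 3))  # three nearly equal values
```

All three are compared against the same χ²₁ reference; the finite-sample gaps between them shrink at the 1/√n rate.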
Definitions: Influence Functions, Efficiency, Contiguity
Asymptotically Linear Estimator and Influence Function
A regular estimator θ̂ₙ of θ is asymptotically linear with influence function φ if and only if
√n(θ̂ₙ − θ₀) = (1/√n) Σᵢ φ(Xᵢ) + op(1),
where E φ(X) = 0 and E‖φ(X)‖² < ∞. The asymptotic variance of θ̂ₙ is then E[φ φᵀ].
For an M-estimator with estimating function ψ, φ = −A⁻¹ ψθ₀. For the MLE, φ = I(θ₀)⁻¹ ℓ̇θ₀, the efficient influence function. Influence functions parameterize the entire space of regular estimators in a parametric model, and they are the central object of semiparametric efficiency theory.
Asymptotic Relative Efficiency
The asymptotic relative efficiency (ARE) of estimator T₁ relative to T₂, when both are √n-consistent and asymptotically normal with variances v₁ and v₂, is
ARE(T₁, T₂) = v₂ / v₁.
In this asymptotic-variance sense, T₁ is more efficient than T₂ if and only if ARE(T₁, T₂) > 1. The MLE achieves ARE 1 relative to the Cramér-Rao bound by construction (under regularity). For testing, Pitman ARE is the analogous notion based on local power, and it equals the estimation ARE for tests derived from estimators (a striking equivalence due to Pitman 1949 and Noether 1955).
Contiguity
Two sequences of probability measures Pₙ and Qₙ on possibly different sample spaces satisfy "Qₙ contiguous with respect to Pₙ," written Qₙ ◁ Pₙ, if and only if for every sequence of measurable sets Aₙ, Pₙ(Aₙ) → 0 implies Qₙ(Aₙ) → 0. They are mutually contiguous if both Qₙ ◁ Pₙ and Pₙ ◁ Qₙ.
Under LAN, the n-fold products under θ₀ + h/√n and under θ₀ are mutually contiguous for every fixed h. Le Cam's first lemma characterizes this in terms of the limiting log-likelihood ratio: contiguity holds iff the likelihood ratio dQₙ/dPₙ is uniformly integrable under Pₙ. The third lemma converts statements about Pₙ-distributions into statements about Qₙ-distributions via the joint limit of (Tₙ, log dQₙ/dPₙ).
Worked Examples
Delta method: variance-stabilizing transform for Poisson
If Y₁, …, Yₙ are iid Poisson(λ), then √n(Ȳₙ − λ) ⇝ N(0, λ). The variance depends on λ, so a confidence interval for λ on the raw scale requires a plug-in estimate. Apply the delta method with g(y) = √y: g′(λ) = 1/(2√λ), so
√n(√Ȳₙ − √λ) ⇝ N(0, 1/4).
The variance is now constant. Confidence intervals on the √λ scale do not need a plug-in step; transform back at the end. The same idea gives the Anscombe transform √(Y + 3/8) for finite-λ correction and the Freeman-Tukey transform √Y + √(Y + 1).
Poisson(λ) before and after the Anscombe transform; the right panel keeps variance near 1/4 regardless of λ
Left: raw Poisson samples have variance λ (mean equals variance). Right: the Anscombe transform √(Y + 3/8) yields variance close to 1/4 across λ. Slide λ and watch the raw variance scale up while the transformed variance stays put.
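A minimal simulation of the stabilization shown in the panels (illustrative, not from the original): the raw Poisson variance scales with λ, while the variance of √(Y + 3/8) stays near 1/4.

```python
import math
import random
import statistics

# Sketch: Anscombe transform sqrt(Y + 3/8) approximately stabilizes
# Poisson variance at 1/4, regardless of lambda.
random.seed(3)

def poisson(lam):
    # Knuth's method; fine for moderate lambda
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

for lam in (5.0, 20.0, 50.0):
    ys = [poisson(lam) for _ in range(20000)]
    raw_var = statistics.variance(ys)  # scales like lambda
    anscombe_var = statistics.variance(math.sqrt(y + 0.375) for y in ys)  # ~ 1/4
    print(lam, round(raw_var, 2), round(anscombe_var, 3))
```

The 3/8 offset is the finite-λ correction mentioned above; plain √Y stabilizes the variance only to first order.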
MLE for the exponential rate, with confidence interval
If X₁, …, Xₙ are iid Exp(λ) (density λe^(−λx)), the MLE is λ̂ₙ = 1/X̄ₙ. The log-likelihood is ℓλ(x) = log λ − λx, so ℓ̇λ(x) = 1/λ − x and Fisher information is I(λ) = 1/λ². By asymptotic normality,
√n(λ̂ₙ − λ) ⇝ N(0, λ²).
A naive 95% CI is λ̂ₙ ± 1.96 λ̂ₙ/√n. A better CI uses the variance-stabilizing transform g(λ) = log λ: √n(log λ̂ₙ − log λ) ⇝ N(0, 1), giving the symmetric-on-the-log-scale CI λ̂ₙ exp(±1.96/√n), which is always positive.
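The two intervals in this example can be sketched directly (illustrative, not from the original):

```python
import math
import random
import statistics

# Sketch: 95% CIs for an exponential rate -- plain Wald vs log-scale
# (variance-stabilized) interval.
random.seed(4)
true_rate = 2.0
xs = [random.expovariate(true_rate) for _ in range(200)]
n = len(xs)
rate_hat = 1.0 / statistics.fmean(xs)  # MLE = 1 / sample mean

z = 1.96
wald = (rate_hat - z * rate_hat / math.sqrt(n),
        rate_hat + z * rate_hat / math.sqrt(n))
log_scale = (rate_hat * math.exp(-z / math.sqrt(n)),
             rate_hat * math.exp(+z / math.sqrt(n)))
print(wald, log_scale)  # log-scale endpoints are always positive
```

For small n the Wald interval can cross zero; the log-scale interval cannot, which is the point of the transform.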
Sandwich variance for misspecified MLE
Suppose you fit an exponential model to data that are actually Gamma(α, β). The "MLE" λ̂ₙ = 1/X̄ₙ converges in probability to λ* = β/α = 1/E[X] (the least-false parameter), not to anything intrinsic to the Gamma. The naive variance estimate uses the misspecified information identity and is generally wrong. The sandwich variance is
A⁻¹ B A⁻¹ = λ*⁴ Var(X), with A = −1/λ*² and B = Var(X) for ψλ(x) = 1/λ − x.
For Gamma(α, β) with mean α/β (so that λ* = β/α), Var(X) = α/β², giving sandwich variance β²/α³. The naive exponential-model variance would have given λ*² = β²/α², which is α times too large when α > 1 and too small when α < 1. This is the standard "robust SEs are different from model-based SEs under misspecification" phenomenon.
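A minimal Monte Carlo check of the sandwich calculation (illustrative, not from the original): with Gamma(α = 3, rate β = 3) data the mean is 1, so λ* = 1, the sandwich variance is λ*⁴ Var(X) = 3/9 = 1/3, and the naive exponential-model variance is λ*² = 1.

```python
import math
import random
import statistics

# Sketch: variance of sqrt(n) * (lam_hat - lam_star) for an exponential
# rate fitted by MLE to Gamma(shape=3, rate=3) data (mean 1, lam_star = 1).
random.seed(5)
alpha, rate, n, reps = 3.0, 3.0, 400, 3000
scaled = []
for _ in range(reps):
    xs = [random.gammavariate(alpha, 1.0 / rate) for _ in range(n)]  # scale = 1/rate
    lam_hat = 1.0 / statistics.fmean(xs)
    scaled.append(math.sqrt(n) * (lam_hat - 1.0))
mc_var = statistics.variance(scaled)
sandwich, naive = 1.0 / 3.0, 1.0  # lam_star^4 * Var(X) = 1/3 vs lam_star^2 = 1
print(round(mc_var, 3), sandwich, naive)  # Monte Carlo matches the sandwich
```

The Monte Carlo variance tracks the sandwich value and is roughly α = 3 times smaller than the naive model-based value, matching the worked example.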
Common Confusions
Asymptotic normality is not a property of the estimator
Asymptotic normality is a property of the sequence √n(θ̂ₙ − θ₀), not of the estimator at any fixed sample size. The actual finite-sample distribution of an MLE can be heavily skewed, multimodal, or have heavy tails, even when the asymptotic distribution is a clean normal. Bootstrap or exact methods should be used for small samples. The asymptotic approximation is a tool, not a fact about θ̂ₙ at any fixed n.
Efficiency is asymptotic; finite-sample winners can be biased
The MLE is asymptotically efficient under regularity, but finite-sample efficiency is a different question. James-Stein (1961) showed that for estimating a multivariate normal mean with dimension d ≥ 3, the MLE is dominated by a biased shrinkage estimator in mean squared error for every θ. The JS estimator is asymptotically equivalent to the MLE (both are √n-consistent with the same asymptotic variance), but it strictly improves finite-sample MSE by trading bias for variance.
Convergence in distribution does not imply convergence of moments
Xₙ ⇝ X does not imply E Xₙ → E X or Var Xₙ → Var X. The classical counterexample is Xₙ = n with probability 1/n and 0 otherwise: Xₙ ⇝ 0 (since Xₙ = 0 with probability 1 − 1/n), but E Xₙ = 1 for every n. The fix is uniform integrability: if (Xₙ) is uniformly integrable, then convergence in distribution implies convergence of expectations. In practice, asymptotic-normality results give the limiting distribution; whether √n(θ̂ₙ − θ₀) has moments converging to those of the normal limit requires a separate argument.
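The classical counterexample above can be simulated in a few lines (illustrative, not from the original):

```python
import random
import statistics

# Sketch: X_n = n with probability 1/n, else 0.
# X_n => 0 in distribution, yet E[X_n] = 1 for every n.
random.seed(6)

def draw(n):
    return n if random.random() < 1.0 / n else 0

n = 10_000
sample = [draw(n) for _ in range(200_000)]
frac_zero = sample.count(0) / len(sample)  # ~ 1 - 1/n: the law piles up at 0
mean = statistics.fmean(sample)            # ~ 1: the rare huge spike keeps E[X_n] = 1
print(frac_zero, mean)
```

Almost every draw is zero, yet the mean stays near 1 because the rare spike has magnitude n: exactly the mass-escaping-to-infinity failure that uniform integrability rules out.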
Le Cam regularity (DQM) is weaker than classical regularity
Pointwise differentiability of θ ↦ log pθ(x), combined with bounded-by-integrable-envelope conditions, is the textbook "regularity" assumption (Lehmann-Casella, Cox-Hinkley). Le Cam's differentiability in quadratic mean (DQM) is weaker: it requires only that θ ↦ √pθ be differentiable as a map into L²(μ). Models with bounded support (Uniform on [0, θ] is not DQM because the support changes with θ; but densities on a fixed support [a, b] can be DQM despite non-smoothness at the endpoints) and many semiparametric models satisfy DQM without satisfying the textbook conditions. Modern asymptotic theory works at the DQM level; classical regularity is a sufficient-but-not-necessary special case.
Super-efficient estimators exist but only on a measure-zero set
Hodges (1951) constructed an estimator of a normal mean that is asymptotically normal with variance I(θ)⁻¹ for every θ ≠ 0 and asymptotic variance 0 at θ = 0. This violates the Cramér-Rao bound at the single point θ = 0. The resolution: the Hodges estimator is not regular (its asymptotic distribution depends discontinuously on contiguous perturbations θₙ = h/√n). Le Cam's convolution theorem says: among regular estimators (those with continuous-in-h asymptotic distributions), no estimator can beat I(θ)⁻¹. Super-efficiency is a measure-theoretic curiosity, not a practical estimation tool, and the Hodges estimator has terrible local risk near zero (the local-minimax risk over a 1/√n-neighborhood is unbounded as n → ∞).
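A minimal sketch of the Hodges phenomenon (illustrative, not from the original), using the standard construction θ̂ = X̄ if |X̄| ≥ n^(−1/4) and θ̂ = 0 otherwise, for N(θ, 1) data: the normalized risk n·MSE is near 0 at θ = 0 but blows up at θ near the threshold scale n^(−1/4), far above the MLE's risk of 1.

```python
import math
import random
import statistics

# Sketch: Hodges' super-efficient estimator for a normal mean.
random.seed(8)

def hodges(n, theta):
    xbar = statistics.fmean(random.gauss(theta, 1.0) for _ in range(n))
    return xbar if abs(xbar) >= n ** -0.25 else 0.0  # hard-threshold to zero

n, reps = 400, 3000
risk = {}
for theta in (0.0, n ** -0.25):
    ests = [hodges(n, theta) for _ in range(reps)]
    risk[theta] = n * statistics.fmean((e - theta) ** 2 for e in ests)
print(risk)  # ~ 0 at theta = 0; far above the MLE's n*MSE = 1 near n^(-1/4)
```

The vanishing risk at zero and the inflated risk nearby are two faces of the same non-regularity: the estimator's asymptotic distribution jumps discontinuously across the threshold.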
MLE consistency requires more than just asymptotic normality of the score
The standard "expand the score equation around θ₀" derivation of asymptotic normality assumes θ̂ₙ →p θ₀. Establishing consistency is a separate (often harder) step, requiring either (a) compactness of Θ + continuity of θ ↦ M(θ) + identifiability, or (b) Wald's classical consistency proof using a uniform LLN, or (c) the modern empirical-process approach via Glivenko-Cantelli plus a strict separation between M(θ₀) and the criterion values of competitors. Skipping the consistency step and "deriving asymptotic normality" from the score equation alone is a circular argument.
Summary
- Convergence calculus: CMT, Slutsky, delta method (incl. multivariate and second-order). These are the algebraic moves that let you turn one CLT into many applied confidence intervals.
- M-/Z-estimators: θ̂ₙ is asymptotically normal with sandwich variance A⁻¹ B (A⁻¹)ᵀ, where A is the derivative of the population estimating function and B is its variance. MLE, OLS, GMM, quantile regression, and Huber-M are special cases.
- MLE asymptotics: under correct specification and regularity, √n(θ̂ₙ − θ₀) ⇝ N(0, I(θ₀)⁻¹). The information identity collapses the sandwich.
- LAN (Le Cam): every regular parametric problem looks like a Gaussian shift experiment at the 1/√n scale. This gives the asymptotic minimax bound and the Wald/score/LRT equivalence.
- Wald, score, LRT: all three converge to χ²ₖ under H₀ and to non-central χ²ₖ under contiguous alternatives. Choose among them on finite-sample, robustness, and computational grounds.
- Influence functions parameterize the entire space of regular estimators; the efficient influence function is I(θ)⁻¹ ℓ̇θ.
Exercises
Problem
Let Xₙ ~ Binomial(n, p) and p̂ₙ = Xₙ/n. Use the delta method to find the asymptotic distribution of the log-odds log(p̂ₙ/(1 − p̂ₙ)), then construct a 95% confidence interval for the log-odds and transform back to a CI for p that respects 0 < p < 1.
Problem
Compute the influence function and asymptotic variance of Huber's robust location estimator with tuning constant k, defined by
Σᵢ ψH(Xᵢ − θ̂ₙ) = 0, ψH(u) = max(−k, min(k, u)).
Specialize to F = N(θ, 1). What is the ARE of Huber-M relative to the sample mean as k → ∞? As k → 0?
Problem
Show that under the LAN expansion, the Wald statistic and the score statistic differ by op(1) under H₀, and that both equal the LRT statistic up to op(1).
Problem
Consider the change-point model Xᵢ ~ N(μ₀, 1) for i ≤ ⌊nτ⌋ and Xᵢ ~ N(μ₁, 1) for i > ⌊nτ⌋, with unknown change point τ ∈ (0, 1) and known μ₀ ≠ μ₁. Show that the MLE of τ converges at rate n (not √n) and identify the limiting distribution.
Problem
In the LAN expansion, the experiment at local scale looks like observing a single draw X ~ N(h, I(θ₀)⁻¹). (a) Derive the optimal estimator of h and its risk in this Gaussian shift experiment, and recover the Cramér-Rao bound. (b) For a semiparametric model with parametric component θ and infinite-dimensional nuisance η, what is the analogue of I(θ₀) that controls the asymptotic variance of any regular estimator of θ?
References
- van der Vaart, Asymptotic Statistics (Cambridge, 1998). Chapters 2-3 (convergence calculus, delta method), Chapter 5 (M- and Z-estimators), Chapters 7-8 (local asymptotic normality, contiguity, convolution theorem), Chapter 16 (Wald/score/LRT equivalence). The standard graduate text and the source most cited above.
- Lehmann & Casella, Theory of Point Estimation, 2nd ed. (Springer, 1998). Chapter 6 covers MLE asymptotic theory under classical regularity. Less general than van der Vaart but gives more finite-sample finesse.
- Le Cam, Asymptotic Methods in Statistical Decision Theory (Springer, 1986). The original treatise on contiguity, LAN, and the convolution theorem. Dense and difficult, but the source.
- Le Cam & Yang, Asymptotics in Statistics: Some Basic Concepts, 2nd ed. (Springer, 2000). A more readable distillation of Le Cam's program; Chapter 6 covers DQM and LAN.
- Pollard, Convergence of Stochastic Processes (Springer, 1984). Chapters 4-5 develop the empirical-process tools (Glivenko-Cantelli, Donsker, stochastic equicontinuity) underlying modern M-estimator theory.
- Bickel, Klaassen, Ritov, Wellner, Efficient and Adaptive Estimation for Semiparametric Models (Springer, 1993). The foundational text on semiparametric efficiency, projection-onto-tangent-spaces, and the efficient influence function.
- Wasserman, All of Statistics (Springer, 2004). Chapters 9-10 give a fast-paced overview suitable as a refresher; not a substitute for van der Vaart for a stats PhD reading list, but useful for re-grounding before exams.
- Keener, Theoretical Statistics: Topics for a Core Course (Springer, 2010). Chapters 7-9 cover MLE asymptotics with worked examples; well-suited to a first-year stats PhD course.
- Tsiatis, Semiparametric Theory and Missing Data (Springer, 2006). Modern semiparametric efficiency aimed at causal-inference and missing-data applications. Chapter 3 is the cleanest introduction to influence-function calculus in print.
- van der Laan & Rose, Targeted Learning (Springer, 2011). TMLE and the modern double-robust / influence-function-based estimation framework; reads as a sequel to BKRW for the causal-inference era.
- Wood, Generalized Additive Models, 2nd ed. (Chapman & Hall, 2017). Chapter 6 has a practical treatment of REML and Wald/score/LRT in mixed models; good complement to the abstract theory above.
Next Topics
- Bootstrap methods: resampling as a finite-sample proxy for the asymptotic distribution; second-order accurate for smooth functionals.
- Fisher information and Cramér-Rao bound: the geometric and information-theoretic perspective on the asymptotic variance lower bound.
- Empirical processes and chaining: the modern technical machinery for M-estimator asymptotics with non-smooth and high-dimensional parameter spaces.
- Semiparametric efficiency and TMLE: the infinite-dimensional generalization of LAN underlying modern causal-inference methods.
Last reviewed: April 25, 2026