Kalman Filter
Optimal state estimation for linear Gaussian systems via recursive prediction and update steps using the Kalman gain.
Why This Matters
The Kalman filter is the exact minimum mean squared error (MMSE) estimator for linear Gaussian state-space models. It appears in GPS navigation, robotics, autonomous driving, financial time series, and any setting where you need to track a hidden state from noisy observations. Every extended or unscented variant builds on the same core recursion.
Mental Model
You have a system whose state evolves over time according to linear dynamics with Gaussian noise. At each time step, you receive a noisy measurement of the state. The Kalman filter maintains a Gaussian belief over the current state: a mean (best estimate) and covariance (uncertainty). Each step has two phases: predict (project the belief forward using the dynamics) and update (incorporate the new measurement to sharpen the estimate).
Formal Setup and Notation
Linear Gaussian State-Space Model
The state equation describes how the hidden state evolves:

$$x_t = A x_{t-1} + B u_t + w_t, \qquad w_t \sim \mathcal{N}(0, Q)$$

The observation equation describes the measurement $y_t$:

$$y_t = C x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R)$$

Here $A$ is the state transition matrix, $B$ is the control input matrix, $u_t$ is a known control input, $C$ is the observation matrix, $Q$ is the process noise covariance, and $R$ is the measurement noise covariance. The noise sequences $\{w_t\}$ and $\{v_t\}$ are independent of each other and of the initial state $x_0$.
Kalman Filter Prediction Step
Given the posterior at time $t-1$, $x_{t-1} \mid y_{1:t-1} \sim \mathcal{N}(\hat{x}_{t-1|t-1}, P_{t-1|t-1})$, the predicted state and covariance are:

$$\hat{x}_{t|t-1} = A \hat{x}_{t-1|t-1} + B u_t$$
$$P_{t|t-1} = A P_{t-1|t-1} A^\top + Q$$

This propagates the belief forward through the dynamics, increasing uncertainty by $Q$.
Kalman Filter Update Step
When measurement $y_t$ arrives, compute the innovation (measurement residual):

$$\tilde{y}_t = y_t - C \hat{x}_{t|t-1}$$

The innovation covariance is $S_t = C P_{t|t-1} C^\top + R$.

The Kalman gain is:

$$K_t = P_{t|t-1} C^\top S_t^{-1}$$

The updated state and covariance are:

$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t \tilde{y}_t$$
$$P_{t|t} = (I - K_t C) P_{t|t-1}$$
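The predict and update equations above can be sketched directly in NumPy. This is a minimal illustration, not a production implementation; a robust filter would use the Joseph-form covariance update or a square-root formulation for numerical stability.

```python
import numpy as np

def kf_predict(x, P, A, B, u, Q):
    """Predict step: propagate the mean and covariance through the dynamics."""
    x_pred = A @ x + B @ u                 # predicted mean
    P_pred = A @ P @ A.T + Q               # predicted covariance (uncertainty grows by Q)
    return x_pred, P_pred

def kf_update(x_pred, P_pred, y, C, R):
    """Update step: correct the prediction with measurement y."""
    innovation = y - C @ x_pred            # measurement residual
    S = C @ P_pred @ C.T + R               # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)    # Kalman gain
    x = x_pred + K @ innovation            # updated mean
    P = (np.eye(len(x)) - K @ C) @ P_pred  # updated covariance
    return x, P
```

For a scalar state with $P = 1$, $Q = 1$, $R = 2$, the predicted variance is 2, the gain is 0.5, and the posterior variance is 1, matching the formulas above.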
Main Theorems
Optimality of the Kalman Filter
Statement
For a linear Gaussian state-space model, the Kalman filter posterior is the exact conditional distribution $p(x_t \mid y_{1:t}) = \mathcal{N}(\hat{x}_{t|t}, P_{t|t})$. The mean is the minimum mean squared error estimator:

$$\hat{x}_{t|t} = \mathbb{E}[x_t \mid y_{1:t}] = \arg\min_{\hat{x}} \mathbb{E}\left[\|x_t - \hat{x}\|^2 \mid y_{1:t}\right]$$

and $P_{t|t}$ is the conditional covariance $\mathrm{Cov}(x_t \mid y_{1:t})$.
Intuition
Gaussians are closed under linear transformations and conditioning. Since the dynamics are linear and all noise is Gaussian, the joint distribution of $(x_{0:t}, y_{1:t})$ is always Gaussian. Conditioning a Gaussian on observed values yields another Gaussian, and the conditional mean of a Gaussian is always the MMSE estimator.
Proof Sketch
Proceed by induction. The base case holds because $x_0 \sim \mathcal{N}(\hat{x}_{0|0}, P_{0|0})$ is Gaussian. For the inductive step: if $x_{t-1} \mid y_{1:t-1}$ is Gaussian, then $x_t = A x_{t-1} + B u_t + w_t$ is a linear function of Gaussians, so $x_t \mid y_{1:t-1}$ is Gaussian (prediction step). The joint $(x_t, y_t) \mid y_{1:t-1}$ is then jointly Gaussian, and conditioning on $y_t$ gives $x_t \mid y_{1:t}$ as Gaussian (update step). The formulas for the conditional mean and covariance of a jointly Gaussian vector yield exactly the Kalman gain equations.
Why It Matters
This is one of the rare cases where the optimal Bayesian filter has an exact, finite-dimensional, closed-form recursion. For nonlinear or non-Gaussian systems, no such finite recursion exists, and all practical filters (EKF, UKF, particle filters) are approximations.
Failure Mode
The optimality guarantee breaks completely when any of its assumptions fail. If the dynamics are nonlinear, the posterior is no longer Gaussian and the Kalman recursion is only an approximation (EKF). If the noise is heavy-tailed, the MMSE estimator is no longer the conditional mean of a Gaussian, and the filter can diverge on outlier measurements. If the system matrices are misspecified, the filter is overconfident: $P_{t|t}$ underestimates the true uncertainty.
Extensions to Nonlinear Systems
Extended Kalman Filter (EKF)
For nonlinear dynamics $x_t = f(x_{t-1}, u_t) + w_t$ and nonlinear observations $y_t = h(x_t) + v_t$, the EKF linearizes around the current estimate:

$$F_t = \left.\frac{\partial f}{\partial x}\right|_{\hat{x}_{t-1|t-1}}, \qquad H_t = \left.\frac{\partial h}{\partial x}\right|_{\hat{x}_{t|t-1}}$$

It then applies the standard Kalman recursion using $F_t$ in place of $A$ and $H_t$ in place of $C$. This is a first-order Taylor approximation and can diverge if the nonlinearity is strong.
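As a concrete sketch, here is one EKF step for a hypothetical scalar system $x_t = \sin(x_{t-1}) + w_t$, $y_t = x_t^2 + v_t$. The system and the noise values are invented for illustration; in the scalar case the Jacobians are ordinary derivatives.

```python
import numpy as np

# Hypothetical scalar model: x_t = sin(x_{t-1}) + w_t,  y_t = x_t^2 + v_t
f, h = np.sin, np.square
df, dh = np.cos, lambda x: 2 * x   # derivatives of f and h (scalar Jacobians)
q, r = 0.01, 0.1                   # illustrative process / measurement noise variances

def ekf_step(x, P, y):
    """One EKF predict/update cycle for the scalar model above."""
    # predict: propagate the mean through f, linearize the dynamics at the estimate
    F = df(x)
    x_pred = f(x)
    P_pred = F * P * F + q
    # update: linearize the observation model at the predicted state
    H = dh(x_pred)
    S = H * P_pred * H + r
    K = P_pred * H / S
    return x_pred + K * (y - h(x_pred)), (1 - K * H) * P_pred
```

Note that the mean is propagated through the true nonlinear $f$ and $h$; only the covariance propagation uses the linearization.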
Unscented Kalman Filter (UKF)
The UKF avoids computing Jacobians. Instead, it propagates a set of deterministically chosen sigma points through the nonlinear functions $f$ and $h$, then reconstructs the mean and covariance from the transformed points. For a state of dimension $n$, the UKF uses $2n + 1$ sigma points. It captures second-order effects that the EKF misses, with the same cost per step.
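A sketch of the sigma-point construction and the unscented transform, using the common scaled parameterization with parameters $\alpha$, $\beta$, $\kappa$ (the default values below are illustrative choices, not canonical):

```python
import numpy as np

def sigma_points(x, P, alpha=0.1, beta=2.0, kappa=0.0):
    """Generate 2n+1 scaled sigma points and their mean/covariance weights."""
    n = len(x)
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)    # matrix square root of scaled covariance
    pts = np.vstack([x, x + S.T, x - S.T])   # shape (2n+1, n): center +/- columns of S
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1 - alpha**2 + beta)
    return pts, wm, wc

def unscented_transform(pts, wm, wc, f):
    """Propagate sigma points through f; reconstruct mean and covariance."""
    fp = np.array([f(p) for p in pts])
    mean = wm @ fp
    diff = fp - mean
    cov = (wc[:, None] * diff).T @ diff
    return mean, cov
```

A useful sanity check: for the identity map the transform recovers the original mean and covariance exactly, since the sigma points encode them losslessly.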
Canonical Examples
Tracking position from noisy GPS
A vehicle moves in 1D with a constant-velocity model. State: $x_t = [p_t \;\; \dot{p}_t]^\top$ (position and velocity). Dynamics:

$$A = \begin{pmatrix} 1 & \Delta t \\ 0 & 1 \end{pmatrix}$$

Observation: GPS measures position only, so $C = \begin{pmatrix} 1 & 0 \end{pmatrix}$, $y_t = p_t + v_t$.

With the time step $\Delta t$, process noise covariance $Q$, and GPS noise variance $R$ fixed, the filter smooths out GPS noise while estimating velocity (which is never directly observed). After several steps, the position uncertainty drops well below the raw GPS noise level $\sqrt{R}$ because the velocity estimate provides additional information.
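A runnable sketch of this scenario. The numeric values ($\Delta t = 1$, small process noise, GPS variance 25) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, q, r = 1.0, 0.01, 25.0             # time step, noise intensities (illustrative)
A = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity dynamics
Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
C = np.array([[1.0, 0.0]])             # GPS measures position only
R = np.array([[r]])

x_true = np.array([0.0, 1.0])          # true position and velocity
x_hat, P = np.zeros(2), np.eye(2) * 100.0
for _ in range(50):
    # simulate the system and a noisy GPS measurement
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    y = C @ x_true + rng.normal(0, np.sqrt(r), 1)
    # predict
    x_hat, P = A @ x_hat, A @ P @ A.T + Q
    # update
    S = C @ P @ C.T + R
    K = P @ C.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (y - C @ x_hat)
    P = (np.eye(2) - K @ C) @ P

print(P[0, 0], r)  # filtered position variance sits well below the raw GPS variance
```

The velocity variance $P_{22}$ also shrinks even though velocity is never measured directly: the dynamics couple it to the observed position.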
Sensor fusion: combining GPS and accelerometer
Same state as above, but now add an accelerometer measurement $y^{\text{acc}}_t = a_t + v^{\text{acc}}_t$, where $a_t$ is the acceleration. Augment the state to include acceleration, or treat the accelerometer as a control input with known noise. The Kalman filter optimally fuses both sensors by weighting each measurement inversely proportional to its noise variance. This is the principle behind inertial navigation systems.
Connection to Recursive Least Squares
The Kalman filter is equivalent to recursive least squares (RLS) when the state is constant and the observation model is linear. This connection makes the Bayesian updating interpretation concrete.
Suppose the true parameter is $\theta$, constant over time ($A = I$, $Q = 0$), with observations $y_t = C_t \theta + v_t$ where $v_t \sim \mathcal{N}(0, R)$. The Kalman update step then becomes:

$$K_t = P_{t-1} C_t^\top (C_t P_{t-1} C_t^\top + R)^{-1}$$
$$\hat{\theta}_t = \hat{\theta}_{t-1} + K_t (y_t - C_t \hat{\theta}_{t-1})$$
$$P_t = (I - K_t C_t) P_{t-1}$$
This is exactly the RLS recursion. The covariance $P_t$ plays the role of the inverse information matrix in batch ordinary least squares: it tracks how much uncertainty remains about $\theta$ given all observations so far.
The batch OLS solution can be recovered by initializing $P_0 = \lambda I$ with $\lambda \to \infty$ (a diffuse prior) and running the Kalman recursion until all $T$ observations have been processed, then taking $\hat{\theta}_T$. The resulting $\hat{\theta}_T$ equals the batch OLS estimate; $P_T$ equals the OLS covariance matrix.
This Bayesian view of RLS is what the Kalman filter generalizes. The update has a clean interpretation: the new estimate is the prior mean shifted by the Kalman gain times the surprise (innovation). When the prior is much sharper than the observation noise ($P_{t|t-1} \to 0$ relative to $R$), the gain $K_t \to 0$ and the filter ignores the noisy observation. When the observation is much more precise than the prior ($R \to 0$) and $C$ happens to be square and invertible, the gain approaches $C^{-1}$ and the filter trusts the measurement. In the general case $C$ is rectangular or rank-deficient (you only observe part of the state), and as $R \to 0$ the limiting gain is $K_t \to P_{t|t-1} C^\top (C P_{t|t-1} C^\top)^{-1}$ (a covariance-weighted pseudoinverse), which enforces the measurement constraint in the observed subspace and leaves the unobserved directions to be pinned down by the dynamics; no ordinary $C^{-1}$ exists.
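The RLS equivalence can be checked numerically: running the Kalman recursion with $A = I$, $Q = 0$, and a near-diffuse prior recovers the batch least-squares estimate. The data and noise values below are invented for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([2.0, -1.0])
X = rng.normal(size=(100, 2))                      # one regressor row C_t per step
y = X @ theta_true + rng.normal(0, 0.5, 100)

# Kalman filter with a constant state: A = I, Q = 0, near-diffuse prior
theta, P, r = np.zeros(2), np.eye(2) * 1e8, 0.25   # r = measurement noise variance
for C_t, y_t in zip(X, y):
    S = C_t @ P @ C_t + r                          # scalar innovation covariance
    K = P @ C_t / S                                # Kalman gain
    theta = theta + K * (y_t - C_t @ theta)        # shift by gain times innovation
    P = P - np.outer(K, C_t @ P)                   # (I - K C) P

theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # batch OLS on the same data
print(np.max(np.abs(theta - theta_ols)))           # agreement up to the prior's floor
```

The tiny residual difference comes from the finite prior variance $10^8$; a truly diffuse prior would make the match exact.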
The expectation-variance-covariance structure underlying this tradeoff is the Bayesian posterior update for Gaussians: conditioning a Gaussian prior on a Gaussian likelihood yields a Gaussian posterior, with the mean shifting toward the observation by an amount proportional to the relative precision.
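Concretely, the Gaussian conditioning identity behind the update step is: if

$$\begin{pmatrix} a \\ b \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix} \right)$$

then

$$a \mid b \sim \mathcal{N}\!\left( \mu_a + \Sigma_{ab} \Sigma_{bb}^{-1} (b - \mu_b), \; \Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba} \right)$$

With $a = x_t$ and $b = y_t$ (both conditioned on $y_{1:t-1}$), these formulas reduce exactly to the Kalman gain equations.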
Common Confusions
The Kalman gain is not a tuning parameter
The Kalman gain is derived, not chosen. It is the unique gain that produces the MMSE estimate given the model. "Tuning" a Kalman filter means choosing $Q$ and $R$ to match the actual noise statistics, not manually adjusting $K_t$.
The Kalman filter does not require stationarity
The matrices can all be time-varying: $A_t, B_t, C_t, Q_t, R_t$. The recursion and optimality still hold. Stationarity is only needed if you want the covariance $P_{t|t}$ to converge to a steady state (solving the discrete algebraic Riccati equation).
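The convergence to a steady state can be checked numerically by iterating the covariance recursion with fixed matrices until it reaches the Riccati fixed point. The matrices below are illustrative choices:

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # fixed (stationary) system matrices
C = np.array([[1.0, 0.0]])
Q = np.eye(2) * 0.1
R = np.array([[1.0]])

def riccati_step(P):
    """One predict/update pass of the covariance recursion (data-independent)."""
    P_pred = A @ P @ A.T + Q
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
    return (np.eye(2) - K @ C) @ P_pred

P = np.eye(2) * 50.0
for _ in range(500):
    P = riccati_step(P)

# At the fixed point, one more iteration leaves P essentially unchanged
print(np.max(np.abs(riccati_step(P) - P)))
```

The limiting $P$ solves the discrete algebraic Riccati equation; the corresponding constant gain gives the steady-state Kalman filter.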
EKF is not optimal for nonlinear systems
The EKF applies the Kalman equations to a linearized system. The result is not the true posterior $p(x_t \mid y_{1:t})$, which is generally non-Gaussian for nonlinear dynamics. The EKF can diverge when the linearization is poor. The UKF and particle filters provide better approximations at higher cost.
Summary
- Predict step: project mean and covariance forward through dynamics
- Update step: correct using the Kalman gain
- Optimal (MMSE) only for linear Gaussian systems
- Covariance does not depend on the actual measurements, only on the model
- EKF linearizes nonlinear systems; UKF uses sigma points instead
- Model mismatch (wrong $Q$ or $R$) causes overconfidence, not just bias
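The fourth point above is easy to verify: since the gain and covariance recursions never touch the data, two completely different measurement sequences produce identical covariance sequences. A minimal scalar demonstration (illustrative values):

```python
import numpy as np

A, C = np.array([[1.0]]), np.array([[1.0]])
Q, R = np.array([[0.5]]), np.array([[2.0]])

def run_filter(ys):
    """Run the scalar filter; return the sequence of posterior variances."""
    x, P, Ps = np.zeros(1), np.array([[10.0]]), []
    for y in ys:
        P = A @ P @ A.T + Q                          # predict
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
        x = x + K @ (y - C @ x)                      # update mean (uses the data)
        P = (np.eye(1) - K @ C) @ P                  # update covariance (does not)
        Ps.append(P[0, 0])
    return Ps

ps_a = run_filter(np.zeros(10))            # one measurement sequence ...
ps_b = run_filter(np.arange(10.0) * 7.0)   # ... and a very different one
print(np.allclose(ps_a, ps_b))
```

This is also why the covariance (and hence the gain schedule) can be precomputed offline for a fixed model.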
Exercises
Problem
A scalar state evolves as $x_t = x_{t-1} + w_t$ with $w_t \sim \mathcal{N}(0, q)$. Observations are $y_t = x_t + v_t$ with $v_t \sim \mathcal{N}(0, r)$. Starting from $\hat{x}_{0|0}$, $P_{0|0}$, compute $\hat{x}_{1|1}$ and $P_{1|1}$ given $y_1$.
Problem
Show that the steady-state Kalman gain $K_\infty$ for the scalar system $x_t = x_{t-1} + w_t$, $y_t = x_t + v_t$ with $w_t \sim \mathcal{N}(0, q)$, $v_t \sim \mathcal{N}(0, r)$, satisfies $r K_\infty^2 + q K_\infty - q = 0$. What happens as $q/r \to \infty$ and as $q/r \to 0$?
References
Canonical:
- Anderson & Moore, Optimal Filtering (1979), Chapters 2-5
- Simon, Optimal State Estimation (2006), Chapters 5-7
Current:
- Särkkä, Bayesian Filtering and Smoothing (2013), Chapters 4-5
- Bishop, Pattern Recognition and Machine Learning (2006), Chapter 13.3
- Murphy, Machine Learning: A Probabilistic Perspective (2012), Chapter 18
- Kalman, "A New Approach to Linear Filtering and Prediction Problems" (Trans. ASME J. Basic Eng., vol. 82, no. 1, 1960, pp. 35-45)
- Julier & Uhlmann, "A New Extension of the Kalman Filter to Nonlinear Systems" (SPIE 1997)
- Bar-Shalom, Li, Kirubarajan, Estimation with Applications to Tracking and Navigation (2001), Chapters 5-6
Next Topics
- Particle filters: sequential Monte Carlo for nonlinear, non-Gaussian systems
- Hidden Markov models: discrete-state analogue of the Kalman filter
- GraphSLAM and factor graphs: batch optimization formulations that extend Kalman filtering to simultaneous localization and mapping
- Linear regression: the batch version of the online estimation problem the Kalman filter solves recursively
- Expectation, variance, and covariance: the Gaussian conditioning formulas that underlie the update step
Last reviewed: April 26, 2026