Kalman Filter
Optimal state estimation for linear Gaussian systems via recursive prediction and update steps using the Kalman gain.
Why This Matters
The Kalman filter is the exact minimum mean squared error (MMSE) estimator for linear Gaussian state-space models. It appears in GPS navigation, robotics, autonomous driving, financial time series, and any setting where you need to track a hidden state from noisy observations. Every extended or unscented variant builds on the same core recursion.
Mental Model
You have a system whose state evolves over time according to linear dynamics with Gaussian noise. At each time step, you receive a noisy measurement of the state. The Kalman filter maintains a Gaussian belief over the current state: a mean (best estimate) and covariance (uncertainty). Each step has two phases: predict (project the belief forward using the dynamics) and update (incorporate the new measurement to sharpen the estimate).
Formal Setup and Notation
Linear Gaussian State-Space Model
The state equation describes how the hidden state evolves:

$$x_t = A x_{t-1} + B u_t + w_t, \qquad w_t \sim \mathcal{N}(0, Q)$$

The observation equation describes the measurement $y_t$:

$$y_t = C x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R)$$

Here $A$ is the state transition matrix, $B$ is the control input matrix, $u_t$ is a known control input, $C$ is the observation matrix, $Q$ is the process noise covariance, and $R$ is the measurement noise covariance. The noise sequences $\{w_t\}$ and $\{v_t\}$ are independent of each other and of the initial state $x_0$.
Kalman Filter Prediction Step
Given the posterior at time $t-1$, $x_{t-1} \mid y_{1:t-1} \sim \mathcal{N}(\hat{x}_{t-1|t-1}, P_{t-1|t-1})$, the predicted state and covariance are:

$$\hat{x}_{t|t-1} = A \hat{x}_{t-1|t-1} + B u_t$$
$$P_{t|t-1} = A P_{t-1|t-1} A^\top + Q$$

This propagates the belief forward through the dynamics, increasing uncertainty by $Q$.
Kalman Filter Update Step
When measurement $y_t$ arrives, compute the innovation (measurement residual):

$$\tilde{y}_t = y_t - C \hat{x}_{t|t-1}$$

The innovation covariance is $S_t = C P_{t|t-1} C^\top + R$.

The Kalman gain is:

$$K_t = P_{t|t-1} C^\top S_t^{-1}$$

The updated state and covariance are:

$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t \tilde{y}_t$$
$$P_{t|t} = (I - K_t C) P_{t|t-1}$$
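The predict and update equations above can be sketched directly in NumPy. This is a minimal illustration, not a production implementation; a robust filter would use the Joseph-form covariance update or a square-root formulation for numerical stability.

```python
import numpy as np

def kf_predict(x, P, A, B, u, Q):
    """Predict step: propagate the mean and covariance through the dynamics."""
    x_pred = A @ x + B @ u                 # predicted mean
    P_pred = A @ P @ A.T + Q               # predicted covariance (uncertainty grows by Q)
    return x_pred, P_pred

def kf_update(x_pred, P_pred, y, C, R):
    """Update step: correct the prediction with measurement y."""
    innovation = y - C @ x_pred            # measurement residual
    S = C @ P_pred @ C.T + R               # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)    # Kalman gain
    x = x_pred + K @ innovation            # updated mean
    P = (np.eye(len(x)) - K @ C) @ P_pred  # updated covariance
    return x, P
```

For a scalar state with $P = 1$, $Q = 1$, $R = 2$, the predicted variance is 2, the gain is 0.5, and the posterior variance is 1, matching the formulas above.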
Main Theorems
Optimality of the Kalman Filter
Statement
For a linear Gaussian state-space model, the Kalman filter posterior is the exact conditional distribution $p(x_t \mid y_{1:t}) = \mathcal{N}(\hat{x}_{t|t}, P_{t|t})$. The mean is the minimum mean squared error estimator:

$$\hat{x}_{t|t} = \mathbb{E}[x_t \mid y_{1:t}] = \arg\min_{\hat{x}} \mathbb{E}\left[\|x_t - \hat{x}\|^2 \mid y_{1:t}\right]$$

and $P_{t|t}$ is the conditional covariance $\mathrm{Cov}(x_t \mid y_{1:t})$.
Intuition
Gaussians are closed under linear transformations and conditioning. Since the dynamics are linear and all noise is Gaussian, the joint distribution of $(x_{0:t}, y_{1:t})$ is always Gaussian. Conditioning a Gaussian on observed values yields another Gaussian, and the conditional mean of a Gaussian is always the MMSE estimator.
Proof Sketch
Proceed by induction. The base case holds because $x_0 \sim \mathcal{N}(\hat{x}_{0|0}, P_{0|0})$ is Gaussian. For the inductive step: if $x_{t-1} \mid y_{1:t-1}$ is Gaussian, then $x_t = A x_{t-1} + B u_t + w_t$ is a linear function of Gaussians, so $x_t \mid y_{1:t-1}$ is Gaussian (prediction step). The joint $(x_t, y_t) \mid y_{1:t-1}$ is then jointly Gaussian, and conditioning on $y_t$ gives $x_t \mid y_{1:t}$ as Gaussian (update step). The formulas for the conditional mean and covariance of a jointly Gaussian vector yield exactly the Kalman gain equations.
Why It Matters
This is one of the rare cases where the optimal Bayesian filter has an exact, finite-dimensional, closed-form recursion. For nonlinear or non-Gaussian systems, no such finite recursion exists, and all practical filters (EKF, UKF, particle filters) are approximations.
Failure Mode
The optimality guarantee breaks completely when any of its assumptions fail. If the dynamics are nonlinear, the posterior is no longer Gaussian and the Kalman recursion is only an approximation (EKF). If the noise is heavy-tailed, the MMSE estimator is no longer the conditional mean of a Gaussian, and the filter can diverge on outlier measurements. If the system matrices are misspecified, the filter is overconfident: $P_{t|t}$ underestimates the true uncertainty.
Extensions to Nonlinear Systems
Extended Kalman Filter (EKF)
For nonlinear dynamics $x_t = f(x_{t-1}, u_t) + w_t$ and nonlinear observations $y_t = h(x_t) + v_t$, the EKF linearizes around the current estimate:

$$F_t = \left.\frac{\partial f}{\partial x}\right|_{\hat{x}_{t-1|t-1}}, \qquad H_t = \left.\frac{\partial h}{\partial x}\right|_{\hat{x}_{t|t-1}}$$

It then applies the standard Kalman recursion using $F_t$ in place of $A$ and $H_t$ in place of $C$. This is a first-order Taylor approximation and can diverge if the nonlinearity is strong.
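As a concrete sketch, here is one EKF step for a hypothetical scalar system $x_t = \sin(x_{t-1}) + w_t$, $y_t = x_t^2 + v_t$. The system and the noise values are invented for illustration; in the scalar case the Jacobians are ordinary derivatives.

```python
import numpy as np

# Hypothetical scalar model: x_t = sin(x_{t-1}) + w_t,  y_t = x_t^2 + v_t
f, h = np.sin, np.square
df, dh = np.cos, lambda x: 2 * x   # derivatives of f and h (scalar Jacobians)
q, r = 0.01, 0.1                   # illustrative process / measurement noise variances

def ekf_step(x, P, y):
    """One EKF predict/update cycle for the scalar model above."""
    # predict: propagate the mean through f, linearize the dynamics at the estimate
    F = df(x)
    x_pred = f(x)
    P_pred = F * P * F + q
    # update: linearize the observation model at the predicted state
    H = dh(x_pred)
    S = H * P_pred * H + r
    K = P_pred * H / S
    return x_pred + K * (y - h(x_pred)), (1 - K * H) * P_pred
```

Note that the mean is propagated through the true nonlinear $f$ and $h$; only the covariance propagation uses the linearization.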
Unscented Kalman Filter (UKF)
The UKF avoids computing Jacobians. Instead, it propagates a set of deterministically chosen sigma points through the nonlinear functions $f$ and $h$, then reconstructs the mean and covariance from the transformed points. For a state of dimension $n$, the UKF uses $2n + 1$ sigma points. It captures second-order effects that the EKF misses, with the same cost per step.
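A sketch of the sigma-point construction and the unscented transform, using the common scaled parameterization with parameters $\alpha$, $\beta$, $\kappa$ (the default values below are illustrative choices, not canonical):

```python
import numpy as np

def sigma_points(x, P, alpha=0.1, beta=2.0, kappa=0.0):
    """Generate 2n+1 scaled sigma points and their mean/covariance weights."""
    n = len(x)
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)    # matrix square root of scaled covariance
    pts = np.vstack([x, x + S.T, x - S.T])   # shape (2n+1, n): center +/- columns of S
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1 - alpha**2 + beta)
    return pts, wm, wc

def unscented_transform(pts, wm, wc, f):
    """Propagate sigma points through f; reconstruct mean and covariance."""
    fp = np.array([f(p) for p in pts])
    mean = wm @ fp
    diff = fp - mean
    cov = (wc[:, None] * diff).T @ diff
    return mean, cov
```

A useful sanity check: for the identity map the transform recovers the original mean and covariance exactly, since the sigma points encode them losslessly.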
Canonical Examples
Tracking position from noisy GPS
A vehicle moves in 1D with a constant-velocity model. State: $x_t = [p_t \;\; \dot{p}_t]^\top$ (position and velocity). Dynamics:

$$A = \begin{pmatrix} 1 & \Delta t \\ 0 & 1 \end{pmatrix}$$

Observation: GPS measures position only, so $C = \begin{pmatrix} 1 & 0 \end{pmatrix}$, $y_t = p_t + v_t$.

With the time step $\Delta t$, process noise covariance $Q$, and GPS noise variance $R$ fixed, the filter smooths out GPS noise while estimating velocity (which is never directly observed). After several steps, the position uncertainty drops well below the raw GPS noise level $\sqrt{R}$ because the velocity estimate provides additional information.
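A runnable sketch of this scenario. The numeric values ($\Delta t = 1$, small process noise, GPS variance 25) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, q, r = 1.0, 0.01, 25.0             # time step, noise intensities (illustrative)
A = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity dynamics
Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
C = np.array([[1.0, 0.0]])             # GPS measures position only
R = np.array([[r]])

x_true = np.array([0.0, 1.0])          # true position and velocity
x_hat, P = np.zeros(2), np.eye(2) * 100.0
for _ in range(50):
    # simulate the system and a noisy GPS measurement
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    y = C @ x_true + rng.normal(0, np.sqrt(r), 1)
    # predict
    x_hat, P = A @ x_hat, A @ P @ A.T + Q
    # update
    S = C @ P @ C.T + R
    K = P @ C.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (y - C @ x_hat)
    P = (np.eye(2) - K @ C) @ P

print(P[0, 0], r)  # filtered position variance sits well below the raw GPS variance
```

The velocity variance $P_{22}$ also shrinks even though velocity is never measured directly: the dynamics couple it to the observed position.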
Sensor fusion: combining GPS and accelerometer
Same state as above, but now add an accelerometer measurement $y^{\text{acc}}_t = a_t + v^{\text{acc}}_t$, where $a_t$ is the acceleration. Augment the state to include acceleration, or treat the accelerometer as a control input with known noise. The Kalman filter optimally fuses both sensors by weighting each measurement inversely proportional to its noise variance. This is the principle behind inertial navigation systems.
Connection to Recursive Least Squares
The Kalman filter is equivalent to recursive least squares (RLS) when the state is constant and the observation model is linear. This connection makes the Bayesian updating interpretation concrete.
Suppose the true parameter is $\theta$, constant over time ($A = I$, $Q = 0$), with observations $y_t = C_t \theta + v_t$ where $v_t \sim \mathcal{N}(0, R)$. The Kalman update step then becomes:

$$K_t = P_{t-1} C_t^\top (C_t P_{t-1} C_t^\top + R)^{-1}$$
$$\hat{\theta}_t = \hat{\theta}_{t-1} + K_t (y_t - C_t \hat{\theta}_{t-1})$$
$$P_t = (I - K_t C_t) P_{t-1}$$
This is exactly the RLS recursion. The covariance $P_t$ plays the role of the inverse information matrix in batch ordinary least squares: it tracks how much uncertainty remains about $\theta$ given all observations so far.
The batch OLS solution can be recovered by initializing $P_0 = \lambda I$ with $\lambda \to \infty$ (a diffuse prior) and running the Kalman recursion until all $T$ observations have been processed, then taking $\hat{\theta}_T$. The resulting $\hat{\theta}_T$ equals the batch OLS estimate; $P_T$ equals the OLS covariance matrix.
This Bayesian view of RLS is what the Kalman filter generalizes. The update has a clean interpretation: the new estimate is the prior mean shifted by the Kalman gain times the surprise (innovation). When the prior is much sharper than the observation noise ($P_{t|t-1} \to 0$ relative to $R$), the gain $K_t \to 0$ and the filter ignores the noisy observation. When the observation is much more precise than the prior ($R \to 0$) and $C$ happens to be square and invertible, the gain approaches $C^{-1}$ and the filter trusts the measurement. In the general case $C$ is rectangular or rank-deficient (you only observe part of the state), and as $R \to 0$ the limiting gain is $K_t \to P_{t|t-1} C^\top (C P_{t|t-1} C^\top)^{-1}$ (a covariance-weighted pseudoinverse), which enforces the measurement constraint in the observed subspace and leaves the unobserved directions to be pinned down by the dynamics; no ordinary $C^{-1}$ exists.
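The RLS equivalence can be checked numerically: running the Kalman recursion with $A = I$, $Q = 0$, and a near-diffuse prior recovers the batch least-squares estimate. The data and noise values below are invented for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([2.0, -1.0])
X = rng.normal(size=(100, 2))                      # one regressor row C_t per step
y = X @ theta_true + rng.normal(0, 0.5, 100)

# Kalman filter with a constant state: A = I, Q = 0, near-diffuse prior
theta, P, r = np.zeros(2), np.eye(2) * 1e8, 0.25   # r = measurement noise variance
for C_t, y_t in zip(X, y):
    S = C_t @ P @ C_t + r                          # scalar innovation covariance
    K = P @ C_t / S                                # Kalman gain
    theta = theta + K * (y_t - C_t @ theta)        # shift by gain times innovation
    P = P - np.outer(K, C_t @ P)                   # (I - K C) P

theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # batch OLS on the same data
print(np.max(np.abs(theta - theta_ols)))           # agreement up to the prior's floor
```

The tiny residual difference comes from the finite prior variance $10^8$; a truly diffuse prior would make the match exact.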
The expectation-variance-covariance structure underlying this tradeoff is the Bayesian posterior update for Gaussians: conditioning a Gaussian prior on a Gaussian likelihood yields a Gaussian posterior, with the mean shifting toward the observation by an amount proportional to the relative precision.
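Concretely, the Gaussian conditioning identity behind the update step is: if

$$\begin{pmatrix} a \\ b \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix} \right)$$

then

$$a \mid b \sim \mathcal{N}\!\left( \mu_a + \Sigma_{ab} \Sigma_{bb}^{-1} (b - \mu_b), \; \Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba} \right)$$

With $a = x_t$ and $b = y_t$ (both conditioned on $y_{1:t-1}$), these formulas reduce exactly to the Kalman gain equations.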
Common Confusions
The Kalman gain is not a tuning parameter
The Kalman gain is derived, not chosen. It is the unique gain that produces the MMSE estimate given the model. "Tuning" a Kalman filter means choosing $Q$ and $R$ to match the actual noise statistics, not manually adjusting $K_t$.
The Kalman filter does not require stationarity
The matrices can all be time-varying: $A_t, B_t, C_t, Q_t, R_t$. The recursion and optimality still hold. Stationarity is only needed if you want the covariance $P_{t|t}$ to converge to a steady state (solving the discrete algebraic Riccati equation).
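The convergence to a steady state can be checked numerically by iterating the covariance recursion with fixed matrices until it reaches the Riccati fixed point. The matrices below are illustrative choices:

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # fixed (stationary) system matrices
C = np.array([[1.0, 0.0]])
Q = np.eye(2) * 0.1
R = np.array([[1.0]])

def riccati_step(P):
    """One predict/update pass of the covariance recursion (data-independent)."""
    P_pred = A @ P @ A.T + Q
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
    return (np.eye(2) - K @ C) @ P_pred

P = np.eye(2) * 50.0
for _ in range(500):
    P = riccati_step(P)

# At the fixed point, one more iteration leaves P essentially unchanged
print(np.max(np.abs(riccati_step(P) - P)))
```

The limiting $P$ solves the discrete algebraic Riccati equation; the corresponding constant gain gives the steady-state Kalman filter.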
EKF is not optimal for nonlinear systems
The EKF applies the Kalman equations to a linearized system. The result is not the true posterior $p(x_t \mid y_{1:t})$, which is generally non-Gaussian for nonlinear dynamics. The EKF can diverge when the linearization is poor. The UKF and particle filters provide better approximations at higher cost.
Summary
- Predict step: project mean and covariance forward through dynamics
- Update step: correct using the Kalman gain
- Optimal (MMSE) only for linear Gaussian systems
- Covariance does not depend on the actual measurements, only on the model
- EKF linearizes nonlinear systems; UKF uses sigma points instead
- Model mismatch (wrong $Q$ or $R$) causes overconfidence, not just bias
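The fourth point above is easy to verify: since the gain and covariance recursions never touch the data, two completely different measurement sequences produce identical covariance sequences. A minimal scalar demonstration (illustrative values):

```python
import numpy as np

A, C = np.array([[1.0]]), np.array([[1.0]])
Q, R = np.array([[0.5]]), np.array([[2.0]])

def run_filter(ys):
    """Run the scalar filter; return the sequence of posterior variances."""
    x, P, Ps = np.zeros(1), np.array([[10.0]]), []
    for y in ys:
        P = A @ P @ A.T + Q                          # predict
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
        x = x + K @ (y - C @ x)                      # update mean (uses the data)
        P = (np.eye(1) - K @ C) @ P                  # update covariance (does not)
        Ps.append(P[0, 0])
    return Ps

ps_a = run_filter(np.zeros(10))            # one measurement sequence ...
ps_b = run_filter(np.arange(10.0) * 7.0)   # ... and a very different one
print(np.allclose(ps_a, ps_b))
```

This is also why the covariance (and hence the gain schedule) can be precomputed offline for a fixed model.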
Exercises
Problem
A scalar state evolves as $x_t = x_{t-1} + w_t$ with $w_t \sim \mathcal{N}(0, q)$. Observations are $y_t = x_t + v_t$ with $v_t \sim \mathcal{N}(0, r)$. Starting from $\hat{x}_{0|0}$, $P_{0|0}$, compute $\hat{x}_{1|1}$ and $P_{1|1}$ given $y_1$.
Problem
Show that the steady-state Kalman gain $K_\infty$ for the scalar system $x_t = x_{t-1} + w_t$, $y_t = x_t + v_t$ with $w_t \sim \mathcal{N}(0, q)$, $v_t \sim \mathcal{N}(0, r)$, satisfies $r K_\infty^2 + q K_\infty - q = 0$. What happens as $q/r \to \infty$ and as $q/r \to 0$?
References
Canonical:
- Anderson & Moore, Optimal Filtering (1979), Chapters 2-5
- Simon, Optimal State Estimation (2006), Chapters 5-7
Current:
- Särkkä, Bayesian Filtering and Smoothing (2013), Chapters 4-5
- Bishop, Pattern Recognition and Machine Learning (2006), Chapter 13.3
- Murphy, Machine Learning: A Probabilistic Perspective (2012), Chapter 18
- Kalman, "A New Approach to Linear Filtering and Prediction Problems" (Trans. ASME J. Basic Eng., vol. 82, no. 1, 1960, pp. 35-45)
- Julier & Uhlmann, "A New Extension of the Kalman Filter to Nonlinear Systems" (SPIE 1997)
- Bar-Shalom, Li, Kirubarajan, Estimation with Applications to Tracking and Navigation (2001), Chapters 5-6
Next Topics
- Particle filters: sequential Monte Carlo for nonlinear, non-Gaussian systems
- Hidden Markov models: discrete-state analogue of the Kalman filter
- GraphSLAM and factor graphs: batch optimization formulations that extend Kalman filtering to simultaneous localization and mapping
- Linear regression: the batch version of the online estimation problem the Kalman filter solves recursively
- Expectation, variance, and covariance: the Gaussian conditioning formulas that underlie the update step
Last reviewed: April 26, 2026