Numerical Optimization
Trust Region Methods
Trust region methods minimize a local model only inside a region where that model is trusted, then grow or shrink the region using actual versus predicted decrease. They are robust Newton-style methods for difficult non-convex landscapes.
Why This Matters
Newton's method is fast when its quadratic model is accurate and dangerous when it is not. A bad Newton step can be huge, point uphill, or chase a saddle because the Hessian is indefinite.
Trust region methods fix the failure mode by asking a more honest question:
How far from the current point do I actually trust this local model?
Instead of choosing a direction and searching along it, a trust region method chooses the step by minimizing the model inside a ball. Then it compares predicted improvement to actual improvement and updates the radius.
Mental Model
A line search method says: "I picked a direction. How far should I go?"
A trust region method says: "I trust my model inside this radius. What is the best step inside it?"
If the model predicts reality well, the region expands. If the model lies, the region shrinks. This is why trust regions are robust for non-convex second-order optimization: the radius becomes a feedback signal about local model quality.
Formal Setup
Quadratic Model
At iterate $x_k$, define
$$m_k(p) = f(x_k) + g_k^\top p + \tfrac{1}{2}\, p^\top B_k p,$$
where $g_k = \nabla f(x_k)$ and $B_k$ is either the Hessian $\nabla^2 f(x_k)$ or an approximation to it.
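As a concrete sketch (the function name and signature are illustrative, not from any library), the model is just a quadratic in the step:

```python
import numpy as np

def quadratic_model(f_k, g_k, B_k, p):
    """Local model m_k(p) = f(x_k) + g_k^T p + 0.5 * p^T B_k p."""
    return f_k + g_k @ p + 0.5 * p @ (B_k @ p)
```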
Trust Region Subproblem
The trust region step solves
$$\min_{p}\; m_k(p) \quad \text{subject to} \quad \|p\| \le \Delta_k.$$
The radius $\Delta_k$ limits how far the algorithm trusts the model.
Actual-To-Predicted Reduction Ratio
After proposing $p_k$, compute
$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}.$$
If $\rho_k \approx 1$, the model predicted the objective well. If $\rho_k$ is small or negative, the model was unreliable at that radius.
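A minimal sketch of the ratio computation, with a guard against division by a near-zero predicted reduction (names are illustrative):

```python
import numpy as np

def reduction_ratio(f, x_k, p_k, g_k, B_k):
    """rho_k = actual reduction / model-predicted reduction."""
    actual = f(x_k) - f(x_k + p_k)
    # m_k(0) - m_k(p_k); should be positive for any sensible step
    predicted = -(g_k @ p_k + 0.5 * p_k @ (B_k @ p_k))
    return actual / max(predicted, 1e-12)
```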
The Algorithm
- Build $m_k$ from $f(x_k)$, $g_k$, and $B_k$.
- Approximately solve the trust region subproblem for $p_k$.
- Compute $\rho_k$.
- Accept the step if $\rho_k$ is large enough.
- Shrink $\Delta_k$ after poor prediction; grow it after accurate prediction and boundary-hitting steps.
A typical policy:
| Condition | Step decision | Radius decision |
|---|---|---|
| $0 \le \rho_k < \tfrac{1}{4}$ | reject | shrink |
| $\tfrac{1}{4} \le \rho_k < \tfrac{3}{4}$ | accept cautiously | keep similar |
| $\rho_k \ge \tfrac{3}{4}$ and $\|p_k\| = \Delta_k$ | accept | expand |
| $\rho_k < 0$ | reject | shrink aggressively |
The exact constants are implementation choices. The logic is the important part.
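Putting the pieces together, here is a minimal self-contained loop using the Cauchy point as its subproblem solver (better solvers are discussed below); the constants follow the common conventions above but are implementation assumptions, not canon:

```python
import numpy as np

def cauchy_step(g, B, delta):
    """Cheapest admissible step: minimize the model along -g, clipped to the ball."""
    gnorm = np.linalg.norm(g)
    gBg = g @ (B @ g)
    tau = 1.0 if gBg <= 0 else min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * (delta / gnorm) * g

def trust_region(f, grad, hess, x0, delta0=1.0, delta_max=100.0,
                 eta=0.15, tol=1e-8, max_iter=500):
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = cauchy_step(g, B, delta)             # any solver with Cauchy decrease works
        predicted = -(g @ p + 0.5 * p @ (B @ p))
        rho = (f(x) - f(x + p)) / max(predicted, 1e-12)
        if rho < 0.25:
            delta *= 0.25                        # model unreliable: shrink
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2.0 * delta, delta_max)  # accurate model, boundary hit: grow
        if rho > eta:
            x = x + p                            # accept only on real improvement
    return x
```

For instance, `trust_region(lambda x: x @ x, lambda x: 2 * x, lambda x: 2 * np.eye(len(x)), [3.0, 4.0])` drives the iterate to the origin.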
Solving The Subproblem
Trust Region Subproblem Optimality Conditions
Statement
A global solution $p^*$ to the quadratic trust region subproblem is characterized by the existence of a multiplier $\lambda \ge 0$ such that
$$(B_k + \lambda I)\, p^* = -g_k, \qquad B_k + \lambda I \succeq 0,$$
and
$$\lambda\, \big(\Delta_k - \|p^*\|\big) = 0.$$
Intuition
If the unconstrained Newton step $-B_k^{-1} g_k$ is inside the ball and $B_k$ is positive definite, $\lambda = 0$. Otherwise the constraint is active and $\lambda > 0$ shifts the Hessian until the constrained quadratic is well behaved.
Why It Matters
This is the exact mathematical bridge between Newton, regularized Newton, Levenberg-Marquardt, and trust region methods.
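To make the bridge concrete, the sketch below bisects on the multiplier $\lambda$ until $\|p(\lambda)\| \approx \Delta_k$, assuming the constraint is active. This is illustrative only: it ignores the hard case and the safeguards of the Moré-Sorensen algorithm.

```python
import numpy as np

def boundary_step(g, B, delta, lam_max=1e6, iters=100):
    """Bisect on lam >= 0 so that p(lam) = -(B + lam*I)^{-1} g has norm delta."""
    n = len(g)
    # Start just above the value that makes B + lam*I positive definite.
    lo = max(0.0, -np.linalg.eigvalsh(B).min()) + 1e-10
    hi = lam_max
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        p = np.linalg.solve(B + lam * np.eye(n), -g)
        if np.linalg.norm(p) > delta:
            lo = lam   # step too long: more regularization
        else:
            hi = lam   # step too short: less regularization
    return p
```

Notice that $\lambda = 0$ recovers the Newton step and large $\lambda$ recovers a short gradient-like step, which is exactly the Levenberg-Marquardt interpolation.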
Exact subproblem solves are often unnecessary. Common approximations:
| Solver | Best for | Idea |
|---|---|---|
| Cauchy point | cheap guaranteed progress | steepest descent clipped to the region |
| Dogleg | positive definite Hessian, small problems | path from gradient step to Newton step |
| Steihaug-CG | large-scale Hessian-vector products | conjugate gradient until boundary or negative curvature |
| Exact secular equation | small/medium high-accuracy solves | solve $\|p(\lambda)\| = \Delta_k$ for the multiplier $\lambda$ |
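Here is a sketch of Steihaug-CG under the conventions above; it needs the Hessian only through matrix-vector products, which is why it scales (names are illustrative):

```python
import numpy as np

def steihaug_cg(g, hessvec, delta, tol=1e-8, max_iter=250):
    """Truncated CG on the model: stop at convergence, the boundary,
    or the first direction of negative curvature."""
    p = np.zeros_like(g)
    r, d = g.copy(), -g.copy()
    if np.linalg.norm(r) < tol:
        return p
    for _ in range(max_iter):
        Bd = hessvec(d)
        dBd = d @ Bd
        if dBd <= 0:                                # negative curvature: go to boundary
            return p + _to_boundary(p, d, delta) * d
        alpha = (r @ r) / dBd
        if np.linalg.norm(p + alpha * d) >= delta:  # step would leave the region
            return p + _to_boundary(p, d, delta) * d
        p = p + alpha * d
        r_new = r + alpha * Bd
        if np.linalg.norm(r_new) < tol:
            return p
        beta = (r_new @ r_new) / (r @ r)
        d, r = -r_new + beta * d, r_new
    return p

def _to_boundary(p, d, delta):
    """Positive tau solving ||p + tau * d|| = delta (a quadratic in tau)."""
    a, b, c = d @ d, 2 * (p @ d), p @ p - delta**2
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
```

Because only `hessvec` is required, this pairs naturally with Hessian-vector products from automatic differentiation.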
Cauchy Decrease
Cauchy Point Sufficient Decrease
Statement
The Cauchy point $p_k^C$ satisfies
$$m_k(0) - m_k(p_k^C) \ge \tfrac{1}{2}\, \|g_k\| \min\!\left(\Delta_k, \frac{\|g_k\|}{\|B_k\|}\right),$$
up to standard constant conventions.
Intuition
Even the cheapest trust region step must make meaningful predicted progress: either move to the boundary in the negative-gradient direction or take the best one-dimensional step along that direction.
Why It Matters
Most global convergence proofs only require that the approximate subproblem solver achieves at least a fixed fraction of Cauchy decrease. You do not need to solve the subproblem exactly to get robust theory.
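The bound is easy to sanity-check numerically. The sketch below evaluates the closed-form Cauchy point on random symmetric (possibly indefinite) models and verifies the decrease floor; the setup is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    A = rng.normal(size=(5, 5))
    B = (A + A.T) / 2                    # random symmetric, often indefinite
    g = rng.normal(size=5)
    delta = rng.uniform(0.1, 10.0)

    gnorm = np.linalg.norm(g)
    gBg = g @ (B @ g)
    tau = 1.0 if gBg <= 0 else min(gnorm**3 / (delta * gBg), 1.0)
    p_c = -tau * (delta / gnorm) * g     # Cauchy point

    decrease = -(g @ p_c + 0.5 * p_c @ (B @ p_c))          # m(0) - m(p_c)
    floor = 0.5 * gnorm * min(delta, gnorm / np.linalg.norm(B, 2))
    assert decrease >= floor - 1e-9
print("Cauchy decrease floor held on 1000 random instances")
```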
Global Convergence
Global First-Order Convergence
Statement
A standard trust region method, under the usual assumptions ($f$ bounded below, Lipschitz-continuous gradient, steps achieving a fraction of Cauchy decrease), satisfies
$$\liminf_{k \to \infty} \|g_k\| = 0.$$
Under stronger assumptions and standard acceptance rules, every accumulation point is first-order stationary.
Intuition
If the gradient stayed large forever, each accepted step would have to decrease $f$ by a nontrivial amount. That cannot continue forever because $f$ is bounded below. If steps are rejected repeatedly, the radius shrinks until the Taylor model becomes accurate enough to accept a step.
Proof Sketch
Combine Cauchy decrease with model accuracy for small $\Delta_k$. Rejected steps force $\Delta_k$ smaller; small radii make $\rho_k$ approach $1$ under smoothness; then steps must be accepted. Summing accepted decreases contradicts a gradient norm bounded away from zero.
Failure Mode
First-order stationarity is not the same as a good minimum. A method can still approach saddle-like stationary points unless the subproblem solver and acceptance logic exploit negative curvature.
Negative Curvature
If $B_k$ has a negative eigenvalue, the quadratic model decreases along that direction. A line search Newton method may need special safeguards to avoid unstable or ascent directions. Trust region methods naturally cap the step while still allowing movement along negative curvature to the boundary.
This is one reason trust region methods are important in non-convex optimization: the ball prevents catastrophic step lengths, while the subproblem can still use curvature information that gradient descent ignores.
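A tiny numeric illustration (values chosen purely for exposition): with an indefinite model Hessian, the quadratic is unbounded below along the negative-curvature direction, but the ball caps the step at the boundary.

```python
import numpy as np

B = np.array([[2.0, 0.0], [0.0, -1.0]])   # indefinite: eigenvalues 2 and -1
g = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])                  # negative-curvature eigenvector
m = lambda p: g @ p + 0.5 * p @ (B @ p)

print(m(10 * v), m(100 * v))   # -50.0, -5000.0: the model dives forever along v
delta = 1.0
print(m(delta * v))            # -0.5: the trust region keeps the step finite
```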
Trust Region Vs Line Search
| Question | Line search | Trust region |
|---|---|---|
| What is chosen first? | direction | local region and model |
| What is adjusted? | step length | radius |
| How is quality measured? | sufficient decrease/Wolfe conditions | actual versus predicted decrease |
| Negative curvature | needs safeguards | handled inside subproblem |
| Cost per iteration | cheaper direction plus function checks | subproblem solve |
| Strong use case | scalable first/second-order hybrids | robust Newton-like methods |
RL Connection: TRPO And PPO
Trust Region Policy Optimization uses a KL constraint instead of a Euclidean radius:
$$\max_\theta \; \hat{\mathbb{E}}\!\left[\frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, \hat{A}(s, a)\right] \quad \text{subject to} \quad \hat{\mathbb{E}}\!\left[D_{\mathrm{KL}}\!\big(\pi_{\theta_{\text{old}}}(\cdot \mid s)\,\big\|\,\pi_\theta(\cdot \mid s)\big)\right] \le \delta.$$
The idea is the same: do not let the new policy move outside the region where the local surrogate objective is trustworthy. PPO is best understood as a simpler clipped-surrogate approximation to this trust-region idea, not as the exact same constrained optimization method.
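A minimal sketch of PPO's clipped surrogate (array names are illustrative; this is the published objective in plain NumPy, not any specific library's API):

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Caps the incentive once the probability ratio leaves [1-eps, 1+eps],
    a cheap stand-in for an explicit KL trust region."""
    ratio = np.exp(logp_new - logp_old)                 # pi_new / pi_old per sample
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()        # maximize this quantity
```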
Common Confusions
The trust region is not a constraint on the final answer
The region constrains the current step $p_k$, not the optimization problem's final solution. The radius is a local reliability control.
Rejected steps are useful information
A rejected step says the model was not predictive at that radius. Shrinking the radius is not failure; it is the mechanism that restores model validity.
The Cauchy point is not the best step
The Cauchy point is a cheap lower bar for progress. Dogleg, truncated CG, and exact solvers can produce better steps, but they should at least clear the Cauchy-decrease floor.
Trust region does not make non-convex optimization globally solved
The method improves robustness and convergence to stationary points. It does not remove bad local minima, saddle geometry, noise, or modeling error.
Q&A For Mastery
Why not always use a line search? Line search methods choose a direction first. If the direction is bad because the Hessian is indefinite or poorly conditioned, searching along it can still be poor. Trust regions choose direction and length together.
Why does $\rho_k$ matter more than raw loss decrease? Raw decrease tells you the step helped. $\rho_k$ tells you whether the model is reliable enough to justify larger future steps.
What should I log? Log $\rho_k$, $\Delta_k$, accepted/rejected status, $\|g_k\|$, and whether the subproblem hit the boundary or encountered negative curvature.
What To Remember
- Trust region methods optimize a model inside a radius.
- $\rho_k$ compares actual reduction to predicted reduction.
- Cauchy decrease is the minimal progress certificate.
- Dogleg and Steihaug-CG are practical subproblem solvers.
- Trust regions are robust for Newton-like non-convex optimization, but they still only guarantee stationarity, not magic global optimality.
Exercises
Problem
Given a gradient $g_k$, model Hessian $B_k$, and radius $\Delta_k$, derive the Cauchy point and the predicted reduction it achieves.
Problem
If the Newton step $-B_k^{-1} g_k$ is inside the trust region and $B_k$ is positive definite, what is the trust region step?
Problem
Why is a KL trust region more meaningful than a Euclidean parameter trust region for policy optimization?
References
Canonical:
- Nocedal and Wright, Numerical Optimization, 2nd ed., Chapter 4.
- Conn, Gould, and Toint, Trust Region Methods (2000).
- Moré and Sorensen, "Computing a Trust Region Step" (1983).
Further:
- Steihaug, "The Conjugate Gradient Method and Trust Regions in Large Scale Optimization" (1983).
- Schulman et al., "Trust Region Policy Optimization" (2015).
- Curtis and Robinson, "Exploiting Negative Curvature in Deterministic and Stochastic Optimization" (2019).
Next Topics
- Second-order optimization methods: Newton, Gauss-Newton, Hessian-free, and curvature-aware ML.
- Interior point methods: constrained optimization through barriers rather than local trust balls.