Optimizer Theory: SGD, Adam, and Muon
4 questions, difficulty 3-7 (1 foundation, 3 advanced); adapts to your performance
Question 1 of 4 · foundation (3/10) · conceptual · 120s
Why can Adam make early training look easier than plain SGD?
A. Adam removes gradient noise by always computing full-batch gradients
B. Adam uses the validation set inside the optimizer, so it directly optimizes generalization
C. Adam is guaranteed to converge to the global minimum for any neural network objective
D. Adam rescales coordinates using gradient history, so badly scaled directions can train more easily
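For reference, a minimal sketch of the standard Adam update next to plain SGD, showing where the per-coordinate rescaling by gradient history enters. The function and parameter names (`sgd_step`, `adam_step`, `lr`, `beta1`, `beta2`, `eps`) are illustrative choices, not part of the quiz material.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    # Plain SGD: every coordinate moves by lr * gradient,
    # so badly scaled directions take tiny (or huge) steps.
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam keeps running estimates of the first and second moments
    # of the gradient, one value per parameter coordinate.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction for step t (t >= 1)
    v_hat = v / (1 - beta2 ** t)
    # Dividing by sqrt(v_hat) normalizes each coordinate's step size,
    # so directions with consistently large or small gradients are rescaled
    # and can make progress early in training.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```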