Grokking
Question 1 of 3
intermediate (4/10)
state theorem
Grokking (Power et al., 2022) is a surprising training phenomenon. Which statement describes what happens?
A. Different layers of the network achieve different test accuracies during training, producing a hierarchical 'grokked' structure across the depth.
B. The model overfits the training data permanently, and 'grokking' is the name given to that irreversible memorization plateau in modern transformers.
C. The model abruptly loses all training accuracy late in optimization, after which only re-initialization restores any useful behavior on the task.
D. Train accuracy hits 100% early, but test accuracy stays near random for many more epochs before suddenly jumping: long-delayed generalization.