
Knowledge Distillation

Question 1 of 4
120s · intermediate (4/10) · conceptual
Knowledge distillation (Hinton et al., 2015) trains a small 'student' model to match a large 'teacher' model's outputs. Why train the student on the teacher's soft probabilities instead of just the hard labels?
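
The intuition the question is probing: soft probabilities encode the teacher's relative confidences over the incorrect classes (the 'dark knowledge'), which one-hot hard labels discard. A minimal sketch of the standard soft-target loss, assuming PyTorch; the function name and hyperparameter defaults here are illustrative, not taken from the paper's code:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target KL term with ordinary hard-label cross-entropy."""
    # Soften both distributions with temperature T; a higher T exposes
    # the teacher's relative probabilities over the wrong classes.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale the KL term by T^2 so its gradient magnitude stays comparable
    # as T changes (as noted in Hinton et al., 2015).
    soft_loss = F.kl_div(log_student, soft_targets,
                         reduction="batchmean") * temperature ** 2
    # Hard-label cross-entropy on the unsoftened student logits.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

At temperature=1 the soft targets collapse toward the teacher's peaked predictions; raising the temperature is what makes the extra inter-class information visible to the student.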