Unlock: Attention as Kernel Regression
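Softmax attention can be viewed as Nadaraya-Watson kernel regression: the output at each position is a kernel-weighted average of the values, where the kernel is the exponentiated query-key dot product and the softmax denominator is the usual kernel normalizer. This view connects attention to classical nonparametric statistics and motivates linear attention, which replaces the exponential kernel with a random feature approximation.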
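To make the correspondence concrete, here is a minimal sketch in plain Python, assuming a single query, scaled dot-product scores, and toy data. The function names (`softmax_attention`, `nadaraya_watson`, `exp_kernel`) and the example vectors are illustrative assumptions, not taken from any library. It computes the attention output two ways: once through a softmax over scores, and once as a Nadaraya-Watson estimate with the kernel K(q, k) = exp(q·k / sqrt(d)).

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax_attention(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [dot(q, k) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]


def exp_kernel(q, k):
    """Exponential kernel K(q, k) = exp(q . k / sqrt(d))."""
    return math.exp(dot(q, k) / math.sqrt(len(q)))


def nadaraya_watson(q, keys, values, kernel):
    """Nadaraya-Watson estimate: sum_i K(q, k_i) v_i / sum_i K(q, k_i)."""
    kvals = [kernel(q, k) for k in keys]
    z = sum(kvals)
    dim_v = len(values[0])
    return [sum(kv * v[j] for kv, v in zip(kvals, values)) / z for j in range(dim_v)]


# Toy data (hypothetical): one query, three key/value pairs.
q = [0.5, -1.0, 0.25]
keys = [[1.0, 0.0, 0.5], [-0.5, 1.0, 0.0], [0.2, 0.3, -0.7]]
values = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]

out_attn = softmax_attention(q, keys, values)
out_nw = nadaraya_watson(q, keys, values, exp_kernel)

# The two outputs agree up to floating-point rounding.
print(out_attn)
print(out_nw)
```

Because `nadaraya_watson` takes the kernel as an argument, swapping `exp_kernel` for a kernel that factorizes as an inner product of feature maps is the step that leads to linear attention; that variant is not shown here.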