Unlock: Flash Attention
IO-aware exact attention: tile the Q, K, and V matrices into SRAM-sized blocks so that the full N×N attention matrix is never materialized in HBM. Peak activation memory drops from O(N²) to O(N); HBM read/write traffic shrinks by a large constant factor (the reduction is constant-factor, not asymptotic); the FLOP count is essentially unchanged.
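A minimal NumPy sketch of the tiling idea, shown here on the CPU rather than as the fused CUDA kernel: keys and values are processed one block at a time with an online (streaming) softmax, so only an N×b score tile ever exists in memory. The function name `blockwise_attention` and the `block_size` parameter are illustrative assumptions for this example, not an API from any library.

```python
import numpy as np

def blockwise_attention(Q, K, V, block_size=64):
    """Exact attention over key/value tiles with an online softmax.

    Q, K, V: (N, d) arrays. Returns the same result (up to rounding)
    as softmax(Q @ K.T / sqrt(d)) @ V, without forming the N x N matrix.
    Illustrative sketch, not the fused FlashAttention kernel.
    """
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((N, d))
    row_max = np.full(N, -np.inf)   # running max per query row
    row_sum = np.zeros(N)           # running softmax normalizer per row

    for start in range(0, N, block_size):
        Kb = K[start:start + block_size]       # (b, d) key tile
        Vb = V[start:start + block_size]       # (b, d) value tile
        S = (Q @ Kb.T) * scale                 # (N, b) scores for this tile only
        new_max = np.maximum(row_max, S.max(axis=1))
        # Rescale previously accumulated output and normalizer to the new max.
        correction = np.exp(row_max - new_max)
        P = np.exp(S - new_max[:, None])       # numerically stable tile weights
        row_sum = row_sum * correction + P.sum(axis=1)
        out = out * correction[:, None] + P @ Vb
        row_max = new_max

    return out / row_sum[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d = 257, 32                  # deliberately not a multiple of block_size
    Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
    # Reference: materialize the full N x N matrix with a stable softmax.
    S = (Q @ K.T) / np.sqrt(d)
    P = np.exp(S - S.max(axis=1, keepdims=True))
    ref = (P / P.sum(axis=1, keepdims=True)) @ V
    assert np.allclose(blockwise_attention(Q, K, V), ref)
    print("blockwise result matches full attention")
```

The real kernel additionally tiles Q so each Q/K/V block fits in SRAM and fuses the whole loop into one CUDA kernel; the sketch above only demonstrates the online-softmax recurrence that makes the tiling exact.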
Prerequisites: 185 (0 mastered, 0 working, 147 gaps)
Prerequisite mastery: 21%
Recommended probe
Chernoff Bounds is your weakest prerequisite that has available questions. You haven't been assessed on this topic yet.
Flash Attention (TARGET): Not assessed, 5 questions
Softmax and Numerical Stability (Foundations): Not assessed, 11 questions
Attention Mechanism Theory (Research): Not assessed, 11 questions; no quiz
GPU Compute Model (Frontier): Not assessed, 3 questions
CUDA Programming Fundamentals (Research): No quiz
NVIDIA GPU Architectures (Frontier): No quiz