Unlock: Flash Attention
IO-aware exact attention: tile the Q, K, and V matrices into SRAM-sized blocks so that the full N×N attention matrix is never materialized in HBM. Peak activation memory drops from O(N²) to O(N); HBM read/write traffic shrinks by a large constant factor (the reduction is constant-factor, not asymptotic); the FLOP count is essentially unchanged.
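A minimal NumPy sketch of the tiling idea, shown here on the CPU rather than as the fused CUDA kernel: keys and values are processed one block at a time with an online (streaming) softmax, so only an N×b score tile ever exists in memory. The function name `blockwise_attention` and the `block_size` parameter are illustrative assumptions for this example, not an API from any library.

```python
import numpy as np

def blockwise_attention(Q, K, V, block_size=64):
    """Exact attention over key/value tiles with an online softmax.

    Q, K, V: (N, d) arrays. Returns the same result (up to rounding)
    as softmax(Q @ K.T / sqrt(d)) @ V, without forming the N x N matrix.
    Illustrative sketch, not the fused FlashAttention kernel.
    """
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((N, d))
    row_max = np.full(N, -np.inf)   # running max per query row
    row_sum = np.zeros(N)           # running softmax normalizer per row

    for start in range(0, N, block_size):
        Kb = K[start:start + block_size]       # (b, d) key tile
        Vb = V[start:start + block_size]       # (b, d) value tile
        S = (Q @ Kb.T) * scale                 # (N, b) scores for this tile only
        new_max = np.maximum(row_max, S.max(axis=1))
        # Rescale previously accumulated output and normalizer to the new max.
        correction = np.exp(row_max - new_max)
        P = np.exp(S - new_max[:, None])       # numerically stable tile weights
        row_sum = row_sum * correction + P.sum(axis=1)
        out = out * correction[:, None] + P @ Vb
        row_max = new_max

    return out / row_sum[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d = 257, 32                  # deliberately not a multiple of block_size
    Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
    # Reference: materialize the full N x N matrix with a stable softmax.
    S = (Q @ K.T) / np.sqrt(d)
    P = np.exp(S - S.max(axis=1, keepdims=True))
    ref = (P / P.sum(axis=1, keepdims=True)) @ V
    assert np.allclose(blockwise_attention(Q, K, V), ref)
    print("blockwise result matches full attention")
```

The real kernel additionally tiles Q so each Q/K/V block fits in SRAM and fuses the whole loop into one CUDA kernel; the sketch above only demonstrates the online-softmax recurrence that makes the tiling exact.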
Prerequisites: 185 (0 mastered, 0 working, 147 gaps)
Prerequisite mastery: 21%
Recommended probe
Chernoff Bounds is your weakest prerequisite that has available questions. You haven't been assessed on this topic yet.
Flash Attention (TARGET): Not assessed, 5 questions
Softmax and Numerical Stability (Foundations): Not assessed, 11 questions
Attention Mechanism Theory (Research): Not assessed, 11 questions; no quiz
GPU Compute Model (Frontier): Not assessed, 3 questions
CUDA Programming Fundamentals (Research): No quiz
NVIDIA GPU Architectures (Frontier): No quiz