Mixture of Experts
Question 1 of 4 · 120s · intermediate (4/10) · conceptual
Mixture of Experts (MoE) architectures route each input to a subset of specialized expert networks. What is the main efficiency advantage?
A. Routing per token bypasses the quadratic attention computation in transformers, making the overall architecture linear in input sequence length.
B. Inactive expert parameters are paged out to disk and loaded only when the router selects them, reducing total resident GPU memory at inference.
C. Only a small fraction of parameters is active per token (e.g., 2 of 8 experts), so total parameter count can grow while per-token compute stays roughly constant.
D. Experts specialize by input domain automatically during pretraining, eliminating the need for separate fine-tuning datasets per task or domain.
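For concreteness, below is a minimal NumPy sketch of the top-k routing mechanism the question stem describes: a router scores every expert for each token, only the top-k experts actually run, and their outputs are combined with renormalized gate weights. All names, shapes, and the toy sizes (8 experts, top-2) are illustrative assumptions, not a reference implementation from any particular model.

```python
import numpy as np

# Toy sizes (assumed for illustration only).
d_model, d_ff = 64, 256        # hidden size, expert FFN inner size
num_experts, top_k = 8, 2      # e.g. 2 of 8 experts active per token

rng = np.random.default_rng(0)
router_w = rng.standard_normal((d_model, num_experts)) * 0.02
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,   # expert W_in
     rng.standard_normal((d_ff, d_model)) * 0.02)   # expert W_out
    for _ in range(num_experts)
]

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(tokens):
    """tokens: (n_tokens, d_model) -> (n_tokens, d_model)."""
    logits = tokens @ router_w                      # (n_tokens, num_experts) router scores
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        top = np.argsort(logits[i])[-top_k:]        # indices of the top-k experts
        gates = softmax(logits[i][top])             # gate weights renormalized over top-k
        for g, e in zip(gates, top):
            w_in, w_out = experts[e]
            # Only the selected experts' weights do any work for this token.
            out[i] += g * (np.maximum(tok @ w_in, 0.0) @ w_out)
    return out

tokens = rng.standard_normal((4, d_model))
y = moe_layer(tokens)

total_params = num_experts * 2 * d_model * d_ff
active_params = top_k * 2 * d_model * d_ff
print(y.shape, f"active/total expert params per token: {active_params}/{total_params}")
```

Counting parameters at the end makes the trade-off visible: every expert's weights count toward model capacity, but only the top-k experts' weights contribute compute for a given token.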