Unlock: Online Learning and Bandits
Sequential decision making with adversarial or stochastic feedback: the bandit setting, explore-exploit tradeoff, UCB, Thompson sampling, and regret bounds. Connections to RL and A/B testing.
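The UCB strategy mentioned above can be sketched in a few lines. This is a minimal, illustrative UCB1 implementation on simulated Bernoulli arms; the function name, arm means, and horizon are assumptions chosen for the example, not part of the course material.

```python
# Minimal UCB1 sketch on simulated Bernoulli arms (illustrative only;
# arm means and horizon are made-up example values).
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 for `horizon` rounds; return the pull count per arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # times each arm has been pulled
    sums = [0.0] * k      # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize
        else:
            # choose the arm maximizing empirical mean + exploration bonus
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

Over 2000 rounds the exploration bonus shrinks on well-sampled arms, so the arm with mean 0.8 ends up receiving most of the pulls, which is the explore-exploit tradeoff the topic description refers to.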
169 Prerequisites · 0 Mastered · 0 Working · 137 Gaps
Prerequisite mastery: 19%
Recommended probe
McDiarmid's Inequality is your weakest prerequisite with available questions. You haven't been assessed on this topic yet.
McDiarmid's Inequality · Not assessed · 13 questions
Not assessed · 2 questions
Symmetrization Inequality (Advanced) · Not assessed · 3 questions
VC Dimension (Core) · Not assessed · 58 questions
Contraction Inequality (Advanced) · Not assessed · 1 question
Adaptive Learning Is Not IID (Advanced) · Not assessed · 10 questions
No-Regret Learning (Advanced) · Not assessed · 5 questions