
Unlock: Online Learning and Bandits

Sequential decision making with adversarial or stochastic feedback: the bandit setting, explore-exploit tradeoff, UCB, Thompson sampling, and regret bounds. Connections to RL and A/B testing.
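As a rough illustration of the explore-exploit tradeoff named above, here is a minimal sketch of UCB1 on simulated Bernoulli arms. The function name `ucb1`, the arm means, the horizon, and the seed are all hypothetical choices made for this example; the exploration bonus `sqrt(2 ln t / n_i)` is the standard UCB1 form, and the regret tracked is pseudo-regret computed from the true means, which is only available in simulation.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms.

    Returns (pull counts per arm, cumulative pseudo-regret).
    arm_means are the true success probabilities (known only to the simulator).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # total observed reward per arm
    best = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull each arm once so every count is positive.
            arm = t - 1
        else:
            # Pick the arm with the largest empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]  # expected loss of this pull

    return counts, regret


# Hypothetical three-armed instance for illustration.
counts, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, round(regret, 1))
```

The bonus shrinks as an arm is pulled more often, so poorly explored arms are revisited occasionally while most pulls go to the arm that looks best. Thompson sampling would replace the bonus with a draw from a posterior over each arm's mean.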

169 Prerequisites, 0 Mastered, 0 Working, 137 Gaps
Prerequisite mastery: 19%
Recommended probe

McDiarmid's Inequality is your weakest prerequisite with available questions. You haven't been assessed on this topic yet.

McDiarmid's Inequality (Advanced, weakest)
Not assessed, 13 questions
Not assessed, 2 questions
Not assessed, 3 questions
Not assessed, 58 questions
Not assessed, 1 question
Not assessed, 10 questions
Not assessed, 5 questions
