Unlock: Q-Learning

Model-free, off-policy value learning: the Q-learning update rule, convergence under Robbins-Monro step-size conditions, and the deep Q-network (DQN) revolution, which combined Q-learning with neural-network function approximation and experience replay and brought the deadly triad into focus.

255 Prerequisites, 0 Mastered, 0 Working, 196 Gaps

Prerequisite mastery: 23%

Recommended probe

Natural Language Processing Foundations is your weakest prerequisite with available questions. You haven't been assessed on this topic yet.

Q-Learning [TARGET]

Natural Language Processing Foundations [Core] [WEAKEST]

Not assessed, 5 questions

Bellman Equations [Core]

Not assessed, 12 questions

Value Iteration and Policy Iteration [Core]

Not assessed, 6 questions

Stochastic Approximation Theory [Core]

Not assessed, 3 questions

Temporal Difference Learning [Core]

Not assessed, 1 question