Prerequisite chain

Prerequisites for Reinforcement Learning from Human Feedback

Topics you need before working through Reinforcement Learning from Human Feedback. Direct prerequisites are listed first; transitive prerequisites (the chain reachable through them) follow.

Direct prerequisites (4)

  1. Policy Gradient Theorem (layer 3, tier 1)
  2. RLHF and Alignment (layer 4, tier 2)
  3. Reinforcement Learning for Synthesis Planning (layer 4, tier 3)
  4. Reward Design and Reward Misspecification (layer 3, tier 1)

Reachable through the chain (260)

These topics are not listed as direct prerequisites but are reached transitively by following the chain upward. Working through the direct prerequisites pulls them in.