RL Notes

Reinforcement Learning: An Introduction — Sutton & Barto, 2nd edition


Chapter 1
  1.1 What Is RL
  1.2 Rewards, Returns, and Episodes
  1.3 Value Functions and Bellman Equations
  1.4 RL Solution Methods Taxonomy
Chapter 2
  2.1 Bandit Problem and Action Values
  2.2 Epsilon-Greedy and Incremental Updates
  2.3 Optimistic Initialization and UCB
  2.4 Gradient Bandits
  2.5 Contextual Bandits and Summary
Chapter 3
  3.1 MDP Formalism
  3.2 Returns and Value Functions
  3.3 Optimality and Approximation
Chapter 4
  4.1 Policy Evaluation
  4.2 Policy Improvement and Policy Iteration
  4.3 Value Iteration
  4.4 Async DP and GPI Summary
Chapter 5
  5.1 MC Prediction
  5.2 MC Control
  5.3 Off-Policy MC and Importance Sampling
  5.4 MC Summary and Comparison
Chapter 6
  6.1 TD Prediction: TD(0)
  6.2 SARSA: On-Policy TD Control
  6.3 Q-Learning: Off-Policy TD Control
  6.4 TD Analysis and Optimality
Chapter 7
  7.1 n-Step TD Prediction
  7.2 GAE and TD(λ) Preview
  7.3 Per-Decision Methods and Summary
Chapter 8
  8.1 Dyna Architecture
  8.2 Planning, Models, and MCTS
Chapter 9
  9.1 Value Function Approximation
  9.2 Linear TD Convergence
Chapter 10
  10.1 On-Policy Control with Approximation
Chapter 11
  11.1 Off-Policy Methods with Approximation
Chapter 12
  12.1 Eligibility Traces
Chapter 13
  13.1 Policy Gradient Theorem
  13.2 Actor-Critic and PPO
Chapter 14
  14.1 Psychology of RL
Chapter 15
  15.1 Neuroscience of RL
Chapter 16
  16.1 Applications and Case Studies
Chapter 17
  17.1 Frontiers of RL