RL Notes

Reinforcement Learning: An Introduction — Sutton & Barto, 2nd edition


Chapter 1
  1.1 What Is RL
  1.2 Rewards, Returns, and Episodes
  1.3 Value Functions and Bellman Equations
  1.4 RL Solution Methods Taxonomy
Chapter 2
  2.1 Bandit Problem and Action Values
  2.2 Epsilon-Greedy and Incremental Updates
  2.3 Optimistic Initialization and UCB
  2.4 Gradient Bandits
  2.5 Contextual Bandits and Summary
Chapter 3
  3.1 MDP Formalism
  3.2 Returns and Value Functions
  3.3 Optimality and Approximation
Chapter 4
  4.1 Policy Evaluation
  4.2 Policy Improvement and Policy Iteration
  4.3 Value Iteration
  4.4 Async DP and GPI Summary
Chapter 5
  5.1 MC Prediction
  5.2 MC Control
  5.3 Off-Policy MC and Importance Sampling
  5.4 MC Summary and Comparison
Chapter 6
  6.1 TD Prediction: TD(0)
  6.2 SARSA: On-Policy TD Control
  6.3 Q-Learning: Off-Policy TD Control
  6.4 TD Analysis and Optimality
Chapter 7
  7.1 n-Step TD Prediction
  7.2 GAE and TD(λ) Preview
  7.3 Per-Decision Methods and Summary
Chapter 8
  8.1 Dyna Architecture
  8.2 Planning, Models, and MCTS
Chapter 9
  9.1 Value Function Approximation
  9.2 Linear TD Convergence
Chapter 10
  10.1 On-Policy Control with Approximation
Chapter 11
  11.1 Off-Policy Methods with Approximation
Chapter 12
  12.1 Eligibility Traces
Chapter 13
  13.1 Policy Gradient Theorem
  13.2 Actor-Critic and PPO
Chapter 14
  14.1 Psychology of RL
Chapter 15
  15.1 Neuroscience of RL
Chapter 16
  16.1 Applications and Case Studies
Chapter 17
  17.1 Frontiers of RL