RL Notes
Reinforcement Learning: An Introduction — Sutton & Barto, 2nd edition
Chapter 1
1.1 What is RL
1.2 Rewards, Returns, and Episodes
1.3 Value Functions and Bellman Equations
1.4 RL Solution Methods Taxonomy
Chapter 2
2.1 Bandit Problem and Action Values
2.2 Epsilon-Greedy and Incremental Updates
2.3 Optimistic Init and UCB
2.4 Gradient Bandits
2.5 Contextual Bandits and Summary
Chapter 3
Chapter 4
4.1 Policy Evaluation
4.2 Policy Improvement and Iteration
4.3 Value Iteration
4.4 Async DP and GPI Summary
Chapter 5
5.1 MC Prediction
5.2 MC Control
5.3 Off-Policy MC and Importance Sampling
5.4 MC Summary and Comparison
Chapter 6
6.1 TD Prediction: TD(0)
6.2 SARSA: On-Policy TD Control
6.3 Q-Learning: Off-Policy TD
6.4 TD Analysis and Optimality
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17