Lecture 22: Reinforcement Learning
Learning Objectives¶
Define reinforcement learning (no model)
Implement passive RL: ADP, TD
Implement active RL: Q-learning
Handle exploration vs. exploitation
Apply deep RL
RL vs. MDP¶
MDP: Model known (P, R)
RL: Model unknown, learn from experience
Goal: Find optimal policy
Passive RL¶
Policy fixed: π given
Task: Learn V^π or Q^π
Direct utility estimation: average the observed returns from each state (sketched below)
ADP (adaptive dynamic programming): learn the transition model P and rewards R from experience, then solve by value iteration
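A minimal sketch of direct utility estimation, assuming each episode is given as a list of (state, reward) pairs collected while following the fixed policy π (this data format is an assumption, not from the lecture):

```python
from collections import defaultdict

def direct_utility_estimation(episodes, gamma=0.9):
    """Estimate V^pi by averaging the observed returns from each state.

    `episodes`: list of trajectories, each a list of (state, reward) pairs
    generated by the fixed policy pi (assumed format).
    """
    returns = defaultdict(list)
    for episode in episodes:
        G = 0.0
        # Walk the episode backwards so G is the discounted return-to-go.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            returns[state].append(G)
    # V(s) is the sample mean of every return observed from s.
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}
```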
Temporal-Difference Learning¶
TD(0): V(s) ← V(s) + α[r + γV(s') - V(s)]
No model needed: update from each observed transition (see the sketch below)
Bootstrap: the current estimate V(s') stands in for the true value of s'
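The same update as a short function. The dictionary-based value table with a default of 0 for unseen states is an assumption made for illustration:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) backup after observing the transition (s, r, s')."""
    # Bootstrapped target: observed reward plus the current estimate of V(s').
    target = r + gamma * V.get(s_next, 0.0)
    # Move V(s) a step of size alpha along the TD error (target - V(s)).
    V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
```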
Active RL¶
The agent chooses its own actions, so it must explore
ε-greedy: act randomly with probability ε, greedily otherwise (sketched below)
Q-learning: off-policy, learns Q* directly
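A sketch of ε-greedy selection over a tabular Q keyed by (state, action) pairs (an assumed representation):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the greedy action."""
    if random.random() < epsilon:
        return random.choice(actions)  # explore: uniform random action
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # exploit
```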
Q-Learning¶
Update: Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') - Q(s,a)]
Off-policy: Learn optimal while following exploratory policy
Convergence: to Q* if every (s,a) pair is visited infinitely often and α decays appropriately
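The update embedded in a training loop, as a minimal sketch. The `env` interface here (reset() returning a state, step(a) returning (next_state, reward, done), and an `actions` list) is an assumed convention to keep the example self-contained:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)  # Q[(s, a)], default 0 for unseen pairs
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Behavior policy: epsilon-greedy, so the agent keeps exploring.
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a_: Q[(s, a_)])
            s2, r, done = env.step(a)
            # Off-policy: the target maxes over a', so we learn about the
            # greedy policy regardless of the exploratory action taken.
            best_next = 0.0 if done else max(Q[(s2, a_)] for a_ in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```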
Exploration¶
Exploration-exploitation tradeoff
ε-greedy: simple, but explores blindly
UCB: add an exploration bonus for rarely tried actions, i.e., optimism under uncertainty (sketched below); optimistic initialization is a related trick
Safe exploration: constrain exploration to avoid catastrophic states
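A sketch of UCB-style selection, assuming visit counts N[(state, action)] and t as the total number of visits to the state (this bookkeeping is an assumption for illustration):

```python
import math

def ucb_action(Q, N, state, actions, t, c=1.0):
    """Greedy in Q plus an exploration bonus that shrinks with visit count."""
    def score(a):
        n = N.get((state, a), 0)
        if n == 0:
            return float("inf")  # untried actions get priority
        return Q.get((state, a), 0.0) + c * math.sqrt(math.log(t) / n)
    return max(actions, key=score)
```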
Deep RL¶
Q-network: approximate Q(s,a) with a neural network
DQN: stabilized by experience replay (sketched below) and a target network
Policy gradient: directly optimize a parameterized policy (e.g., REINFORCE)
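A minimal sketch of the experience-replay piece of DQN; the class name and interface are illustrative assumptions:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off the end

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling decorrelates consecutive transitions for training.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

The target network is the second stabilizer: a periodically refreshed copy of the Q-network supplies the bootstrap value in the target r + γ max_a' Q_target(s',a'), so the target does not shift on every gradient step.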
Summary¶
Passive: Learn V given π
Active: Learn π*
TD, Q-learning: Model-free
Deep RL: Function approximation
References¶
Russell & Norvig, AIMA 4e, Ch. 22
Chapter PDF: chapters/chapter-22.pdf
aima-python: reinforcement_learning4e.ipynb