Chapter 3 Solutions
Reinforcement Learning: An Introduction
1 3.12
\[ v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s] = \sum_{a \in \mathcal{A}} \pi(a \mid s)\, \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a] = \sum_{a \in \mathcal{A}} \pi(a \mid s)\, q_\pi(s,a) \]
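As a quick numerical check of this identity, the sketch below computes the state value as the policy-weighted average of the action values for a single state. The action names and all numbers are made up for illustration and are not taken from the book.

```python
# Hypothetical policy and action values for one fixed state s
# (illustrative numbers only, not from the book).
pi = {"left": 0.25, "right": 0.75}  # pi(a|s); probabilities sum to 1
q = {"left": 2.0, "right": 4.0}     # q_pi(s, a)


def state_value(pi, q):
    """Return v_pi(s) = sum over a of pi(a|s) * q_pi(s, a)."""
    return sum(pi[a] * q[a] for a in pi)


v = state_value(pi, q)
print(v)  # 0.25 * 2.0 + 0.75 * 4.0 = 3.5
```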
2 3.13
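\[\begin{aligned} q_\pi(s,a) &= \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a] = \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s, A_t = a] \\ &= \int_{\mathcal{S}} \int_{\mathcal{R}} p(s',r \mid s,a) \left( r + \gamma\, \mathbb{E}_\pi[G_{t+1} \mid S_{t+1} = s'] \right) dr\, ds' \\ &= \int_{\mathcal{S}} \int_{\mathcal{R}} \left( r\, p(s',r \mid s,a) + \gamma\, v_\pi(s')\, p(s',r \mid s,a) \right) dr\, ds' \\ &= \int_{\mathcal{S}} \int_{\mathcal{R}} p(s',r \mid s,a) \left[ r + \gamma\, v_\pi(s') \right] dr\, ds' \end{aligned}\]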
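When the state and reward sets are finite, the integrals become sums, and the action value is the expectation of $r + \gamma v_\pi(s')$ under $p(s',r \mid s,a)$. The sketch below evaluates that sum for one fixed state-action pair. The transition table, the state values, and the discount are all hypothetical numbers chosen for illustration.

```python
# Hypothetical dynamics p(s', r | s, a) for one fixed (s, a) pair:
# keys are (next_state, reward), values are probabilities summing to 1.
p = {("s1", 1.0): 0.5, ("s2", 0.0): 0.5}
v = {"s1": 10.0, "s2": 2.0}  # assumed state values v_pi(s')
gamma = 0.5                  # assumed discount factor


def action_value(p, v, gamma):
    """Return q_pi(s, a) = sum over (s', r) of p(s', r|s, a) * (r + gamma * v_pi(s'))."""
    return sum(prob * (r + gamma * v[s_next]) for (s_next, r), prob in p.items())


q_sa = action_value(p, v, gamma)
print(q_sa)  # 0.5 * (1.0 + 5.0) + 0.5 * (0.0 + 1.0) = 3.5
```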