
I am looking into the REINFORCE algorithm for reinforcement learning. I am having trouble understanding how rewards should be computed.

The algorithm from Sutton & Barto: [image: REINFORCE pseudocode]

What does G, 'return from step t' mean here?

  1. Return from step t to step T-1, i.e. R_t + R_(t+1) + ... + R_(T-1)?
  2. Return from step 0 to step t, i.e. R_0 + R_1 + ... + R_t?
Atuos

2 Answers


What does G, 'return from step t' mean here?

  1. Return from step t to step T-1, i.e. R_t + R_(t+1) + ... + R_(T-1)?
  2. Return from step 0 to step t, i.e. R_0 + R_1 + ... + R_t?

Neither, but (1) is closest.

$$G_t = \sum_{i=t+1}^T R_i$$

i.e. the sum of all rewards from step $t+1$ to step $T$.

The confusion may come from the REINFORCE loop running over $t = 0, \dots, T-1$ rather than up to $T$. That range is consistent with the definition, because the return at step $t$ starts with the next reward, $R_{t+1}$. So $G_{T-1} = R_T$, and $G_{T} = 0$ always, since no future reward is possible at the end of the episode.
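
As a minimal sketch of that indexing (not taken from the book; the function name `returns_from_rewards` and the list layout are assumptions for illustration), suppose `rewards[k]` holds $R_{k+1}$, the reward received after acting at step $k$. Each $G_t$ can then be filled in backwards from the end of the episode:

```python
def returns_from_rewards(rewards):
    """Undiscounted returns G_0..G_T for one episode.

    Assumes rewards[k] stores R_{k+1}, so len(rewards) == T.
    """
    T = len(rewards)
    returns = [0.0] * (T + 1)  # returns[T] stays 0.0, i.e. G_T = 0
    for t in range(T - 1, -1, -1):
        # G_t = R_{t+1} + G_{t+1}
        returns[t] = rewards[t] + returns[t + 1]
    return returns


episode_rewards = [1.0, 0.0, 2.0]  # R_1, R_2, R_3, so T = 3
print(returns_from_rewards(episode_rewards))
```

For this episode the result is `[3.0, 2.0, 2.0, 0.0]`: $G_0 = R_1 + R_2 + R_3 = 3$, $G_2 = G_{T-1} = R_3 = 2$, and $G_3 = G_T = 0$.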

Neil Slater

The latest version of the book defines $G$ explicitly. As in Neil Slater's answer, but with the discount factor $\gamma$ included, $G_t \leftarrow$ return from step $t$ means:

$$ G_t = \sum_{k=t+1}^T \gamma^{k-t-1}R_k $$
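
A rough sketch of how this sum could be computed for every $t$ in one backward pass, using the equivalent recursion $G_t = R_{t+1} + \gamma G_{t+1}$. The function name `discounted_returns` and the convention that `rewards[k]` stores $R_{k+1}$ are assumptions made for this example, not notation from the book:

```python
def discounted_returns(rewards, gamma):
    """Discounted returns G_0..G_T for one episode.

    Assumes rewards[k] stores R_{k+1}, so len(rewards) == T.
    """
    T = len(rewards)
    returns = [0.0] * (T + 1)  # G_T = 0: nothing follows the terminal step
    for t in range(T - 1, -1, -1):
        # G_t = R_{t+1} + gamma * G_{t+1}
        returns[t] = rewards[t] + gamma * returns[t + 1]
    return returns


print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))
```

With rewards $R_1 = 1$, $R_2 = 0$, $R_3 = 2$ and $\gamma = 0.5$ this gives `[1.5, 1.0, 2.0, 0.0]`, matching the formula directly: $G_0 = 1 + 0.5 \cdot 0 + 0.25 \cdot 2 = 1.5$. Setting `gamma=1.0` recovers the plain sum of future rewards.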

Stephen Rauch