
I am running a deep RL algorithm with a custom reward function, and I train it for at least 500 epochs.

For each epoch, I print the total reward received by the actor network. It is around $-10^5$ for the first epoch, and after 500 epochs it is still almost the same.
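
For context, this is roughly how I accumulate and print the per-epoch total (a simplified sketch, not my exact code; the environment and the random policy here stand in for my custom environment and actor network):

```python
import gym  # I'm on the classic gym API; gymnasium's reset/step signatures differ slightly

# Simplified sketch of my logging loop. "CartPole-v1" and the random action
# below are placeholders for my custom environment and my actor network.
env = gym.make("CartPole-v1")

for epoch in range(500):
    state = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        action = env.action_space.sample()        # placeholder for actor(state)
        state, reward, done, info = env.step(action)
        total_reward += reward                    # custom reward accumulates here
    print(f"epoch {epoch}: total reward = {total_reward:.1f}")
```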

I expected that, after enough epochs, the magnitude of the total reward would decrease and the value would move toward zero, but that did not happen.

What can I infer from this? Is my actor learning? If it is, why does the total reward not move toward zero?

Note that I want to infer whether my actor network is learning from the total reward per epoch only, and not from its actions in the environment.
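
Concretely, the only signal I am looking at is the sequence of per-epoch totals, e.g. a moving average over them to see whether there is any trend toward zero (the random values below are just a stand-in for my logged rewards):

```python
import numpy as np

# Hypothetical trend check on the logged per-epoch totals: compare the moving
# average at the start and at the end of training.
rewards = np.random.normal(-1e5, 1e3, size=500)   # stand-in for my 500 logged totals

window = 20
moving_avg = np.convolve(rewards, np.ones(window) / window, mode="valid")
print("first window mean:", moving_avg[0])
print("last window mean: ", moving_avg[-1])
```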
