Questions tagged [policy-gradients]
60 questions
23
votes
2 answers
Formal proof of vanilla policy gradient convergence
So I stumbled upon this question, where the author asks for a proof of the vanilla policy gradient procedure. The answer provided points to some literature, but a formal proof is nowhere to be found. Looking at Sutton & Barto, Reinforcement…
Markus Peschl
- 280
- 1
- 7
6
votes
1 answer
Reinforcement Learning: Policy Gradient derivation question
I have been reading this excellent post: https://medium.com/@jonathan_hui/rl-policy-gradients-explained-9b13b688b146 and following the RL videos by David Silver, and there is one thing I did not get:
For $\pi_\theta(\tau) = \pi_\theta(s_1, a_1, ..., s_T,…
Hadamard
- 63
- 4
5
votes
1 answer
RL's policy gradient (REINFORCE) pipeline clarification
I am trying to build a policy gradient RL machine. Let's look at REINFORCE's equation for updating the model parameters by gradient ascent (I apologize if the notation is slightly non-conventional):
$$\omega = \omega + \alpha…
Alexey Burnakov
- 233
- 2
- 11
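The truncated update above is presumably the standard REINFORCE gradient-ascent step, $\omega \leftarrow \omega + \alpha \, \nabla_\omega \log \pi_\omega(a \mid s) \, G$. A minimal NumPy sketch of that step, assuming a linear softmax policy (all function names here are illustrative, not from the question):

```python
import numpy as np

def softmax_policy(omega, state):
    """Action probabilities pi(a | s) for a linear softmax policy."""
    logits = omega @ state               # one logit per action
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

def grad_log_pi(omega, state, action):
    """Gradient of log pi(a | s) w.r.t. omega for the softmax policy."""
    probs = softmax_policy(omega, state)
    grad = -np.outer(probs, state)       # -pi(b | s) * s for every action b
    grad[action] += state                # +s for the action actually taken
    return grad

def reinforce_update(omega, state, action, G, alpha=0.01):
    """One REINFORCE step: omega + alpha * grad log pi(a | s) * G."""
    return omega + alpha * grad_log_pi(omega, state, action) * G
```

With a positive return `G`, the update raises the probability of the action that was taken; with a negative return, it lowers it.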
5
votes
1 answer
Policy Gradients - do gradient log probabilities favor less likely actions?
Assume we work with neural networks and the policy gradient method. The gradient of the objective function $J$ is an expectation.
In other words, to get this gradient $\nabla_{\theta} J(\theta)$, we sample $N$ trajectories, then average out…
Kari
- 2,686
- 1
- 17
- 47
4
votes
2 answers
Agent always takes a same action in DQN - Reinforcement Learning
I have trained an RL agent using the DQN algorithm. After 20000 episodes my rewards have converged. Now when I test this agent, it always takes the same action, irrespective of the state. I find this very weird. Can someone help me with this? Is…
chink
- 555
- 9
- 17
4
votes
2 answers
Reinforcement learning: Discounting rewards in the REINFORCE algorithm
I am looking into the REINFORCE algorithm for reinforcement learning. I am having trouble understanding how rewards should be computed.
The algorithm from Sutton & Barto:
What does $G$, the 'return from step $t$', mean here?
The return from step $t$ to step $T-1$,…
Atuos
- 317
- 1
- 2
- 7
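The return $G_t$ asked about here is the discounted sum of rewards from step $t$ to the end of the episode, $G_t = \sum_{k=t}^{T-1} \gamma^{k-t} R_{k+1}$, as defined in Sutton & Barto. It can be computed for every step in one backward pass; a minimal sketch (names are illustrative):

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G_t for every time step t, given episode rewards R_1..R_T."""
    G = np.zeros(len(rewards))
    running = 0.0
    # Walk backward: G_t = R_{t+1} + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

print(discounted_returns([1, 1, 1], gamma=0.5))  # [1.75 1.5  1.  ]
```

The backward recursion avoids recomputing the discounted sum from scratch at each step, turning an $O(T^2)$ computation into $O(T)$.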
4
votes
1 answer
Time horizon T in policy gradients (actor-critic)
I am currently going through the Berkeley lectures on Reinforcement Learning. Specifically, I am at slide 5 of this lecture.
At the bottom of that slide, the gradient of the expected sum of rewards function is given by
$$
\nabla J(\theta) =…
Dummie Variable
- 86
- 2
4
votes
1 answer
Policy-based RL methods - what do continuous actions look like?
I've read several times that policy-based RL methods can work with a continuous action space (move left 5 meters, move right 5.5312 meters), rather than with discrete actions like value-based methods (Q-learning).
If Policy-based methods produce…
Kari
- 2,686
- 1
- 17
- 47
4
votes
1 answer
How does action get selected in a Policy Gradient Method?
As I understand it, in reinforcement learning a big difference between a value-based method and a policy-gradient method is how the next action is selected.
In Q-learning (a value-based method), each possible action gets a score. We then select the next…
Kari
- 2,686
- 1
- 17
- 47
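The contrast described in this question can be sketched concretely: a value-based method picks the argmax of its action scores, while a policy-gradient method samples directly from the probabilities its policy outputs. A minimal illustration (names are my own, not from the question):

```python
import numpy as np

def select_action_q(q_values):
    """Value-based (greedy): pick the action with the highest score."""
    return int(np.argmax(q_values))

def select_action_policy(probs, rng):
    """Policy gradient: sample an action from pi(a | s) directly."""
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
print(select_action_q([0.1, 0.9, 0.3]))                # always 1
print(select_action_policy([0.1, 0.8, 0.1], rng))      # usually 1, sometimes 0 or 2
```

Sampling means the stochastic policy explores on its own, whereas greedy Q-learning typically needs an extra mechanism such as epsilon-greedy.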
3
votes
1 answer
Which Policy Gradient method was used by Google's DeepMind to teach an AI to walk
I just saw this video on YouTube.
Which Policy Gradient method was used to train the AI to walk?
Was it DDPG or D4PG or what?
learner
- 33
- 2
3
votes
1 answer
Policy Gradient not "learning"
I'm attempting to implement the policy gradient taken from the "Hands-On Machine Learning" book by Géron, which can be found here. The notebook uses TensorFlow and I'm attempting to do it with PyTorch.
My models look as follows:
model =…
Harpal
- 903
- 1
- 7
- 13
3
votes
1 answer
Maximum Entropy Policy Gradient Derivation
I am reading through the paper Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review by Sergey Levine. I am having difficulty understanding this part of the derivation on Maximum Entropy Policy Gradients (Section…
Ricky Sanjaya
- 39
- 3
3
votes
1 answer
Reinforcement learning: decomposing a policy gradient
I am studying the policy gradient through the website: https://towardsdatascience.com/understanding-actor-critic-methods-931b97b6df3f
I couldn't figure out how the first equation becomes the second one.
In the second equation, why the first…
Edamame
- 2,705
- 5
- 23
- 32
3
votes
1 answer
Policy Gradients vs Value function, when implemented via DQN
After studying Q-learning, SARSA & DQN, I've now discovered the term "Policy Gradients".
It's a bit unclear to me how it differs from the above approaches. Here is my understanding; please correct it:
From the moment I first encountered DQN, I always…
Kari
- 2,686
- 1
- 17
- 47
2
votes
1 answer
Policy Gradient with continuous action space
How do I apply REINFORCE/policy-gradient algorithms to a continuous action space? I have learnt that one of the advantages of policy gradients is that they are applicable to continuous action spaces. One way I can think of is discretizing the action space…
chink
- 555
- 9
- 17
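A common alternative to discretization, and a standard way policy gradients handle continuous actions, is a Gaussian policy: the parameterized policy outputs a mean (and a standard deviation), and the action is sampled from that distribution. A minimal sketch with a linear mean and a fixed log-std (all names are illustrative):

```python
import numpy as np

def gaussian_policy(theta, state, log_std=-0.5, rng=None):
    """Sample a continuous action a ~ N(mu(s), sigma) and return (a, log pi(a | s))."""
    if rng is None:
        rng = np.random.default_rng()
    mu = theta @ state                    # linear mean, e.g. "move 5.5312 meters"
    sigma = np.exp(log_std)
    action = rng.normal(mu, sigma)
    # Log-density of the univariate Gaussian, used in the REINFORCE update
    log_prob = (-0.5 * ((action - mu) / sigma) ** 2
                - np.log(sigma) - 0.5 * np.log(2 * np.pi))
    return action, log_prob
```

Since the sampled action is a real number, no discretization is needed, and the same $\nabla_\theta \log \pi_\theta(a \mid s)$ machinery as in the discrete case applies to the Gaussian log-density.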