Questions tagged [dqn]

The DQN (Deep Q-Network) algorithm was developed by DeepMind. It was able to solve a wide range of Atari games (some at a superhuman level) by combining reinforcement learning and deep neural networks at scale.

DQN overcomes unstable learning mainly through four techniques:

  1. Experience Replay
  2. Target Network
  3. Clipping Rewards
  4. Skipping Frames

Experience Replay:

Experience Replay was originally proposed in Reinforcement Learning for Robots Using Neural Networks (1993). A deep neural network easily overfits to the current episodes, and once it has overfitted it is hard to produce varied experiences. To address this, Experience Replay stores experiences, including state transitions, rewards and actions, which are the data necessary to perform Q-learning, and samples mini-batches from this store to update the neural network. The technique offers the following benefits (a minimal replay-buffer sketch follows this list):

  1. reduces correlation between the experiences used to update the DNN
  2. increases learning speed through mini-batch updates
  3. reuses past transitions to avoid catastrophic forgetting
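As an illustration of the idea (not code from the original work), a minimal replay buffer might look like the following Python sketch; the capacity, batch size and field names are arbitrary choices for the example.

```python
import random
from collections import deque, namedtuple

# A single Q-learning transition: (s, a, r, s', done).
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        # Oldest transitions are discarded once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the correlation between
        # consecutive transitions within an episode.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```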

86 questions
11
votes
2 answers

How does Implicit Quantile-Regression Network (IQN) differ from QR-DQN?

For several months I have browsed the internet hoping to find a user-friendly explanation of the Implicit Quantile Regression Network (IQN), but it seems there is none at all. How does IQN differ from a Quantile Regression Network, in plain language? In…
Kari
  • 2,686
  • 1
  • 17
  • 47
10
votes
2 answers

What is the difference between DDQN and DQN?

I think I have not understood the difference between DQN and DDQN in implementation. I understand that we update the target network while running DDQN, but I do not understand how it is done in this code. We put the…
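A minimal illustration of that difference, under the assumption of two hypothetical functions online_q and target_q that map a batch of states to per-action Q-value arrays (NumPy); only the bootstrap target changes between the two algorithms:

```python
import numpy as np

def dqn_target(rewards, next_states, dones, target_q, gamma=0.99):
    # Vanilla DQN: the target network both selects and evaluates the
    # greedy action, which tends to overestimate Q-values.
    q_next = target_q(next_states)                        # shape: (batch, n_actions)
    return rewards + gamma * (1 - dones) * q_next.max(axis=1)

def ddqn_target(rewards, next_states, dones, online_q, target_q, gamma=0.99):
    # Double DQN: the online network selects the action,
    # the target network evaluates it.
    best_actions = online_q(next_states).argmax(axis=1)
    q_next = target_q(next_states)
    q_eval = q_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1 - dones) * q_eval
```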
7
votes
1 answer

What are the effects of clipping the reward on stability?

I am looking to stabilize my DQN results. I found that clipping is one technique to do this, but I did not understand it completely! 1- What are the effects of clipping the reward, clipping the gradient, and clipping the error on stability, and how makes…
user10296606
  • 1,784
  • 5
  • 17
  • 31
7
votes
3 answers

Why random sample from replay for DQN?

I'm trying to gain an intuitive understanding of deep reinforcement learning. In deep Q-networks (DQN) we store all actions/environments/rewards in a memory array and at the end of the episode, "replay" them through our neural network. This makes…
6
votes
2 answers

How to choose between discounted reward and average reward?

How do I select between average reward and discounted reward? When is the average reward more effective than the discounted reward, and when is the opposite true? Is it possible to use both of them in one problem? Because as I understand the…
user10296606
  • 1,784
  • 5
  • 17
  • 31
5
votes
1 answer

Difference between the advantages of Experience Replay in the DQN (2013) paper

I've been re-reading the Playing Atari with Deep Reinforcement Learning (2013) paper. It lists three advantages of experience replay: This approach has several advantages over standard online Q-learning [23]. First, each step of experience is…
4
votes
2 answers

Agent always takes the same action in DQN - Reinforcement Learning

I have trained an RL agent using the DQN algorithm. After 20000 episodes my rewards have converged. Now when I test this agent, it always takes the same action, irrespective of state. I find this very weird. Can someone help me with this? Is…
chink
  • 555
  • 9
  • 17
4
votes
1 answer

How to implement clipping the reward in DQN in Keras

How to implement clipping the reward in DQN in Keras? Especially, how to implement clipping the reward? Is this pseudocode correct: if reward < -threshold: reward = -1 elseif reward > threshold: reward = 1 elseif -threshold…
user10296606
  • 1,784
  • 5
  • 17
  • 31
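As a point of reference for this question, the original DQN setup clips rewards into [-1, 1]. Below is a minimal sketch of such clipping in plain Python; the threshold parameter mirrors the asker's pseudocode and is not part of the original paper.

```python
def clip_reward(reward, threshold=1.0):
    # Rewards with magnitude above the threshold are saturated at +/-1;
    # smaller rewards pass through unchanged. With threshold=1 this
    # corresponds to clipping rewards into the range [-1, 1].
    if reward > threshold:
        return 1.0
    if reward < -threshold:
        return -1.0
    return float(reward)
```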
4
votes
1 answer

Why does exploration in DQN not lead to instability?

Why does action exploration in DQN not lead to instability? I see in DQN algorithms that random actions are selected even after many iterations. My question is: how does this approach not lead to instability? Even the final value of epsilon (the…
user10296606
  • 1,784
  • 5
  • 17
  • 31
4
votes
1 answer

What is a minimal setup to solve the CartPole-v0 with DQN?

I solved CartPole-v0 with a CEM agent pretty easily (experiments and code), but I struggle to find a setup that works with DQN. Do you know which parameters should be adjusted so that the mean reward is about 200 for this problem? What I…
Martin Thoma
  • 18,630
  • 31
  • 92
  • 167
3
votes
2 answers

How exactly does DQN learn?

I created my custom environment in gym, which is a maze. I use a DQN model with BoltzmannQPolicy. It trains fine with the following variables: the position of the agent, the distance from the endpoint, the position of the endpoint, which directions it can move…
3
votes
1 answer

Is it possible to solve Rubik's cube using DQN?

I'm trying to solve the Rubik's cube using deep learning and I came across DQN, so I decided to give it a try. I developed all the code and started training, but I got these results: the loss goes up and the test never gets better results. I have tried to…
3
votes
1 answer

Evaluating a trained Reinforcement Learning Agent?

I am new to training reinforcement learning agents. I have read about the PPO algorithm and used the Stable Baselines library to train an agent with PPO. So my question here is: how do I evaluate a trained RL agent? Consider for a regression or…
3
votes
1 answer

Deep reinforcement learning on changing data sizes

I have a game for which I want to build a model that will learn to play it. However, the environment output is two lists that represent the locations and numbers of soldiers of the user and the opponent. The lists' lengths change with each step, so to…
3
votes
1 answer

Difference between Dueling DQN and Double DQN?

I have read some articles, but still cannot figure out the difference between Dueling DQN and Double DQN. What exactly is the difference between them? Also, does Dueling DQN need to be built on top of a Double DQN? Thanks!
Edamame
  • 2,705
  • 5
  • 23
  • 32
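As background for this question: Double DQN changes only how the learning target is computed, while Dueling DQN changes the network architecture by splitting it into value and advantage streams, so the two are independent and often combined. A rough Keras sketch of a dueling head, assuming a flat state vector and arbitrary layer sizes:

```python
import tensorflow as tf
from tensorflow.keras import layers

def dueling_q_network(state_dim, n_actions):
    # Shared feature extractor followed by separate value and advantage streams.
    states = layers.Input(shape=(state_dim,))
    hidden = layers.Dense(64, activation="relu")(states)
    value = layers.Dense(1)(hidden)              # V(s), shape (batch, 1)
    advantage = layers.Dense(n_actions)(hidden)  # A(s, a), shape (batch, n_actions)
    # Combine the streams: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    q_values = layers.Lambda(
        lambda va: va[0] + va[1] - tf.reduce_mean(va[1], axis=1, keepdims=True)
    )([value, advantage])
    return tf.keras.Model(inputs=states, outputs=q_values)
```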