Questions tagged [openai-gym]

56 questions
11
votes
1 answer

Why could my DDQN get significantly worse after beating the game repeatedly?

I've been trying to train a DDQN to play OpenAI Gym's CartPole-v1, but found that although it starts off well and repeatedly reaches the full score of 500 (at around 600 episodes in the pic below), it then seems to go off the rails and do worse…
4
votes
1 answer

How to define discrete action space with continuous values in OpenAI Gym?

I am trying to use a reinforcement learning solution in an OpenAI Gym environment that has 6 discrete actions with continuous values, e.g. increase parameter 1 by 2.2, decrease parameter 1 by 1.6, decrease parameter 3 by 1, etc. I have seen in…
Cristian M • 177 • 1 • 7
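One common way to express "six discrete actions, each tied to a fixed continuous adjustment" is a plain Discrete(6) space plus a lookup table inside the environment. A minimal sketch, where the parameter indices and step sizes are only illustrative stand-ins for the question's values:

```python
import numpy as np
import gym
from gym import spaces

# Illustrative mapping from action index to (parameter index, delta).
ACTION_TABLE = {
    0: (0, +2.2),   # increase parameter 1 by 2.2
    1: (0, -1.6),   # decrease parameter 1 by 1.6
    2: (1, +1.0),
    3: (1, -1.0),
    4: (2, +0.5),
    5: (2, -1.0),   # decrease parameter 3 by 1
}

class ParamTuningEnv(gym.Env):
    """Toy environment: the agent nudges three continuous parameters."""

    def __init__(self):
        self.action_space = spaces.Discrete(len(ACTION_TABLE))
        self.observation_space = spaces.Box(low=-100.0, high=100.0,
                                            shape=(3,), dtype=np.float32)
        self.params = np.zeros(3, dtype=np.float32)

    def step(self, action):
        idx, delta = ACTION_TABLE[int(action)]
        self.params[idx] += delta
        reward = 0.0          # problem-specific reward goes here
        done = False
        return self.params.copy(), reward, done, {}

    def reset(self):
        self.params[:] = 0.0
        return self.params.copy()
```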
4
votes
1 answer

What is a minimal setup to solve the CartPole-v0 with DQN?

I solved the CartPole-v0 with a CEM agent pretty easily (experiments and code), but I struggle to find a setup which works with DQN. Do you know which parameters should be adjusted so that the mean reward is about 200 for this problem? What I…
Martin Thoma • 18,630 • 31 • 92 • 167
3
votes
2 answers

How exactly does DQN learn?

I created my custom environment in gym, which is a maze. I use a DQN model with BoltzmannQPolicy. It trains fine with the following variables: the position of the agent, its distance from the endpoint, the position of the endpoint, and which directions it can move…
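For reference, the wiring this question seems to describe looks roughly like the standard keras-rl example below; the network size, the hyperparameters, and the CartPole stand-in for the custom maze environment are assumptions, not details from the question:

```python
import gym
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

# Stand-in for the custom maze environment described in the question.
env = gym.make("CartPole-v1")
nb_actions = env.action_space.n

# Small fully connected Q-network over the flattened observation.
model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(32, activation="relu"),
    Dense(32, activation="relu"),
    Dense(nb_actions, activation="linear"),
])

agent = DQNAgent(
    model=model,
    nb_actions=nb_actions,
    memory=SequentialMemory(limit=50000, window_length=1),
    policy=BoltzmannQPolicy(),      # temperature-based exploration
    nb_steps_warmup=100,
    target_model_update=1e-2,
)
agent.compile(Adam(lr=1e-3), metrics=["mae"])
agent.fit(env, nb_steps=50000, visualize=False, verbose=1)
```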
3
votes
1 answer

Valid actions in OpenAI Gym

Why don't the gym environments come with "valid actions"? The normal gym environment accepts any action as input, even one that isn't actually possible. Is this a normal thing in reinforcement learning? Do the models really have to learn what valid…
Muppet • 777 • 1 • 7 • 13
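Gym itself only declares the shape of the action space, so invalid actions are usually handled either inside the environment (e.g. ignored or penalized) or by masking the agent's outputs. A minimal masking sketch; the valid_actions() helper is hypothetical and would have to be provided by the environment author:

```python
import numpy as np

def masked_argmax(q_values, valid_mask):
    """Greedy action restricted to the currently valid actions.

    q_values   -- array of Q-values, one entry per action
    valid_mask -- boolean array, True where the action is legal right now
    """
    masked = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(masked))

# Usage sketch (hypothetical helper, not part of the Gym API):
# mask = env.valid_actions()
# action = masked_argmax(q_values, mask)
```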
3
votes
0 answers

Learn large, variable-size action space for Diplomacy game

I am making an environment using OpenAI gym for Diplomacy, and making an AI for it. In Diplomacy, a player has many units, and each unit has a number of moves available to it. Therefore, the player's action space is the product of each unit's moves,…
3
votes
0 answers

Reinforcement learning using PPO2 in OpenAI Gym Retro: Mario not learning to clear the easy episode

I have been training the Mario game in Gym Retro using PPO2 from baselines for some time. I have also tried level 3 and level 1. But even after full training, when I play using the saved checkpoints, Mario is not able to finish the level. Mostly it falls into a hole or…
Sandeep Bhutani • 884 • 1 • 7 • 22
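As a point of comparison, a minimal Gym Retro + PPO2 training loop (shown here with stable-baselines; the questioner may be using OpenAI baselines directly) looks roughly like this. The game/state names and timestep budget are placeholders, and the ROM has to be imported into retro beforehand:

```python
import retro
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

def make_env():
    # Placeholder game/state names; retro needs the ROM imported first.
    return retro.make(game="SuperMarioBros-Nes", state="Level1-1")

env = DummyVecEnv([make_env])           # PPO2 expects a vectorized env
model = PPO2("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)  # placeholder budget
model.save("ppo2_mario")
```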
3
votes
1 answer

What is wrong with this reinforcement learning environment?

I'm working on the reinforcement learning problem below: I have a bottle of fixed capacity (say 5 liters). At the bottom of the bottle there is a tap to remove water. The distribution of water removal is not fixed; we can remove any amount of water from…
2
votes
1 answer

OpenAI Gym equivalent for supervised and/or unsupervised learning

OpenAI Gym has really normalized the way reinforcement learning is performed. It makes it possible for data scientists to separate model development and environment setup/building and to focus on what they really should be focusing on. Quoting from…
Ali Hassaine • 129 • 3
2
votes
1 answer

What does anneal mean in the context of machine learning?

An article released by OpenAI gives an overview of how OpenAI Five works. There is a paragraph in the article stating: Our agent is trained to maximize the exponentially decayed sum of future rewards, weighted by an exponential decay factor…
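For reference, the "exponentially decayed sum of future rewards" in that paragraph is the usual discounted return, and "annealing" here refers to gradually changing a hyperparameter (the decay factor γ) over the course of training rather than keeping it fixed:

```latex
% Discounted return maximized by the agent (r_{t+k}: reward k steps ahead)
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}, \qquad 0 < \gamma < 1
```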
2
votes
1 answer

How is the target_f updated in the Keras solution to the Deep Q-learning Cartpole/Gym algorithm?

There's a popular solution to the CartPole game using Keras and Deep Q-Learning: https://keon.github.io/deep-q-learning/ But there's a line of code that's confusing; the same question has been asked under the article and many people are confused…
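For context, the passage in question is the target_f update inside the replay step; reproduced roughly from memory (the exact code is in the linked post, and model, gamma, and minibatch come from the post's DQNAgent class):

```python
# Rough recollection of the replay() update from the linked post.
for state, action, reward, next_state, done in minibatch:
    target = reward
    if not done:
        # Bootstrap the target with the best predicted Q-value of the next state.
        target = reward + gamma * np.amax(model.predict(next_state)[0])
    target_f = model.predict(state)    # current Q-value predictions for all actions
    target_f[0][action] = target       # overwrite only the taken action's target
    model.fit(state, target_f, epochs=1, verbose=0)
```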
2
votes
2 answers

OpenAI Gym: what agent can I use with a multi-discrete action space?

I have a custom environment with a multi-discrete action space. The action and observation spaces are as follows: Action: MultiDiscrete([ 3 121 121 121 3 121 121 121 3 121 121 121 3 121 121 121 3 121 121 121 3 121 121 121 3 121 121 121…
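For readers unfamiliar with the space type, MultiDiscrete is simply a vector of independent discrete choices; a tiny sketch with a shortened version of the space above:

```python
from gym import spaces

# One discrete choice per component: 3, 121, 121 and 121 options respectively.
space = spaces.MultiDiscrete([3, 121, 121, 121])

sample = space.sample()   # e.g. array([1, 57, 3, 120])
print(space.shape, sample)
```

As far as I know, policy-gradient agents such as PPO2 and A2C in stable-baselines accept MultiDiscrete action spaces out of the box, while DQN-style agents require a single Discrete space.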
2
votes
1 answer

PPO, A2C for continuous action spaces, math and code

Edit: Question has been edited to better reflect what I learned after asking the original question. I implemented the clipped objective PPO-clip as explained here: https://spinningup.openai.com/en/latest/algorithms/ppo.html Basically I used a dummy…
mLstudent33 • 574 • 1 • 4 • 17
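For reference, the clipped surrogate objective described on the linked Spinning Up page has the form:

```latex
% Probability ratio between the new and old policy
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)}

% PPO-clip surrogate objective (\hat{A}_t: advantage estimate, \epsilon: clip range)
L^{\text{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\right]
```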
2
votes
1 answer

How to Form the Training Examples for Deep Q Network in Reinforcement Learning?

I'm trying to pick up the basics of reinforcement learning by self-study from some blogs and texts. Forgive me if the question is too basic and the different bits I understand are a bit messy, but even after consulting a few references, I cannot really get…
Della • 315 • 1 • 3 • 9
1
vote
1 answer

Effects of slipperiness in OpenAI FrozenLake Environment

I am trying to wrap my head around the effects of is_slippery in the OpenAI Gym FrozenLake-v0 environment. From my results, when is_slippery=True (the default value) the environment is much more difficult to solve compared to when…
yudhiesh • 213 • 1 • 9
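For context, the flag is passed when the environment is created; with is_slippery=True (the default) the agent moves in its intended direction only about a third of the time, with the remaining probability split between the two perpendicular directions, which is what makes the default environment so much harder:

```python
import gym

# Default, slippery version: transitions are stochastic.
env_slippery = gym.make("FrozenLake-v0")

# Deterministic version: the agent always moves where it intends to
# (recent gym versions forward keyword arguments to the env constructor).
env_deterministic = gym.make("FrozenLake-v0", is_slippery=False)
```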