Questions tagged [openai-gym]
56 questions
11
votes
1 answer
Why could my DDQN get significantly worse after beating the game repeatedly?
I've been trying to train a DDQN to play OpenAI Gym's CartPole-v1, but found that although it starts off well and begins getting the full score (500) repeatedly (at around 600 episodes in the picture below), it then seems to go off the rails and do worse…
Danny Tuppeny
- 213
- 2
- 7
4
votes
1 answer
How to define discrete action space with continuous values in OpenAI Gym?
I am trying to use a reinforcement learning solution in an OpenAI Gym environment that has 6 discrete actions with continuous values, e.g. increase parameter 1 by 2.2, decrease parameter 1 by 1.6, decrease parameter 3 by 1, etc.
I have seen in…
Cristian M
- 177
- 1
- 7
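One common way to model the kind of parameterised action space described in this question is to pair a discrete choice with a continuous magnitude. A minimal sketch using gym's spaces module; the 6-action layout and the bounds are assumptions for illustration, not the asker's setup:

```python
import numpy as np
from gym import spaces

# Hypothetical sketch: 6 discrete actions (e.g. increase/decrease parameters 1-3),
# each accompanied by one continuous amount. Bounds are made up for illustration.
action_space = spaces.Tuple((
    spaces.Discrete(6),                                            # which action to take
    spaces.Box(low=0.0, high=5.0, shape=(1,), dtype=np.float32),   # by how much
))

print(action_space.sample())  # e.g. (2, array([1.37], dtype=float32))
```

Many off-the-shelf agents do not support Tuple spaces directly, so the continuous part is often discretised into a few fixed step sizes instead.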
4
votes
1 answer
What is a minimal setup to solve the CartPole-v0 with DQN?
I solved CartPole-v0 with a CEM agent pretty easily (experiments and code), but I am struggling to find a setup that works with DQN.
Do you know which parameters should be adjusted so that the mean reward is about 200 for this problem?
What I…
Martin Thoma
- 18,630
- 31
- 92
- 167
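Since the question is about a minimal working setup, here is a sketch along the lines of the standard keras-rl CartPole example; the hyperparameters are assumptions, not a verified recipe for a mean reward of 200:

```python
import gym
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

env = gym.make('CartPole-v0')
nb_actions = env.action_space.n

# Small two-layer network over the 4-dimensional observation.
model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(24, activation='relu'),
    Dense(24, activation='relu'),
    Dense(nb_actions, activation='linear'),
])

agent = DQNAgent(model=model, nb_actions=nb_actions,
                 memory=SequentialMemory(limit=50000, window_length=1),
                 nb_steps_warmup=100, target_model_update=1e-2,
                 policy=BoltzmannQPolicy())
agent.compile(Adam(lr=1e-3), metrics=['mae'])
agent.fit(env, nb_steps=50000, verbose=1)
```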
3
votes
2 answers
How exactly does DQN learn?
I created my custom environment in gym, which is a maze. I use a DQN model with BoltzmannQPolicy.
It trains fine with the following variables:
position of the agent
distance from the endpoint
position of the endpoint
which directions it can move…
Marci
- 31
- 2
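At its core, a DQN learns by regressing the network's prediction for the action that was taken toward a temporal-difference target. A toy sketch with made-up numbers:

```python
import numpy as np

# Toy numbers: for a stored transition (s, a, r, s'), the network's output for
# action a is pushed toward the TD target r + gamma * max_a' Q(s', a').
gamma = 0.99
reward = 1.0
q_next = np.array([0.2, 0.7, 0.1, 0.4])   # Q-values predicted for the next state
td_target = reward + gamma * np.max(q_next)
print(td_target)  # 1.693
```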
3
votes
1 answer
Valid actions in OpenAI Gym
Why don't the gym environments come with "valid actions"? The normal gym environment accepts any action as input, even if it isn't possible.
Is this a normal thing in reinforcement learning? Do the models really have to learn what valid…
Muppet
- 777
- 1
- 7
- 13
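One common workaround is to mask invalid actions outside the environment before acting greedily. A sketch; the valid_actions list is assumed to come from the application, since Gym itself has no such convention:

```python
import numpy as np

def masked_greedy_action(q_values, valid_actions):
    """Pick the highest-valued action among those currently valid."""
    masked = np.full_like(q_values, -np.inf)
    masked[valid_actions] = q_values[valid_actions]
    return int(np.argmax(masked))

q = np.array([0.2, 1.5, -0.3, 0.9])
print(masked_greedy_action(q, valid_actions=[0, 2, 3]))  # 3, because action 1 is masked out
```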
3
votes
0 answers
Learn large, variable-size action space for Diplomacy game
I am making an environment using OpenAI gym for Diplomacy, and making an AI for it.
In Diplomacy, a player has many units, and each unit has a number of moves available to it.
Therefore, the player's action space is the product of each unit's moves,…
Daniel Paczuski Bak
- 153
- 1
- 5
3
votes
0 answers
Reinforcement Learning using PPO2 in OpenAI Gym Retro, Mario not learning to clear the easy episode
I have been training a Mario game in Retro using the PPO2 baselines for some time. I have tried level 3 and level 1 too. But even after full training, when I play using the saved checkpoints, Mario is not able to finish the level. Mostly it falls into a hole or…
Sandeep Bhutani
- 884
- 1
- 7
- 22
3
votes
1 answer
What is wrong with this reinforcement learning environment?
I'm working on the following reinforcement learning problem:
I have a bottle of fixed capacity (say 5 liters). At the bottom of the bottle there is a tap to remove
water. The distribution of water removal is not fixed; we can remove any amount of water from…
Krishna Nevase
- 65
- 6
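For reference, the described problem could be wrapped as a custom gym.Env roughly as follows; the dynamics, action meaning, and reward here are assumptions for illustration, not the asker's actual code:

```python
import gym
import numpy as np
from gym import spaces

class BottleEnv(gym.Env):
    """Rough sketch: keep a 5-liter bottle full while a tap drains a random amount each step."""

    def __init__(self, capacity=5.0):
        super().__init__()
        self.capacity = capacity
        # Action: how much water to add this step.
        self.action_space = spaces.Box(low=0.0, high=capacity, shape=(1,), dtype=np.float32)
        # Observation: the current water level.
        self.observation_space = spaces.Box(low=0.0, high=capacity, shape=(1,), dtype=np.float32)
        self.level = 0.0

    def reset(self):
        self.level = 0.0
        return np.array([self.level], dtype=np.float32)

    def step(self, action):
        removed = np.random.uniform(0.0, self.level)    # unknown amount drained by the tap
        self.level = float(np.clip(self.level - removed + float(action[0]), 0.0, self.capacity))
        reward = -abs(self.capacity - self.level)        # example reward: stay near full
        return np.array([self.level], dtype=np.float32), reward, False, {}
```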
2
votes
1 answer
OpenAI Gym equivalent for supervised and/or unsupervised learning
OpenAI Gym has really standardized the way reinforcement learning is performed. It makes it possible for data scientists to separate model development from environment setup/building and to focus on what they really should be focusing on.
Quoting from…
Ali Hassaine
- 129
- 3
2
votes
1 answer
What does anneal mean in the context of machine learning?
An article released by OpenAI gives an overview of how OpenAI Five works. There is a paragraph in the article stating:
Our agent is trained to maximize the exponentially decayed sum of future rewards, weighted by an exponential decay factor…
Reuben Walker
- 21
- 2
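In this context, "annealing" just means gradually changing a value over the course of training, e.g. an exploration rate or a discount factor. A toy linear schedule for illustration (names and numbers are made up):

```python
def linear_anneal(step, total_steps, start=1.0, end=0.1):
    """Linearly interpolate a value from `start` to `end` over `total_steps` steps."""
    fraction = min(step / total_steps, 1.0)
    return start + fraction * (end - start)

for step in range(0, 10001, 2500):
    print(step, round(linear_anneal(step, 10000), 3))  # 1.0, 0.775, 0.55, 0.325, 0.1
```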
2
votes
1 answer
How is the target_f updated in the Keras solution to the Deep Q-learning Cartpole/Gym algorithm?
There's a popular solution to the CartPole game using Keras and Deep Q-Learning:
https://keon.github.io/deep-q-learning/
But there's a line of code that's confusing; the same question has been asked under the article and many people are confused…
David Goudet
- 23
- 2
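For context, the confusing part of that post boils down to the following pattern (paraphrased from memory, so treat the details as approximate): the network predicts Q-values for all actions, and only the entry for the action actually taken is overwritten with the TD target, so every other action contributes zero error to the fit.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

state_size, action_size, gamma = 4, 2, 0.95
model = Sequential([Dense(24, input_dim=state_size, activation='relu'),
                    Dense(24, activation='relu'),
                    Dense(action_size, activation='linear')])
model.compile(loss='mse', optimizer='adam')

# A dummy transition (state, action, reward, next_state, done).
state = np.random.rand(1, state_size)
next_state = np.random.rand(1, state_size)
action, reward, done = 1, 1.0, False

target = reward
if not done:
    target = reward + gamma * np.amax(model.predict(next_state)[0])
target_f = model.predict(state)   # current Q-value estimates for this state
target_f[0][action] = target      # overwrite only the taken action's value
model.fit(state, target_f, epochs=1, verbose=0)
```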
2
votes
2 answers
OpenAI Gym - what is an agent I can use with a multi-discrete action space?
I have a custom environment with a multi-discrete action space.
The action and observation spaces are as follows:
Action:
MultiDiscrete([ 3 121 121 121 3 121 121 121 3 121 121 121 3 121 121 121 3 121
121 121 3 121 121 121 3 121 121 121…
Daniel Paczuski Bak
- 153
- 1
- 5
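As background, a MultiDiscrete space is just a vector of independent Discrete choices, which is why many value-based agents don't handle it while policy-gradient implementations (e.g. PPO/A2C in Stable Baselines, to my knowledge) do. A minimal sketch with the same 3/121 pattern as the question:

```python
from gym import spaces

# Sketch: four groups, each choosing one of 3 orders plus three targets with 121 options each.
action_space = spaces.MultiDiscrete([3, 121, 121, 121] * 4)
print(action_space.sample())  # a length-16 vector, one integer per sub-choice
```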
2
votes
1 answer
PPO, A2C for continuous action spaces, math and code
Edit:
Question has been edited to better reflect what I learned after asking the original question.
I implemented the clipped objective PPO-clip as explained here: https://spinningup.openai.com/en/latest/algorithms/ppo.html
Basically I used a dummy…
mLstudent33
- 574
- 1
- 4
- 17
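For readers following the same Spinning Up page, the clipped surrogate objective it describes can be written compactly as below; this is a generic sketch, not the asker's implementation, and the tensor names are placeholders:

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective (to be minimised)."""
    ratio = torch.exp(log_probs_new - log_probs_old)             # pi_new(a|s) / pi_old(a|s)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```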
2
votes
1 answer
How to Form the Training Examples for Deep Q Network in Reinforcement Learning?
I am trying to pick up the basics of reinforcement learning by self-study from some blogs and texts. Forgive me if the question is too basic and the different bits that I understand are a bit messy, but even after consulting a few references, I cannot really get…
Della
- 315
- 1
- 3
- 9
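The usual answer to this kind of question is that training examples are simply minibatches of stored transitions sampled from a replay buffer. A minimal sketch (names are illustrative):

```python
import random
from collections import deque
import numpy as np

buffer = deque(maxlen=10000)   # each entry is one (state, action, reward, next_state, done) transition

def remember(state, action, reward, next_state, done):
    buffer.append((state, action, reward, next_state, done))

def sample_batch(batch_size=32):
    """Draw a random minibatch of transitions to train the Q-network on."""
    batch = random.sample(buffer, batch_size)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    return states, actions, rewards, next_states, dones
```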
1
vote
1 answer
Effects of slipperiness in OpenAI FrozenLake Environment
I am trying to wrap my head around the effects of is_slippery in the OpenAI FrozenLake-v0 environment.
From my results, when is_slippery=True (which is the default value), the environment is much more difficult to solve compared to when…
yudhiesh
- 213
- 1
- 9
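For reference, is_slippery controls whether FrozenLake's transitions are stochastic: with is_slippery=True the agent moves in the intended direction only about a third of the time and slides to a perpendicular direction otherwise, which is why the default environment is much harder. A quick sketch of toggling it:

```python
import gym

# Deterministic variant: every step moves exactly where the action points.
env = gym.make('FrozenLake-v0', is_slippery=False)
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```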