
I am trying to use a reinforcement learning solution in an OpenAI Gym environment that has 6 discrete actions with continuous values, e.g. increase parameter 1 by 2.2, decrease parameter 1 by 1.6, decrease parameter 3 by 1, etc.

I have seen in this code that such an action space was implemented as a single continuous space, where the first value is rounded to a discrete action index (e.g. 0 if it is < 1, 1 if it is between 1 and 2, and so on).

Does anybody know if the above solution is the correct way to implement such an action space? Or does Gym offer another way?
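
For reference, a rough sketch of the approach described above might look like this (the bounds, thresholds, and number of parameters here are only illustrative assumptions, not taken from the code I saw):

```
import numpy as np
from gym.spaces import Box

# Single Box action: the first entry encodes which of the 6 discrete actions to
# take (by flooring it into an integer bucket), the second entry is the amount.
action_space = Box(low=np.array([0.0, -10.0]), high=np.array([6.0, 10.0]), dtype=np.float32)

def decode(action):
    action_id = int(np.clip(np.floor(action[0]), 0, 5))  # bucket the first value into 0..5
    amount = float(action[1])
    return action_id, amount
```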

Cristian M
  • You could use a `spaces.Tuple` to do something similar. One space to control the discrete part and one space to control the continuous part. I don't think there is a "right" or "wrong" answer, as it depends on what exactly your environment is doing – chappers Apr 18 '20 at 22:26
  • @chappers According to the description I provided, it seems that using a Tuple to create an action space composed of both discrete and continuous parts is more appropriate. However, as I understand from you, the approach I described is also useful. Can you provide an example of how to create a Tuple space as you described? – Cristian M Apr 20 '20 at 09:14

1 Answer


Here is a sample environment that demonstrates this. It relies on the environment to pick out the continuous control value that corresponds to the chosen discrete action.

```
import gym
from gym.spaces import Dict, Discrete, Box, Tuple
import numpy as np


class SampleGym(gym.Env):
    def __init__(self, config={}):
        self.config = config
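        # Action = (discrete choice, continuous values); the discrete index
        # selects which entry of the Box is actually used in step().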
        self.action_space = Tuple((Discrete(2), Box(-10, 10, (2,))))
        self.observation_space = Box(-10, 10, (2, 2))
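        # Probability that the episode terminates at each step.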
        self.p_done = config.get("p_done", 0.1)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
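        # The first element picks the discrete action; the matching entry of
        # the continuous Box is used as its control value.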
        chosen_action = action[0]
        cnt_control = action[1][chosen_action]

        if chosen_action == 0:
            reward = cnt_control
        else:
            reward = -cnt_control - 1

        print(f"Action, {chosen_action} continuous ctrl {cnt_control}")
        return (
            self.observation_space.sample(),
            reward,
            bool(np.random.choice([True, False], p=[self.p_done, 1.0 - self.p_done])),
            {},
        )


if __name__ == "__main__":
    env = SampleGym()
    env.reset()
    env.step((1, [-1, 2.1]))  # should say use action 1 with 2.1
    env.step((0, [-1.1, 2.1]))  # should say use action 0 with -1.1
```
chappers
  • Do you know which ready-to-use agents support a tuple action space? I tried with stable baselines 3 but none of their models seem to support it – Simon Sep 23 '22 at 15:32
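
A common workaround when an agent library only accepts a single Box action space is to wrap the environment and decode the flat vector back into the tuple. Below is a minimal sketch (assuming the `SampleGym` above; this wrapper is not something stable baselines 3 ships):

```
import gym
import numpy as np
from gym.spaces import Box


class FlattenTupleAction(gym.ActionWrapper):
    # Presents a flat Box to the agent and decodes it back into the
    # (Discrete, Box) tuple that the wrapped environment expects.
    def __init__(self, env):
        super().__init__(env)
        # First entry selects the discrete action (floored to 0 or 1),
        # the remaining entries are the continuous Box part.
        self.action_space = Box(
            low=np.array([0.0, -10.0, -10.0]),
            high=np.array([2.0, 10.0, 10.0]),
            dtype=np.float32,
        )

    def action(self, action):
        discrete = int(np.clip(np.floor(action[0]), 0, 1))
        return (discrete, np.asarray(action[1:], dtype=np.float32))


# env = FlattenTupleAction(SampleGym())
```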