openai-gym - Is there a way to implement an OpenAI environment where the action space changes at each step?

Question

Is there a way to implement an OpenAI environment where the action space changes at each step?

score 7 · Accepted Answer

Yes (although some prebuilt agents may not work in this case).

@property
def action_space(self):
    # Do some code here to calculate the available actions
    return Something

The @property decorator lets you keep to the standard format for gym environments, where action_space is a property, env.action_space, rather than a method, env.action_space().
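As a rough illustration of the property approach, here is a minimal sketch. To keep it runnable without gym installed, it uses plain Python classes: `ActionSet` is a hypothetical stand-in for a gym space, and `CorridorEnv` is a made-up environment that would subclass gym.Env in real code. Neither name comes from the answer or from gym.

```python
import random


class ActionSet:
    """Hypothetical stand-in for a gym space: a finite set of valid integer actions."""

    def __init__(self, actions):
        self.actions = sorted(actions)

    def sample(self):
        return random.choice(self.actions)

    def contains(self, x):
        return x in self.actions


class CorridorEnv:
    """Hypothetical 1-D corridor; a real environment would subclass gym.Env."""

    LEFT, RIGHT = 0, 1

    def __init__(self, length=4):
        self.length = length
        self.position = 0

    @property
    def action_space(self):
        # Recomputed on every access, so it always reflects the current state.
        actions = []
        if self.position > 0:
            actions.append(self.LEFT)
        if self.position < self.length - 1:
            actions.append(self.RIGHT)
        return ActionSet(actions)

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        if not self.action_space.contains(action):
            raise ValueError("action %r is not available in this state" % (action,))
        self.position += 1 if action == self.RIGHT else -1
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}
```

Callers still write env.action_space.sample() as they would with a fixed space; the difference is that the returned object is rebuilt from the current state on each access.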

score 0 · Accepted Answer

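You can implement your own descendant class of Space and override the shape(), sample(), and contains() methods so that they return values consistent with the updated set of available actions. Your environment then returns an instance of your custom class as its action_space, and you can update that instance from within the environment at each step.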

This can be done through additional methods that you provide, such as disable_actions() and enable_actions(), as follows:

import gym
import numpy as np

# You could also inherit from Discrete or Box here and just override the shape(), sample() and contains() methods
class Dynamic(gym.Space):
    """
    x where x in available actions {0,1,3,5,...,n-1}
    Example usage:
    self.action_space = Dynamic(max_space=2)
    """

    def __init__(self, max_space):
        self.n = max_space

        # Initially all actions are available (a list, so it can be modified later)
        self.available_actions = list(range(0, max_space))

    def disable_actions(self, actions):
        """You would call this method inside your environment to remove available actions"""
        self.available_actions = [action for action in self.available_actions if action not in actions]
        return self.available_actions

    def enable_actions(self, actions):
        """You would call this method inside your environment to enable actions"""
        # list.append() returns None, so add the actions one by one and skip duplicates
        for action in actions:
            if action not in self.available_actions:
                self.available_actions.append(action)
        return self.available_actions

    def sample(self):
        return np.random.choice(self.available_actions)

    def contains(self, x):
        return x in self.available_actions

    @property
    def shape(self):
        """Return the new shape here"""
        return ()

    def __repr__(self):
        return "Dynamic(%d)" % self.n

    def __eq__(self, other):
        return self.n == other.n
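To show how an environment might drive such a space, here is a small sketch. It assumes a gym-free stand-in, `DynamicActions`, that mirrors the Dynamic interface above so the example runs on its own. `PickOnceEnv` is a hypothetical task invented for illustration, in which each item can be picked only once per episode.

```python
import random


class DynamicActions:
    """Gym-free stand-in (hypothetical) mirroring the Dynamic space's interface."""

    def __init__(self, max_space):
        self.n = max_space
        self.available_actions = list(range(max_space))

    def disable_actions(self, actions):
        self.available_actions = [a for a in self.available_actions if a not in actions]
        return self.available_actions

    def enable_actions(self, actions):
        self.available_actions = sorted(set(self.available_actions) | set(actions))
        return self.available_actions

    def sample(self):
        return random.choice(self.available_actions)

    def contains(self, x):
        return x in self.available_actions


class PickOnceEnv:
    """Hypothetical task: each of n items may be picked once per episode."""

    def __init__(self, n_items=3):
        self.action_space = DynamicActions(n_items)

    def reset(self):
        # Re-enable every action at the start of an episode.
        self.action_space.enable_actions(range(self.action_space.n))
        return tuple(self.action_space.available_actions)

    def step(self, action):
        if not self.action_space.contains(action):
            raise ValueError("action %r is not available" % (action,))
        # The picked item is no longer a legal action.
        remaining = self.action_space.disable_actions([action])
        done = not remaining
        return tuple(remaining), 1.0, done, {}


env = PickOnceEnv(n_items=3)
obs = env.reset()
done = False
picked = []
while not done:
    action = env.action_space.sample()  # only ever returns a legal action
    obs, reward, done, info = env.step(action)
    picked.append(action)
```

Because the environment keeps a single space object and mutates it, an agent that holds a reference to env.action_space sees the updated set of actions without asking for the space again.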

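You can also restrict the actions in the agent itself, so that it only considers valid actions, but this would prevent you from using existing general-purpose agents.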
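One possible way to restrict actions on the agent side is to mask invalid actions during selection. The sketch below assumes a tabular agent with one Q-value per action and an epsilon-greedy policy. The function name `select_action` and its parameters are illustrative assumptions, not part of gym or of the answer.

```python
import random


def select_action(q_values, valid_actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to the currently valid actions.

    q_values: sequence indexed by action id (assumed layout).
    valid_actions: iterable of action ids allowed in the current state.
    """
    valid_actions = list(valid_actions)
    if not valid_actions:
        raise ValueError("no valid actions in this state")
    if rng.random() < epsilon:
        # Explore, but only among valid actions.
        return rng.choice(valid_actions)
    # Exploit: best Q-value among valid actions only.
    return max(valid_actions, key=lambda a: q_values[a])
```

For example, select_action([1.0, 5.0, 3.0], valid_actions=[0, 2], epsilon=0.0) never picks action 1, even though it has the highest Q-value, because it is not in the valid set.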

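I found that this link explains it well (it is too long to quote here): "How do I let the AI know that only some actions are available in a particular state in reinforcement learning?"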
