Is there a way to implement an OpenAI Gym environment whose action space changes at every step?
2 Answers
Yes (although some pre-built agents may not work in this setting).
```python
@property
def action_space(self):
    # Compute the actions available in the current state here
    return Something  # placeholder: return a Space describing those actions
```
The `@property` decorator lets you keep the standard gym environment format, in which `action_space` is an attribute, `env.action_space`, rather than a method, `env.action_space()`.
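As a minimal sketch of the property approach, here is a toy environment in which the set of legal moves depends on the current state. To keep it runnable without gym installed, `ActionSet` is a hypothetical stand-in for a gym `Space` (only `sample()` and `contains()`), `CountdownEnv` is an invented example, and `step()` returns the classic `(obs, reward, done, info)` tuple. With gym installed, the property would return a real `Space` instead.

```python
import random


class ActionSet:
    """Hypothetical stand-in for a gym Space: the currently valid discrete actions."""

    def __init__(self, actions):
        self.actions = sorted(actions)

    def sample(self):
        # Uniform choice among the valid actions only
        return random.choice(self.actions)

    def contains(self, x):
        return x in self.actions


class CountdownEnv:
    """Hypothetical toy environment: remove 1 to 3 tokens per step,
    never more than remain; the episode ends at zero tokens."""

    def __init__(self, tokens=5):
        self.tokens = tokens

    @property
    def action_space(self):
        # Recomputed on every access, so callers always see the current legal moves
        return ActionSet(range(1, min(3, self.tokens) + 1))

    def step(self, action):
        if not self.action_space.contains(action):
            raise ValueError("illegal action %r" % action)
        self.tokens -= action
        done = self.tokens == 0
        return self.tokens, (1.0 if done else 0.0), done, {}
```

Because the property is re-evaluated on every access, `CountdownEnv(5).action_space.actions` is `[1, 2, 3]`, and after `env.step(3)` leaves two tokens it becomes `[1, 2]`.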
answered 2017-08-15T03:00:16.093
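You can implement your own subclass of `Space` and override its `shape` property and its `sample()` and `contains()` methods so that they return values consistent with the currently available actions. Your environment then exposes an instance of this custom class as `action_space`, and it can update that instance at every step.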
The updates can go through extra methods you provide on the space, such as `disable_actions()` and `enable_actions()`, as shown below:
```python
import gym
import numpy as np


# You could also inherit from Discrete or Box here and just override
# the shape property and the sample() and contains() methods.
class Dynamic(gym.Space):
    """
    x where x in available actions {0,1,3,5,...,n-1}
    Example usage:
    self.action_space = Dynamic(max_space=2)
    """

    def __init__(self, max_space):
        self.n = max_space
        # Initially all actions are available; a list (not a range) so it can change
        self.available_actions = list(range(0, max_space))

    def disable_actions(self, actions):
        """You would call this method inside your environment to remove available actions."""
        self.available_actions = [action for action in self.available_actions
                                  if action not in actions]
        return self.available_actions

    def enable_actions(self, actions):
        """You would call this method inside your environment to enable actions."""
        # list.append() returns None, so extend the list in place instead of
        # assigning its result; skip duplicates and out-of-range actions
        for action in actions:
            if 0 <= action < self.n and action not in self.available_actions:
                self.available_actions.append(action)
        self.available_actions.sort()
        return self.available_actions

    def sample(self):
        # Draw only from the currently available actions
        return np.random.choice(self.available_actions)

    def contains(self, x):
        return x in self.available_actions

    @property
    def shape(self):
        """Return the new shape here (a single discrete action is a scalar)."""
        return ()

    def __repr__(self):
        return "Dynamic(%d)" % self.n

    def __eq__(self, other):
        return isinstance(other, Dynamic) and self.n == other.n
```
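A space like this only helps if the environment actually updates it. The following gym-free sketch shows one way an environment might do that inside `step()`. `MaskedDiscrete` and `PickEachOnceEnv` are hypothetical names, and as a design variation the availability is stored as a boolean mask rather than a list, which makes `contains()` a constant-time lookup and is easy to hand to an agent that masks its choices.

```python
import random


class MaskedDiscrete:
    """Hypothetical gym-free variant of the Dynamic space: n discrete
    actions, each switched on or off by a boolean mask."""

    def __init__(self, n):
        self.n = n
        self.mask = [True] * n  # initially every action is available

    def disable_actions(self, actions):
        for a in actions:
            self.mask[a] = False

    def enable_actions(self, actions):
        for a in actions:
            self.mask[a] = True

    @property
    def available_actions(self):
        return [a for a in range(self.n) if self.mask[a]]

    def sample(self):
        return random.choice(self.available_actions)

    def contains(self, x):
        return 0 <= x < self.n and self.mask[x]


class PickEachOnceEnv:
    """Hypothetical toy environment: each of n items may be picked once;
    the episode ends when every item has been picked."""

    def __init__(self, n=4):
        self.action_space = MaskedDiscrete(n)

    def step(self, action):
        if not self.action_space.contains(action):
            raise ValueError("action %r is not available" % action)
        # Mutate the same space object in place, once per step
        self.action_space.disable_actions([action])
        done = not self.action_space.available_actions
        return self.action_space.available_actions, 1.0, done, {}
```

Here the environment keeps a single `action_space` object and edits it, instead of building a new one on every access as the `@property` approach does.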
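You can also restrict the actions inside the agent so that it only considers valid ones, but that rules out using existing general-purpose agents as-is.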
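A minimal sketch of agent-side restriction, assuming the agent keeps one Q-value per action and receives a boolean validity mask from the environment (both assumptions, as is the hypothetical function name `masked_epsilon_greedy`): invalid actions are excluded from random exploration and set to negative infinity before the argmax, so they can never be chosen.

```python
import numpy as np


def masked_epsilon_greedy(q_values, valid_mask, epsilon, rng):
    """Epsilon-greedy action selection restricted to actions where valid_mask is True."""
    q_values = np.asarray(q_values, dtype=float)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    valid = np.flatnonzero(valid_mask)
    if valid.size == 0:
        raise ValueError("no valid actions")
    if rng.random() < epsilon:
        # Explore: uniform over the valid actions only
        return int(rng.choice(valid))
    # Exploit: invalid actions get -inf so argmax cannot select them
    return int(np.argmax(np.where(valid_mask, q_values, -np.inf)))
```

For example, with `rng = np.random.default_rng(0)`, `masked_epsilon_greedy([5.0, 1.0, 3.0], [False, True, True], 0.0, rng)` returns `2`: action 0 has the highest value but is masked out.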
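I found this link, which explains it well (too long to quote here): How do I let an AI know that only some actions are available in specific states in reinforcement learning?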
answered 2017-08-27T16:33:30.903