
In some of the OpenAI Gym environments there is a "ram" version, for example: Breakout-v0 and Breakout-ram-v0.

Using Breakout-ram-v0, each observation is an array of length 128.

Question: how can I transform an observation from Breakout-v0 (which is a 160 x 210 image) into the observation form of Breakout-ram-v0 (an array of length 128)?

My idea is to train a model on the Breakout-ram-v0 environment and show the trained model playing using the Breakout-v0 environment.
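A quick sketch (assuming the classic gym Atari builds) that prints the two observation shapes side by side:

import gym

# Pixel version: observations are RGB screen frames.
env_img = gym.make('Breakout-v0')
print(env_img.reset().shape)   # (210, 160, 3): a 160 x 210 RGB image

# RAM version: observations are the console's 128 bytes of RAM.
env_ram = gym.make('Breakout-ram-v0')
print(env_ram.reset().shape)   # (128,) uint8 bytes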


4 Answers


There are a couple of ways of understanding the ram option.

Let's say you wanted to learn Pong. If you train from the pixels, you'll likely use a convolutional net of several layers. Interestingly, the final output of the convnet is a 1D array of features. You pass these to a fully connected layer, which might output the correct 'action' based on the features the convnet recognized in the image(s). Or you might use a reinforcement layer working on the 1D array of features.
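To make that pipeline concrete, here is a minimal PyTorch sketch (not part of the original answer; the 84x84 input and the layer sizes are arbitrary assumptions) of a convnet whose final 1D feature array feeds a fully connected action head:

import torch
import torch.nn as nn

class PongNet(nn.Module):
    def __init__(self, n_actions=6):
        super().__init__()
        # Convolutional feature extractor over one 84x84 grayscale frame.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),  # final output: a 1D array of features
        )
        # Fully connected layer maps the features to action scores.
        self.head = nn.Linear(32 * 9 * 9, n_actions)

    def forward(self, x):
        return self.head(self.conv(x))

net = PongNet()
frame = torch.zeros(1, 1, 84, 84)  # dummy batch of one frame
print(net(frame).shape)            # torch.Size([1, 6])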

Now let's say it occurs to you that Pong is very simple and could probably be represented in a 16x16 image instead of 160x160. Straight downsampling doesn't give you enough detail, so you use OpenCV to extract the positions of the ball and paddles and create your mini version of 16x16 Pong, with nice, crisp pixels. The computation needed to represent the essence of the game is far less than with your deep net, and your new convnet is nice and small. Then you realize you don't even need the convnet any more: you can just attach a fully connected layer to each of your 16x16 pixels.
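A rough sketch of that manual extraction (assuming OpenCV 4's findContours signature and a hypothetical brightness threshold; Pong's ball and paddles are bright blobs on a dark background):

import cv2
import numpy as np

def mini_pong(frame, size=16):
    # Collapse an RGB Pong frame into a crisp 16x16 state image.
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    # Keep only the bright blobs (ball and paddles), drop the background.
    _, mask = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    mini = np.zeros((size, size), dtype=np.uint8)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        # Mark each blob's centre in the downscaled grid.
        cx = (x + w // 2) * size // frame.shape[1]
        cy = (y + h // 2) * size // frame.shape[0]
        mini[cy, cx] = 255
    return mini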

So, think of what you have. Now you have two different ways of getting a simple representation of the game on which to train your fully connected layer (or RL algorithm):

  1. Your deep convnet goes through several layers and outputs a 1D array, say of 256 features in the final layer. You pass that to the fully connected layer.
  2. Your manual feature extraction pulls out the blobs (paddles/ball) with OpenCV to make a 16x16 Pong. Passing that to your fully connected layer is really just passing a set of 16x16 = 256 'extracted features'.

So the pattern is that you find a simple way to 'represent' the state of the game, then pass that to your fully connected layers.

Enter option 3. The RAM of the game is just a 128-byte array (hence the length-128 observations). But you know this contains the 'state' of the game, so it's like your 16x16 version of Pong. It's most likely a 'better' representation than your 16x16 one, because it probably holds information about the direction of the ball, etc.

So now you have three different ways to simplify the state of the game in order to train your fully connected layer, or your reinforcement algorithm.
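Option 3 in code: a hedged sketch (sizes arbitrary) in which the 128 RAM bytes go straight into a small fully connected network, with no convnet at all:

import torch
import torch.nn as nn

# The RAM observation is already a flat state vector, so a small MLP suffices.
ram_policy = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),  # 128 RAM bytes in
    nn.Linear(64, 4),               # action scores out (Breakout has 4 actions)
)

ram_obs = torch.rand(1, 128)        # dummy normalized RAM observation
print(ram_policy(ram_obs).shape)    # torch.Size([1, 4])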

So, what OpenAI has done by giving you the RAM is spare you the task of learning a 'representation' of the game, and that lets you move directly to learning a 'policy', i.e. what to do based on the state of the game.

OpenAI may provide a way to 'see' the visual output on the ram version. If they don't, you could ask them to make that available. But that's the best you will get. They are not going to reverse engineer the code to 'render' the RAM, nor are they going to reverse engineer the code to 'generate' RAM based on pixels; the latter is not actually possible, since pixels are only part of the state of the game.

They simply provide the ram where it's easily available to them, so that you can try algorithms that learn what to do, given something that supplies a good state representation.

There is no (easy) way to do what you asked, i.e. translate pixels to RAM, but most likely there is a way to ask the Atari system to give you both the ram and the pixels, so you can work on ram but show pixels.
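Indeed, the answers below use exactly such hooks. A minimal sketch, assuming the old gym Atari build in which the unwrapped env exposes an ale handle and a _get_image() method:

import gym

env = gym.make('Breakout-ram-v0')
env.reset()

ram = env.unwrapped.ale.getRAM()     # the 128 ram bytes you train on
pixels = env.unwrapped._get_image()  # the rendered screen you can show
print(ram.shape, pixels.shape)       # (128,) (210, 160, 3)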

Answered 2017-08-11T17:07:57.593

"My idea is to train a model on Breakout-ram-v0 and display the trained model playing using the Breakout-v0 environment."

Similar to erosten's answer: if your environment is

import gym

env = gym.make('Breakout-ram-v0')
env.reset()

and you want the pixels, you're looking for

pixels = env.unwrapped._get_image()
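For example, to display that frame (assuming matplotlib is installed):

import matplotlib.pyplot as plt

plt.imshow(pixels)  # pixels from env.unwrapped._get_image() above
plt.show()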
Answered 2020-01-17T18:43:16.267

While the answers above are correct in terms of reinforcement learning policy, and there is no direct way to convert ram to an image or vice versa, you can get the ram state from an image-based environment using:

import gym

# this is an image-based environment
env = gym.make('Breakout-v0')
env.reset()

# take the 0 (NOOP) action
observation_image, reward, done, info = env.step(0)

# get the ram observation with the code below
observation_ram = env.unwrapped._get_ram()
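The two observations can then be compared directly:

print(observation_image.shape)   # (210, 160, 3) pixels
print(observation_ram.shape)     # (128,) ram bytes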
Answered 2019-03-15T01:26:47.267

You can simply train using Atari's ram environment and call the wrappers object to save videos of the trained model automatically.

import gym
from gym import wrappers

env = gym.make('SpaceInvaders-ram-v0')
# Monitor records videos of the episodes to the given folder.
env = wrappers.Monitor(env, "/path/to/folder/", force=True)

class Policy:
    """Do your thing"""


train_function()  # call your train function
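If you don't have a training loop yet, a random-play stand-in for train_function() (hypothetical, three episodes) is already enough to make Monitor write video files:

# Hypothetical stand-in for train_function(): random play, recorded by Monitor.
for episode in range(3):
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
env.close()  # flush the recorded videos to /path/to/folder/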
Answered 2020-05-11T15:55:57.467