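I am implementing the following wrapper, which is commonly used with OpenAI's Gym for frame skipping. It can be found in dqn/atari_wrappers.py.
I am very confused by the following line: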
max_frame = np.max(np.stack(self._obs_buffer), axis=0)
I have added comments in the code for the parts I do understand, to help anyone who might be able to answer.
np.stack(self._obs_buffer) stacks the two states in _obs_buffer into a single array with a new leading axis.
np.max returns the maximum along axis 0.
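A minimal sketch of what I understand these two calls to do, using two made-up 2x2 frames (the names frame_a and frame_b and their pixel values are hypothetical, not real Atari observations):

```python
from collections import deque

import numpy as np

# Two hypothetical 2x2 grayscale frames (made-up values for illustration)
frame_a = np.array([[0, 255], [10, 0]], dtype=np.uint8)
frame_b = np.array([[200, 0], [10, 50]], dtype=np.uint8)

# Same kind of buffer as in the wrapper: it holds at most two observations
obs_buffer = deque(maxlen=2)
obs_buffer.append(frame_a)
obs_buffer.append(frame_b)

# Stack the frames along a new leading axis: shape (2, 2, 2)
stacked = np.stack(obs_buffer)

# Reduce over that leading axis: each output pixel is the larger of the
# two values at the same position, so the result has shape (2, 2)
max_frame = np.max(stacked, axis=0)

print(stacked.shape)
print(max_frame)
```

So the result has the shape of a single frame, and each pixel is the brighter of the two corresponding pixels.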
But I don't understand why we do this, or what it actually achieves.
from collections import deque

import gym
import numpy as np


class MaxAndSkipEnv(gym.Wrapper):
    """Return only every 4th frame"""

    def __init__(self, env=None, skip=4):
        super(MaxAndSkipEnv, self).__init__(env)
        # Initialise a double-ended queue that can store a maximum of two states
        self._obs_buffer = deque(maxlen=2)
        # Number of frames each action is repeated for (4 by default)
        self._skip = skip

    def _step(self, action):
        total_reward = 0.0
        done = None
        for _ in range(self._skip):
            # Take a step
            obs, reward, done, info = self.env.step(action)
            # Append the new state to the double-ended queue buffer
            self._obs_buffer.append(obs)
            # Add the reward obtained from this step to the running total
            total_reward += reward
            # If the game ends, break out of the for loop
            if done:
                break
        max_frame = np.max(np.stack(self._obs_buffer), axis=0)
        return max_frame, total_reward, done, info
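To check my reading of the loop, here is a small sketch that runs the same steps against a stub environment. CountingEnv is a hypothetical stand-in I made up for illustration (it is not part of Gym): its t-th frame is a 1x1 array containing t, and every step gives a reward of 1.0.

```python
from collections import deque

import numpy as np


class CountingEnv:
    """Hypothetical stub: the t-th frame is a 1x1 array holding the value t."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        obs = np.array([[self.t]])
        return obs, 1.0, False, {}


env = CountingEnv()
obs_buffer = deque(maxlen=2)
total_reward = 0.0

# Same loop body as in _step, with skip = 4
for _ in range(4):
    obs, reward, done, info = env.step(0)
    obs_buffer.append(obs)   # older frames fall out once two are stored
    total_reward += reward
    if done:
        break

# Which frames survived in the buffer, and what gets returned
kept = [int(frame[0, 0]) for frame in obs_buffer]
max_frame = np.max(np.stack(obs_buffer), axis=0)

print(kept, max_frame, total_reward)
```

With this stub, only frames 3 and 4 remain in the buffer after four steps, so the returned observation is the pixel-wise maximum of the last two frames, while the reward is summed over all four.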