random - How to produce reproducible randomness with OpenAI Gym and SCOOP?

Question

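How can I produce reproducible randomness with OpenAI Gym and SCOOP?

I want to get exactly the same results every time I repeat the example. If possible, I would like this to work with existing libraries that use random providers (e.g. random and np.random). That may be a problem, because they usually rely on the global random state and do not provide an interface for a local random state.

My example script looks like this: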

import random
import numpy as np
from scoop import futures
import gym


def do(it):
    random.seed(it)
    np.random.seed(it)
    env.seed(it)
    env.action_space.seed(it)
    env.reset()
    observations = []
    for i in range(3):
        while True:
            action = env.action_space.sample()
            ob, reward, done, _ = env.step(action)
            observations.append(ob)
            if done:
                break
    return observations


env = gym.make("BipedalWalker-v3")
if __name__ == "__main__":
    maxit = 20
    results1 = futures.map(do, range(2, maxit))
    results2 = futures.map(do, range(2, maxit))
    for a,b in zip(results1, results2):
        if np.array_equiv(a, b):
            print("equal, yay")
        else:
            print("not equal :(")

Expected output: equal, yay on every line

Actual output: not equal :( on several lines

Full output:

/home/chef/.venv/neuro/bin/python -m scoop /home/chef/dev/projekte/NeuroEvolution-CTRNN_new/random_test.py
[2020-05-18 18:05:03,578] launcher  INFO    SCOOP 0.7 1.1 on linux using Python 3.8.2 (default, Apr 27 2020, 15:53:34) [GCC 9.3.0], API: 1013
[2020-05-18 18:05:03,578] launcher  INFO    Deploying 4 worker(s) over 1 host(s).
[2020-05-18 18:05:03,578] launcher  INFO    Worker distribution: 
[2020-05-18 18:05:03,578] launcher  INFO       127.0.0.1:   3 + origin
/home/chef/.venv/neuro/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/home/chef/.venv/neuro/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/home/chef/.venv/neuro/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/home/chef/.venv/neuro/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
equal, yay
not equal :(
not equal :(
not equal :(
not equal :(
not equal :(
equal, yay
not equal :(
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
not equal :(
equal, yay
equal, yay
equal, yay
not equal :(
[2020-05-18 18:05:08,554] launcher  (127.0.0.1:37729) INFO    Root process is done.
[2020-05-18 18:05:08,554] launcher  (127.0.0.1:37729) INFO    Finished cleaning spawned subprocesses.

Process finished with exit code 0

When I run this example without SCOOP, I get almost perfect results:

/home/chef/.venv/neuro/bin/python /home/chef/dev/projekte/NeuroEvolution-CTRNN_new/random_test.py
/home/chef/.venv/neuro/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
/home/chef/.venv/neuro/lib/python3.8/site-packages/scoop/fallbacks.py:38: RuntimeWarning: SCOOP was not started properly.
Be sure to start your program with the '-m scoop' parameter. You can find further information in the documentation.
Your map call has been replaced by the builtin serial Python map().
  warnings.warn(
not equal :(
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay
equal, yay

Process finished with exit code 0

score 2 · Accepted Answer

I was able to "solve" it by moving the creation of the gym environment into the do function.

The complete corrected code looks like this:

import random
import numpy as np
from scoop import futures
import gym


def do(it):
    # Create the environment inside the function so every call starts from a fresh instance.
    env = gym.make("BipedalWalker-v3")
    random.seed(it)
    np.random.seed(it)
    env.seed(it)
    env.action_space.seed(it)
    env.reset()
    observations = []
    for i in range(3):
        while True:
            action = env.action_space.sample()
            ob, reward, done, _ = env.step(action)
            observations.append(ob)
            if done:
                break
    return observations


if __name__ == "__main__":
    maxit = 20
    results1 = futures.map(do, range(2, maxit))
    results2 = futures.map(do, range(2, maxit))
    for a,b in zip(results1, results2):
        if np.array_equiv(a, b):
            print("equal, yay")
        else:
            print("not equal :(")
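
The answer does not say why moving gym.make helps. One plausible reading, stated here as an assumption and not something confirmed in the thread, is that a long-lived environment keeps some internal state that seed() and reset() do not fully restore, so the result of a call depends on what the same worker process ran before. The sketch below illustrates that idea with the standard library only. ToyEnv, rollout_shared and rollout_fresh are hypothetical names invented for this illustration; ToyEnv is not the gym API and makes no claim about how BipedalWalker-v3 is implemented.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for an environment; NOT the gym API."""

    def __init__(self):
        self._rng = random.Random()  # local random state, re-seeded by seed()
        self._hidden = 0.0           # leftover state that seed() does not touch

    def seed(self, value):
        self._rng.seed(value)

    def step(self):
        # The observation depends on the seeded RNG and on the leftover state.
        self._hidden += self._rng.random()
        return self._hidden


# Created once per process, like the module-level env in the question.
shared_env = ToyEnv()


def rollout_shared(seed, steps=5):
    """Reuse the long-lived environment; earlier calls leak into this one."""
    shared_env.seed(seed)
    return [shared_env.step() for _ in range(steps)]


def rollout_fresh(seed, steps=5):
    """Build a new environment per call, like the corrected do()."""
    env = ToyEnv()
    env.seed(seed)
    return [env.step() for _ in range(steps)]


if __name__ == "__main__":
    print("shared env reproducible:", rollout_shared(2) == rollout_shared(2))
    print("fresh env reproducible: ", rollout_fresh(2) == rollout_fresh(2))
```

In this toy setup the shared environment gives different rollouts for the same seed, while the per-call environment gives identical ones. With several worker processes, which tasks land on which worker can differ between the two map calls, so any leftover state of this kind would vary from run to run. Creating the environment on every call costs extra setup time, which is the trade-off of this approach.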
