
I am running sagemaker_rl with the Ray RLlib library on SageMaker with an 8-core CPU, and I set num_workers to 7.

After running for a long time, the job fails with "The actor died unexpectedly before finishing this task".


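from ray.tune.registry import register_env
from sagemaker_rl.ray_launcher import SageMakerRayLauncher

# create_env is my own environment factory, defined elsewhere in the project.


class MyLauncher(SageMakerRayLauncher):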
    def register_env_creator(self):
        register_env(
            "RiveRL-v1",
            lambda env_config: create_env(env_config),
        )

    def get_experiment_config(self):
        return {
            "training": {
                "env": "RiveRL-v1",
                "run": "PPO",
                "config": {
                    "ignore_worker_failures": True,
                    "gamma": 0.6,
                    "num_sgd_iter": 5,
                    "lr": 0.0001,
                    "sgd_minibatch_size": 32768,
                    "train_batch_size": 100000,
                    "use_gae": False,
                    "num_workers": (self.num_cpus - 1),
                    "num_gpus": self.num_gpus,
                    "batch_mode": "complete_episodes",
                    "env_config": {
                        "window_size": 25,
                        "max_allowed_loss": 0.2
                    },
                    "observation_filter": "MeanStdFilter",
                    "entropy_coeff": 0.01,
                },
                "checkpoint_freq": 2,
            }
        }
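For context, here is a minimal sketch of the per-iteration load this config implies. It assumes the train batch is split roughly evenly across the rollout workers and that one CPU is left for the trainer process; `per_worker_load` is a hypothetical helper written for illustration, not part of RLlib or sagemaker_rl. With "batch_mode": "complete_episodes" the real batches can be larger than these targets, since episodes are never cut short.

```python
import math


def per_worker_load(num_cpus, train_batch_size, sgd_minibatch_size, num_sgd_iter):
    """Rough per-iteration numbers implied by the config (illustrative only)."""
    # Mirrors "num_workers": self.num_cpus - 1 from the config above.
    num_workers = num_cpus - 1
    # Assumption: samples are divided about evenly between rollout workers.
    samples_per_worker = math.ceil(train_batch_size / num_workers)
    # The last minibatch of each SGD pass may be smaller than the others.
    minibatches_per_pass = math.ceil(train_batch_size / sgd_minibatch_size)
    return {
        "num_workers": num_workers,
        "samples_per_worker": samples_per_worker,
        "sgd_updates_per_iteration": minibatches_per_pass * num_sgd_iter,
    }


load = per_worker_load(
    num_cpus=8,
    train_batch_size=100000,
    sgd_minibatch_size=32768,
    num_sgd_iter=5,
)
print(load)
```

So each of the 7 workers has to collect on the order of 14,000 samples per iteration, and every one of them holds its own copy of the environment while doing so.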

Failure # 1 (occurred at 2021-10-20_18-35-15)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/trial_runner.py", line 467, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/ray_trial_executor.py", line 431, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/usr/local/lib/python3.6/dist-packages/ray/worker.py", line 1517, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.

However, whenever I change num_workers to 1, the problem goes away. Any idea how to fix this?
