python - Transfer learning for multi-agent environments using RLlib

Translated from: https://stackoverflow.com/questions/71122986 2022-02-15T08:10:44.713


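I am running into a problem with RLlib. I trained a network and it achieved good results. When I restore the last checkpoint, everything works fine. However, if I initialize a new trainer with the same configuration as the trained one and set its weights equal to the trained trainer's weights, I do not get good results.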

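# Import path as used in Ray 1.x releases.
from ray.rllib.agents.ppo import PPOTrainer

# config_trained and config_checkpoint are defined elsewhere.
preTrained_trainer = PPOTrainer(config=config_trained, env=config_trained["env"])
# Restore all policies from the checkpoint.
preTrained_trainer.restore(config_checkpoint)
# Get the trained weights for all policies.
trained_weights = preTrained_trainer.get_weights()

# Build a second trainer with the same config.
new_trainer = PPOTrainer(config=config_trained, env=config_trained["env"])
# Copy the trained weights of every policy into the new trainer.
new_trainer.set_weights({
    pid: w for pid, w in trained_weights.items()
})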
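One way to narrow this down is to confirm that the weights in `new_trainer` really match the trained ones after `set_weights`. The sketch below is a hypothetical helper (`diff_policy_weights` is not part of RLlib). It assumes `get_weights()` returns a dict mapping each policy ID to a dict of parameter name to array, which may differ between frameworks and versions.

```python
def _to_nested_list(value):
    # NumPy arrays expose tolist(); plain Python lists pass through unchanged.
    return value.tolist() if hasattr(value, "tolist") else value


def diff_policy_weights(weights_a, weights_b):
    """Return (policy_id, param_name) pairs that are missing or differ.

    Assumes each argument has the shape {policy_id: {param_name: array}}.
    """
    mismatches = []
    for pid in sorted(set(weights_a) | set(weights_b)):
        params_a = weights_a.get(pid, {})
        params_b = weights_b.get(pid, {})
        for name in sorted(set(params_a) | set(params_b)):
            if name not in params_a or name not in params_b:
                # Parameter (or whole policy) exists on one side only.
                mismatches.append((pid, name))
            elif _to_nested_list(params_a[name]) != _to_nested_list(params_b[name]):
                # Same parameter, different values.
                mismatches.append((pid, name))
    return mismatches


# Intended use with the trainers from the question:
# diff_policy_weights(preTrained_trainer.get_weights(), new_trainer.get_weights())
```

An empty list would suggest the weights were transferred and that the difference lies elsewhere, for example in state that is not part of the weights.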

PS: I also tried copying the filters like this:

# Copy the filters; policy_frozen contains all the trained policies.
for policy_name in policy_frozen:
    new_trainer.workers.local_worker().filters[policy_name] = preTrained_trainer.workers.local_worker().filters[policy_name]

However, I still get bad results.
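Two things stand out in the loop above. The assignment makes both trainers share the same filter objects rather than giving `new_trainer` its own copies, and it only touches the filters of the local worker. Whether either point explains the bad results is not established here. A minimal sketch of an independent copy follows. `copy_filters` is a hypothetical helper, and it assumes the filter objects can be handled by `copy.deepcopy`.

```python
import copy


def copy_filters(src_filters, dst_filters, policy_ids):
    """Deep-copy the filters of the given policies from src into dst.

    Both arguments are assumed to be dicts keyed by policy ID, like
    worker.filters in the question. Deep copies keep later updates on
    one side from leaking into the other.
    """
    for pid in policy_ids:
        dst_filters[pid] = copy.deepcopy(src_filters[pid])
    return dst_filters


# Intended use with the names from the question:
# copy_filters(
#     preTrained_trainer.workers.local_worker().filters,
#     new_trainer.workers.local_worker().filters,
#     policy_frozen,
# )
```

If the trainer samples with remote rollout workers, their filters would need the same treatment, since this only updates the dict it is given.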

What am I missing? Besides the weights, is there anything else I need to set in order to get an identical trainer?
