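I'm using the Ray RLlib library to train a multi-agent Trainer on a 5-in-a-row game. It's a zero-sum environment, so I have a problem with degenerate agent behavior (the first agent always wins, in 5 moves). My idea is to change the agents' learning rates as follows: first train the first agent while the second one stays random with a learning rate of zero; once the first agent has learned to win more than 90% of games, switch roles; then repeat. But I can't change the learning rate after it has been set in the constructor. Is this possible?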
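The alternating scheme described above can be sketched as a small, framework-independent helper. This is only a sketch: `next_learning_rates`, `ACTIVE_LR` and `SWITCH_THRESHOLD` are hypothetical names, not RLlib API. The function decides which policy should learn (non-zero learning rate) and which should stay frozen (learning rate zero), given the current learner's recent win rate.

```python
from typing import Dict, Tuple

# Hypothetical constants: learning rate of the policy being trained, and the
# win rate above which the roles swap ("more than 90% of games").
ACTIVE_LR = 0.001
SWITCH_THRESHOLD = 0.9

POLICY_IDS = ("policy_0", "policy_1")


def next_learning_rates(active: str, win_rate: float) -> Tuple[str, Dict[str, float]]:
    """Return the policy that should learn next and a learning rate per policy.

    `active` is the policy currently being trained; `win_rate` is its recent
    fraction of games won (0.0 to 1.0).
    """
    if active not in POLICY_IDS:
        raise ValueError(f"unknown policy id: {active}")
    if win_rate > SWITCH_THRESHOLD:
        # Swap roles: the previously frozen policy becomes the learner.
        active = POLICY_IDS[1] if active == POLICY_IDS[0] else POLICY_IDS[0]
    # The learner gets ACTIVE_LR; its opponent is frozen with a learning rate of zero.
    lrs = {pid: (ACTIVE_LR if pid == active else 0.0) for pid in POLICY_IDS}
    return active, lrs
```

After each training iteration, the returned `active` id is fed back in together with that policy's new win rate, so the roles keep alternating each time the learner crosses the threshold.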
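```python
# Imports assumed for the older RLlib API used here ("custom_options", dict-style
# callbacks). GENV, clb_episode_end and the registered "GomokuModel" / "GomokuEnv"
# are defined elsewhere.
from ray.rllib.agents.a3c import A3CTrainer
from ray.rllib.models.tf.tf_action_dist import Categorical


def gen_policy(GENV, lr=0.001):
    # Per-policy config override: custom model plus this policy's learning rate.
    config = {
        "model": {
            "custom_model": 'GomokuModel',
            "custom_options": {"use_symmetry": True, "reg_loss": 0},
        },
        "custom_action_dist": Categorical,
        "lr": lr,
    }
    # (policy class, obs space, action space, config); None = trainer's default policy class.
    return (None, GENV.observation_space, GENV.action_space, config)


def map_fn(agent_id):
    # agent_0 is controlled by policy_0, every other agent by policy_1.
    if agent_id == 'agent_0':
        return "policy_0"
    else:
        return "policy_1"


trainer = A3CTrainer(env="GomokuEnv", config={
    "multiagent": {
        "policies": {
            "policy_0": gen_policy(GENV, lr=0.001),
            "policy_1": gen_policy(GENV, lr=0),  # opponent frozen via lr = 0
        },
        "policy_mapping_fn": map_fn,
    },
    "callbacks": {"on_episode_end": clb_episode_end},
})

while True:
    rest = trainer.train()
    # here I want to change learning rate of my policies based on environment statistics
```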
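To drive that switch from the statistics returned by `trainer.train()` (`rest` in the loop), one option is the per-policy mean episode reward, which multi-agent RLlib results report under `policy_reward_mean`, keyed by policy id (worth checking against the installed version). The sketch below assumes, hypothetically, that the environment pays +1 for a win and -1 for a loss and that draws never occur. Then the mean reward r and the win rate w satisfy r = w·(+1) + (1 - w)·(-1) = 2w - 1, so w = (r + 1) / 2. The helper name `win_rate_from_result` is hypothetical.

```python
from typing import Any, Dict


def win_rate_from_result(result: Dict[str, Any], policy_id: str) -> float:
    """Estimate a policy's win rate from an RLlib-style training result dict.

    Assumes (hypothetically) +1 reward for a win, -1 for a loss and no draws,
    so mean_reward = 2 * win_rate - 1.
    """
    mean_reward = result["policy_reward_mean"][policy_id]
    win_rate = (mean_reward + 1.0) / 2.0
    # Clamp to [0, 1] in case the reward scheme differs from the assumption.
    return min(1.0, max(0.0, win_rate))
```

If draws are possible (a full board in 5-in-a-row), the mean reward alone no longer determines the win rate; counting wins directly, for example in the `on_episode_end` callback, would be more reliable.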
I tried adding these lines inside the `while True` loop:
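```python
while True:
    rest = trainer.train()
    new_config = trainer.get_config()
    new_config["multiagent"]["policies"]["policy_0"] = gm.gen_policy(GENV, lr=0.00321)
    new_config["multiagent"]["policies"]["policy_1"] = gm.gen_policy(GENV, lr=0.00175)
    # Attribute assignment: subscripting the trainer (trainer["raw_user_config"]) raises TypeError.
    trainer.raw_user_config = new_config
    trainer.config = new_config
```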
It didn't help.
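One plausible reason the attempt has no effect: the trainer builds its policies (and their optimizers) once, from the config it had at construction time, so rebinding `trainer.config` afterwards only replaces a dictionary that the already-built policies no longer read. The toy classes below are hypothetical stand-ins, not RLlib classes; they only illustrate that pattern: the policy copies `lr` when it is constructed, so a new value has to be pushed into each existing policy object rather than into the config.

```python
import copy
from typing import Dict


class ToyPolicy:
    """Hypothetical stand-in for a policy that reads its learning rate once."""

    def __init__(self, config: Dict[str, float]) -> None:
        # The value is copied here; later edits to any config dict are not seen.
        self.lr = config["lr"]

    def set_lr(self, lr: float) -> None:
        self.lr = lr


class ToyTrainer:
    """Hypothetical stand-in for a trainer that builds its policies in __init__."""

    def __init__(self, config: Dict[str, Dict[str, float]]) -> None:
        self.config = copy.deepcopy(config)
        self.policies = {pid: ToyPolicy(cfg) for pid, cfg in self.config.items()}


trainer = ToyTrainer({"policy_0": {"lr": 0.001}, "policy_1": {"lr": 0.0}})

# Rebinding the config (like `trainer.config = new_config`) leaves the policies unchanged.
trainer.config = {"policy_0": {"lr": 0.00321}, "policy_1": {"lr": 0.00175}}
print(trainer.policies["policy_0"].lr)  # still 0.001

# Pushing the new value into each constructed policy is what actually changes it.
for pid, cfg in trainer.config.items():
    trainer.policies[pid].set_lr(cfg["lr"])
print(trainer.policies["policy_0"].lr)
```

In RLlib itself the analogous step would have to reach the policies on every rollout worker (for example via `trainer.workers.foreach_worker(...)`), and how a policy stores its learning rate differs between TF and PyTorch and across RLlib versions, so this is a direction to investigate rather than a confirmed recipe. The `multiagent` config also has a `policies_to_train` list, which may be a simpler way to keep the opponent frozen than a learning rate of zero.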