python - Learning rate scheduler for DQN in stable_baselines3

Question

I am trying to do reinforcement learning with gym and stable-baselines3, specifically with the stable-baselines3 DQN implementation on MountainCar (https://gym.openai.com/envs/MountainCar-v0/).

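I am trying to implement a learning rate scheduler that lowers the learning rate whenever the model's reward stays above a certain threshold for a given number of iterations. I tried the following approaches: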

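1. Passing a function instead of a number as learning_rate when defining the model, since learning_rate can be a callable. However, it seems to run the function only at the first iteration and never updates the learning rate afterwards.
2. Passing the function as lr_schedule in policy_kwargs: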

    import gym
    import torch
    from stable_baselines3 import DQN
    from stable_baselines3.common.callbacks import EvalCallback
    from stable_baselines3.common.monitor import Monitor
    
    env = gym.make('MountainCar-v0')
    # You can also load other environments like CartPole, MountainCar, Acrobot. Refer to https://gym.openai.com/docs/ for descriptions.
    # For example, to load CartPole, just replace the above statement with "env = gym.make('CartPole-v1')".
    
    # log_dir, nn_layers and lr_schedule_custom are defined elsewhere (not shown)
    env = Monitor(env, log_dir)
    
    # For evaluating the performance of the agent periodically and logging the results.
    callback = EvalCallback(env, log_path=log_dir, deterministic=True)
    policy_kwargs = dict(activation_fn=torch.nn.ReLU,
                         net_arch=nn_layers, lr_schedule=lr_schedule_custom)
    
    # Fails with: TypeError: __init__() got multiple values for argument 'lr_schedule'
    model = DQN("MlpPolicy", env, policy_kwargs=policy_kwargs)
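For context on the first approach above (passing a callable as learning_rate): the stable-baselines3 documentation describes such a callable as taking progress_remaining, which decreases from 1 (start of training) to 0 (end). The sketch below, modeled on the linear schedule example in the SB3 docs, shows that interface. Note that it depends only on training progress, so on its own it has no access to the reward. The commented DQN line assumes stable-baselines3 is installed and is not executed here.

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Return a schedule that decays linearly from initial_value to 0."""

    def func(progress_remaining: float) -> float:
        # progress_remaining goes from 1.0 (start of training) to 0.0 (end).
        return progress_remaining * initial_value

    return func


schedule = linear_schedule(1e-3)
print(schedule(1.0), schedule(0.5), schedule(0.0))

# Usage with SB3 (not run here):
# model = DQN("MlpPolicy", env, learning_rate=linear_schedule(1e-3))
```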

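However, I get the error "__init__() got multiple values for argument 'lr_schedule'", even though the documentation (https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) makes no distinction between the policy's lr_schedule argument and the other arguments I pass in policy_kwargs. How can I do this?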
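The error itself looks like a plain Python argument clash. As far as I can tell from the SB3 source, the algorithm constructs its policy roughly as policy_class(observation_space, action_space, lr_schedule, **policy_kwargs), i.e. it already passes its own lr_schedule (derived from learning_rate), so an lr_schedule key inside policy_kwargs supplies that argument a second time. The sketch below reproduces the clash with hypothetical stand-ins (FakePolicy, build_policy); no SB3 code is involved.

```python
class FakePolicy:
    """Hypothetical stand-in for a policy whose constructor takes lr_schedule."""

    def __init__(self, observation_space, action_space, lr_schedule, net_arch=None):
        self.lr_schedule = lr_schedule
        self.net_arch = net_arch


def build_policy(policy_class, lr_schedule, policy_kwargs):
    # The "algorithm" passes its own lr_schedule positionally...
    return policy_class("obs_space", "act_space", lr_schedule, **policy_kwargs)


def constant(_progress_remaining):
    return 1e-3


# ...so an extra lr_schedule inside policy_kwargs collides with it.
try:
    build_policy(FakePolicy, constant, dict(net_arch=[64, 64], lr_schedule=constant))
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```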

Thanks a lot!
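One way to express the reward-based rule described at the start of the question is a stateful callable: it ignores progress_remaining and returns a learning rate that is multiplied by factor once the reward has stayed above threshold for patience consecutive checks. The class ReduceOnRewardPlateau, its parameters and its update() method are hypothetical, not part of stable-baselines3, and the threshold of -110.0 is only an example value.

```python
class ReduceOnRewardPlateau:
    """Hypothetical stateful schedule: callable like an SB3 schedule, but its
    value drops by `factor` after `patience` consecutive rewards above `threshold`."""

    def __init__(self, initial_lr, threshold, patience, factor=0.5, min_lr=1e-6):
        self.lr = initial_lr
        self.threshold = threshold
        self.patience = patience
        self.factor = factor
        self.min_lr = min_lr
        self._count = 0  # consecutive checks with reward above threshold

    def __call__(self, progress_remaining):
        # progress_remaining is ignored: the value depends only on reward history.
        return self.lr

    def update(self, reward):
        if reward > self.threshold:
            self._count += 1
        else:
            self._count = 0
        if self._count >= self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self._count = 0  # start counting again for the next reduction
        return self.lr


schedule = ReduceOnRewardPlateau(initial_lr=1e-3, threshold=-110.0, patience=3)
for reward in [-150.0, -100.0, -105.0, -90.0, -95.0]:
    schedule.update(reward)
print(schedule(0.5))
```

Assumption, not verified here: passing an instance as learning_rate=schedule to DQN and calling schedule.update(mean_reward) from a custom callback (for example after each evaluation) should change the rate used at later gradient steps, since SB3 re-evaluates the schedule when it updates the optimizer's learning rate during training.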
