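I applied RLlib's SAC with the hyperparameters linked below to solve CartPole, which is a very simple environment. However, the performance (episode_reward_mean) does not improve and stays at around 40. Can anyone help?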
https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/sac/cartpole-sac.yaml
import ray
#import ray.rllib.agents.ppo as ppo
import ray.rllib.agents.sac as sac
ray.init()
config = sac.DEFAULT_CONFIG.copy()
config["num_gpus"] = 0
# config["framework"] = "torch"
config["framework"] = "tf"
# Use the boolean False here: the string "false" is non-empty and therefore truthy in Python.
config["no_done_at_end"] = False
config["gamma"] = 0.95
config["target_network_update_freq"] = 32
config["tau"] = 1.0
config["train_batch_size"] = 32
config["optimization"]["actor_learning_rate"] = 0.005
config["optimization"]["critic_learning_rate"] = 0.005
config["optimization"]["entropy_learning_rate"] = 0.0001
# trainer = sac.SACTrainer(config=config, env="MountainCar-v0")
trainer = sac.SACTrainer(config=config, env="CartPole-v0")
for i in range(5000):
    result = trainer.train()
    if i % 10 == 0:
        # checkpoint = trainer.save()
        print("i: ", i, " reward: ", result["episode_reward_mean"])
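One thing worth checking in a script like this is the type of the `no_done_at_end` value. The linked YAML file uses a boolean, whereas a Python string such as "false" is non-empty and therefore truthy, so it would likely behave like True wherever the library tests it as a flag. The sketch below only illustrates that Python behavior; `parse_bool` is a hypothetical helper written for this example, not part of RLlib.

```python
def parse_bool(value):
    """Hypothetical helper: convert common string spellings to a real bool."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")


string_flag = "false"

# Any non-empty string is truthy, regardless of what it spells.
print(bool(string_flag))

# Converting explicitly gives the intended boolean.
print(parse_bool(string_flag))
```

If the flag was meant to be off, assigning the boolean `False` directly in the config is the simplest fix. Whether this alone explains the reward staying around 40 is not certain.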