In the example code from the Stable Baselines3 website (https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html), the model first learns via the `model.learn(total_timesteps=25000)` line and can then be used in the play loop.
Since I want to monitor various parameters (from a custom environment) while the agent is learning, my question is: how can I use `model.learn` inside a play loop?
```python
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Parallel environments
env = make_vec_env("CartPole-v1", n_envs=4)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=25000)
model.save("ppo_cartpole")

del model  # remove to demonstrate saving and loading

model = PPO.load("ppo_cartpole")

obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
```