How do I add rewards to TensorBoard logging in Stable Baselines3 when using a custom environment?
I have this training code:
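from stable_baselines3 import PPO

# env (the custom environment) and policy_kwargs are defined elsewhere
model = PPO(
    "MlpPolicy", env,
    learning_rate=1e-4,
    policy_kwargs=policy_kwargs,
    verbose=1,
    tensorboard_log="./tensorboard/")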
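According to the Stable Baselines3 documentation, you can log arbitrary values to TensorBoard by writing your own callback. The documentation example logs a random number; replace it with whatever you want to track, such as the reward: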
import numpy as np
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import BaseCallback

# Pendulum-v0 no longer exists in current Gym/Gymnasium; v1 is the current id
model = SAC("MlpPolicy", "Pendulum-v1", tensorboard_log="/tmp/sac/", verbose=1)


class TensorboardCallback(BaseCallback):
    """
    Custom callback for plotting additional values in tensorboard.
    """

    def __init__(self, verbose=0):
        super().__init__(verbose)

    def _on_step(self) -> bool:
        # Log scalar value (here a random variable)
        value = np.random.random()
        self.logger.record("random_value", value)
        # Returning False here would stop training early
        return True


model.learn(50000, callback=TensorboardCallback())