How do I add rewards to TensorBoard logging in Stable Baselines3 with a custom environment?

I have this training code:

model = PPO(
    "MlpPolicy", env,
    learning_rate=1e-4,
    policy_kwargs=policy_kwargs,
    verbose=1,
    tensorboard_log="./tensorboard/")

1 Answer

According to their documentation, you can log arbitrary values by creating your own callback:

import numpy as np

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import BaseCallback

model = SAC("MlpPolicy", "Pendulum-v0", tensorboard_log="/tmp/sac/", verbose=1)


class TensorboardCallback(BaseCallback):
    """
    Custom callback for plotting additional values in tensorboard.
    """

    def __init__(self, verbose=0):
        super().__init__(verbose)

    def _on_step(self) -> bool:
        # Log scalar value (here a random variable)
        value = np.random.random()
        self.logger.record('random_value', value)
        return True


model.learn(50000, callback=TensorboardCallback())
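
The callback above logs a random number. To log rewards instead, one option is to read the per-step values that the training loop exposes through `self.locals`. Below is a minimal sketch, assuming `self.locals` contains `rewards` and `dones` arrays with one entry per environment. As far as I know this holds for SB3's rollout collection, but these are local variable names rather than a stable API, so check them against your installed version. The names `RewardLoggingCallback` and `update_returns` and the `custom/...` keys are made up for illustration.

```python
from typing import List, Optional, Sequence

try:
    from stable_baselines3.common.callbacks import BaseCallback
except ImportError:
    # Fallback only keeps this file importable without SB3 installed;
    # the callback itself needs the real BaseCallback to work.
    BaseCallback = object


def update_returns(running: List[float],
                   rewards: Sequence[float],
                   dones: Sequence[bool]) -> List[float]:
    """Add step rewards to the running per-env returns (in place).

    Returns the totals of episodes that ended on this step and
    resets the running total for those environments.
    """
    finished = []
    for i, (reward, done) in enumerate(zip(rewards, dones)):
        running[i] += float(reward)
        if done:
            finished.append(running[i])
            running[i] = 0.0
    return finished


class RewardLoggingCallback(BaseCallback):
    """Hypothetical callback that logs step rewards and episode returns."""

    def __init__(self, verbose: int = 0):
        super().__init__(verbose)
        self._running: Optional[List[float]] = None

    def _on_step(self) -> bool:
        # Assumption: the training loop exposes these local variables.
        rewards = self.locals["rewards"]
        dones = self.locals["dones"]
        if self._running is None:
            self._running = [0.0] * len(rewards)

        # record_mean averages all values seen since the last log dump,
        # whereas record would keep only the most recent one.
        step_mean = sum(float(r) for r in rewards) / len(rewards)
        self.logger.record_mean("custom/step_reward", step_mean)
        for episode_return in update_returns(self._running, rewards, dones):
            self.logger.record_mean("custom/episode_return", episode_return)
        return True
```

It would be passed in the same way as the callback above, e.g. `model.learn(50000, callback=RewardLoggingCallback())`. Note also that if the environment is wrapped in `Monitor` (SB3 normally does this itself when given a plain, non-vectorized environment), the mean episode reward should already show up in TensorBoard as `rollout/ep_rew_mean` without any custom callback.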
answered 2021-09-15T12:39:31.793