reinforcement-learning - Ray[RLlib] Custom Action Distribution (TorchDeterministic)

Question

We know that for a Box (continuous) action space, RLlib's corresponding action distribution is DiagGaussian, a diagonal Gaussian probability distribution.

However, I want to use TorchDeterministic instead, an action distribution that returns its input values directly.

Here is the code, taken from https://github.com/ray-project/ray/blob/a91ddbdeb98e81741beeeb5c17902cab1e771105/rllib/models/torch/torch_action_dist.py#L372:

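from typing import Union

import gym
import numpy as np
import torch
from ray.rllib.models.action_dist import ActionDistribution
from ray.rllib.models.torch.torch_action_dist import TorchDistributionWrapper
from ray.rllib.utils.annotations import override
from ray.rllib.utils.typing import ModelConfigDict, TensorType

class TorchDeterministic(TorchDistributionWrapper):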
    """Action distribution that returns the input values directly.
    This is similar to DiagGaussian with standard deviation zero (thus only
    requiring the "mean" values as NN output).
    """

    @override(ActionDistribution)
    def deterministic_sample(self) -> TensorType:
        return self.inputs

    @override(TorchDistributionWrapper)
    def sampled_action_logp(self) -> TensorType:
        return torch.zeros((self.inputs.size()[0], ), dtype=torch.float32)

    @override(TorchDistributionWrapper)
    def sample(self) -> TensorType:
        return self.deterministic_sample()

    @staticmethod
    @override(ActionDistribution)
    def required_model_output_shape(
            action_space: gym.Space,
            model_config: ModelConfigDict) -> Union[int, np.ndarray]:
        return np.prod(action_space.shape)
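The docstring's remark that only the "mean" values are needed shows up in required_model_output_shape: TorchDeterministic asks the model for one output per action dimension, whereas the DiagGaussian wrapper asks for twice as many (a mean and a log standard deviation per dimension). A plain-Python sketch of that arithmetic; the helper names are hypothetical and only mirror the two RLlib methods:

```python
import math


def deterministic_output_size(action_shape):
    # Mirrors TorchDeterministic.required_model_output_shape:
    # one "mean" value per action dimension.
    return math.prod(action_shape)


def diag_gaussian_output_size(action_shape):
    # Mirrors the DiagGaussian wrapper: a mean and a log-std
    # per action dimension, hence twice as many outputs.
    return 2 * math.prod(action_shape)


for shape in [(1,), (3,), (2, 2)]:
    print(shape, deterministic_output_size(shape), diag_gaussian_output_size(shape))
```

So for a Box space of shape (3,), switching from DiagGaussian to TorchDeterministic halves the size of the model's output layer from 6 to 3.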

With the proper imports, I copied and pasted this class into a file named custom_action_dist.py.

I imported it as follows:

from custom_action_dist import TorchDeterministic

I registered my custom action distribution as:

ModelCatalog.register_custom_action_dist("my_custom_action_dist", TorchDeterministic)

And in my config I specified:

"custom_action_dist": "my_custom_action_dist"
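The line above shows only the key, not where it sits in the config. A minimal sketch, assuming the Ray 1.x layout in which custom_action_dist is read from the "model" sub-dict of the trainer config (alongside custom_model) rather than from the top level; the "framework": "torch" entry is included only because TorchDeterministic is a Torch distribution, and the registration call is left as a comment so the snippet runs without Ray:

```python
# Assumed Ray 1.x layout: the custom action distribution name goes inside
# the "model" sub-dict of the trainer config, not at the top level.
# Registration (requires Ray) would happen before building the trainer, e.g.:
#   ModelCatalog.register_custom_action_dist("my_custom_action_dist", TorchDeterministic)
config = {
    "framework": "torch",  # TorchDeterministic is a Torch action distribution
    "model": {
        "custom_action_dist": "my_custom_action_dist",
    },
}

print(config["model"]["custom_action_dist"])
```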

However, I get the following error:

File "/home/user/DRL/lib/python3.8/site-packages/ray/rllib/models/torch/torch_action_dist.py", line 38, in logp
    return self.dist.log_prob(actions)
AttributeError: 'TorchDeterministic' object has no attribute 'dist'
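The traceback points at the base class rather than at TorchDeterministic itself: in this version of torch_action_dist.py, TorchDistributionWrapper.logp simply returns self.dist.log_prob(actions), and self.dist is created only by subclasses that wrap a torch.distributions object (such as the DiagGaussian wrapper). TorchDeterministic never sets it, so any call to logp on it fails. The sketch below reproduces that mechanism with hypothetical plain-Python stand-ins (WrapperSketch, DeterministicSketch and DeterministicWithLogp are not RLlib classes), plus one possible logp override that avoids touching self.dist:

```python
# Hypothetical stand-ins, NOT the real RLlib classes: they only mimic how
# TorchDistributionWrapper.logp delegates to self.dist.


class WrapperSketch:
    """Base wrapper: logp() delegates to self.dist, set by subclasses."""

    def __init__(self, inputs):
        self.inputs = inputs

    def logp(self, actions):
        return self.dist.log_prob(actions)


class DeterministicSketch(WrapperSketch):
    """Like TorchDeterministic: returns its inputs, never creates self.dist."""

    def sample(self):
        return self.inputs


class DeterministicWithLogp(DeterministicSketch):
    """Hypothetical override: define logp without using self.dist."""

    def logp(self, actions):
        # One 0.0 per action in the batch, the same constant that
        # TorchDeterministic.sampled_action_logp returns.
        return [0.0 for _ in actions]


def try_logp(dist, actions):
    try:
        return dist.logp(actions)
    except AttributeError as err:
        return f"AttributeError: {err}"


batch = [[0.1, -0.3], [0.5, 0.2]]
print(try_logp(DeterministicSketch(batch), batch))
print(try_logp(DeterministicWithLogp(batch), batch))
```

Returning constant zeros only removes the crash: a log-probability that does not depend on the network outputs gives a likelihood-ratio loss such as PPO's no policy gradient. That is presumably why TorchDeterministic is meant for deterministic-policy algorithms (as far as I know, RLlib's own DDPG/TD3 Torch policy uses it), whose losses do not call logp.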

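It seems I have to specify a probability distribution.

Can anyone help me and tell me which one?

Thanks, and I look forward to your replies!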
