c# - Reward logic for Unity3D in the ml-agents package

Question

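Unity3D has a reinforcement learning package called ML-Agents, and I am using it to understand its components. For my project I need to write my own logic for setting rewards in Unity3D: not by calling "addReward" from C#, but by writing Python code that sets the reward for Unity.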

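I would like to know whether I can use the Python API provided by the ML-Agents package to take the environment's observations, update the reward with my own custom logic, and send it back to Unity. Where should I look to do this?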

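In other words, here is an example. In the 3DBall example, the reward logic is set up in Unity3D: the agent gets a positive reward if the ball stays on the platform and a negative reward if the ball falls off. This logic is implemented in Unity3D in C#, and it compares the ball's position (a position vector) with the platform's. For each action, the agent calls env.step(action) and gets back a tuple of (reward, state, ...). What if I want to write this logic outside Unity? For example, could I write a Python program that reads the observations coming from Unity3D and updates the reward without using Unity's reward logic? Is that possible? I cannot find where this option lives in the ML-Agents Python API.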
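For illustration, the 3DBall reward rule described above can be written as a plain Python function. This is a minimal sketch under stated assumptions. The name `ball_reward` is hypothetical. The input is assumed to be the ball's (dx, dy, dz) offset from the platform. The thresholds (-2 vertically, ±3 horizontally) and the +0.1 / -1 reward values only approximate the Unity sample's C# logic. The function merely computes a number on the Python side. It does not send anything back to Unity.

```python
from typing import Sequence, Tuple


def ball_reward(ball_offset: Sequence[float]) -> Tuple[float, bool]:
    """Hypothetical Python port of a 3DBall-style reward rule.

    ball_offset: assumed (dx, dy, dz) position of the ball relative to the
    platform, e.g. sliced out of the observation vector received from Unity.
    Returns (reward, episode_done).
    """
    dx, dy, dz = ball_offset
    # Assumed thresholds: the ball "fell off" if it dropped below the platform
    # or drifted too far sideways.
    fell_off = dy < -2.0 or abs(dx) > 3.0 or abs(dz) > 3.0
    if fell_off:
        return -1.0, True   # negative reward, end the episode
    return 0.1, False       # small positive reward for keeping the ball up
```

For example, `ball_reward((0.0, 1.0, 0.0))` gives `(0.1, False)` and `ball_reward((4.0, 0.0, 0.0))` gives `(-1.0, True)`.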

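Right now I am considering running an external Python program in between the lines of C# code that set the reward in Unity3D, but I wonder whether that is overly complicated and whether there is a simpler solution.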

Any help would be greatly appreciated.

Regards, Guido

score 0 · Accepted Answer

As I understand reinforcement learning, the reward is handled by the environment, and the agent just receives it together with the next observation. You could say it is part of the observation.

Therefore, the logic that decides which reward is given and when is part of the environment logic. In the case of Unity ML-Agents the environment lives in Unity, so you have to implement the reward function in Unity (C#).

So, to keep a clear separation between the environment (Unity) and the agent (Python), I think it's best to keep the reward logic in Unity/C# and not tinker with it in Python.
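As a minimal sketch of that separation, here is a toy environment in plain Python. The class `ToyBallEnv` and its 1-D "ball" are hypothetical stand-ins and not part of the ML-Agents API. The reward logic lives inside the environment's `step`, in the same place the C# code lives in Unity. The agent only receives the reward alongside the next observation.

```python
import random
from typing import List, Tuple


class ToyBallEnv:
    """Hypothetical 1-D stand-in for 3DBall: the environment alone owns the reward."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.ball_x = 0.0  # ball offset from the platform centre

    def reset(self) -> List[float]:
        self.ball_x = 0.0
        return [self.ball_x]

    def step(self, tilt: float) -> Tuple[List[float], float, bool]:
        # Stand-in "physics": the action plus a little noise moves the ball.
        self.ball_x += tilt + self._rng.uniform(-0.1, 0.1)
        # Reward logic is computed here, inside the environment.
        done = abs(self.ball_x) > 3.0
        reward = -1.0 if done else 0.1
        # The agent gets the reward together with the next observation.
        return [self.ball_x], reward, done
```

An agent would use it as `obs, reward, done = env.step(action)`. It has no way to override `reward` inside the environment, which mirrors the environment-agent split described above.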

tl;dr: I think it's intentional that you cannot set the reward via the Python API, to keep a clear environment-agent separation.
