Questions tagged [stable-baselines]

86 questions

0 votes

0 answers

145 views

python - KeyError: 'observation' when attempting multi-agent reinforcement learning with OpenAI stable-baselines3 and gym

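I am trying to train PPO using the Hungry Geese gym here:

But my game only plays one step. After running with debug ON, I get the following trace:

So I debugged further in VS Code. As the screenshot below shows, neither the observation key nor the desired_goal key exists in observation_dict.

This is also how I debugged the call above:

Am I using the API incorrectly in a way that causes this (I am new to the API)? (Or could this be a bug, which I think is unlikely.)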
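For anyone reproducing this without a debugger, a small check like the one below makes the missing keys explicit. It is only a sketch: it assumes the code path that raised the KeyError expects the goal-conditioned layout (observation, achieved_goal, desired_goal), and both the helper name missing_goal_keys and the sample observation are made up for illustration; they are not taken from the Hungry Geese environment.

```python
# Keys that a goal-conditioned (GoalEnv-style) observation dict is expected to carry.
# Assumption: the failing code path expects this layout.
GOAL_ENV_KEYS = ("observation", "achieved_goal", "desired_goal")


def missing_goal_keys(observation_dict, required=GOAL_ENV_KEYS):
    """Return the required keys that are absent from observation_dict, in order."""
    return [key for key in required if key not in observation_dict]


# Hypothetical observation shaped like a plain game state, not a goal-based one.
sample_obs = {"geese": [[0], [5]], "food": [12, 40], "step": 1}

print(missing_goal_keys(sample_obs))
# ['observation', 'achieved_goal', 'desired_goal']
```

If the list is not empty, the observation is an ordinary dict rather than a goal-based one, so the lookup of 'observation' is bound to fail.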

Colab notebook and model

0 votes

1 answer

609 views

python-3.x - How to make a model learn inside a loop with stable-baselines3?

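In the example code from the stable baselines3 website ( https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html ), the model first learns via the line model.learn(total_timesteps=25000) and can then be used in a play loop.

Now, since I want to be able to monitor different parameters (from a custom environment) while the agent is learning, my question is: how can I use model.learn inside the play loop?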
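One way to picture what is being asked is to split the training budget into chunks and inspect the environment between them. The sketch below is library-free so it runs as is; learn_in_chunks, learn_step and on_chunk_end are hypothetical names, and the "model" is a stand-in list. With stable-baselines3, learn_step could plausibly wrap model.learn(total_timesteps=n, reset_num_timesteps=False), though on-policy algorithms may round each call up to a whole rollout, so the count here tracks requested steps only.

```python
def learn_in_chunks(learn_step, total_timesteps, chunk_size, on_chunk_end):
    """Call learn_step repeatedly, running on_chunk_end after each chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    done = 0
    while done < total_timesteps:
        n = min(chunk_size, total_timesteps - done)
        # Assumption: with SB3 this would be something like
        # model.learn(total_timesteps=n, reset_num_timesteps=False)
        learn_step(n)
        done += n
        # Hook for reading parameters from the custom environment.
        on_chunk_end(done)
    return done


# Stand-ins for a real model and monitor, so the sketch needs no library.
requested = []    # timesteps requested in each learn call
checkpoints = []  # cumulative requested timesteps seen by the monitor

trained = learn_in_chunks(
    learn_step=requested.append,
    total_timesteps=25000,
    chunk_size=10000,
    on_chunk_end=checkpoints.append,
)

print(requested)    # [10000, 10000, 5000]
print(checkpoints)  # [10000, 20000, 25000]
```

A smaller chunk_size gives more frequent monitoring at the cost of more learn calls.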

python-3.x reinforcement-learning stable-baselines

0 votes

1 answer

826 views

python - Understanding custom policies in stable-baselines3

I was trying to understand the policy networks in stable-baselines3 from this doc page.

As explained in this example, to specify a custom CNN feature extractor, we extend the BaseFeaturesExtractor class and pass it as policy_kwargs.features_extractor_class, with CnnPolicy as the first parameter:

Q1. Can we follow the same approach for a custom MLP feature extractor?
As explained in this example, to specify a custom MLP feature extractor, we extend the ActorCriticPolicy class, override _build_mlp_extractor(), and pass the new class as the first parameter:

Q2. Can we follow the same approach for a custom CNN feature extractor?
I feel we can have either a CNN extractor or an MLP extractor, but not both. So it makes no sense to pass MlpPolicy as the first parameter to the model and then specify a CNN feature extractor in policy_kwargs.features_extractor_class, as in this example. This results in the following policy (containing both features_extractor and mlp_extractor), which I feel is incorrect:

Q3. Am I correct in this understanding? If yes, is one of the MLP or CNN feature extractors ignored?