I was trying to use the hungry-geese gym [here](https://www.kaggle.com/victordelafuente/dqn-goose-with-stable-baselines3-pytorch#) to train PPO:
```python
from kaggle_environments import make
from stable_baselines3 import PPO

directions = {0: 'EAST', 1: 'NORTH', 2: 'WEST', 3: 'SOUTH'}
loaded_model = PPO.load('logs\\dqn2ppo_nonvec\\model')

def agent_ppo(obs, config):
    a = directions[loaded_model.predict(obs)[0]]
    return a

env = make('hungry_geese', debug=True)
env.run([agent_ppo, 'agent_bfs.py'])
env.render(mode="ipython")
```
But the game ran for only one step. After running with debug ON, I got the following trace:
```
Traceback (most recent call last):
  File "c:\users\crrma\.virtualenvs\hungry_geese-ept5y6nv\lib\site-packages\kaggle_environments\agent.py", line 151, in act
    action = self.agent(*args)
  File "<ipython-input-29-faad97d317d6>", line 5, in agent_ppo
    a = directions[loaded_model.predict(obs)[0]]
  File "c:\users\crrma\.virtualenvs\hungry_geese-ept5y6nv\lib\site-packages\stable_baselines3\common\base_class.py", line 497, in predict
    return self.policy.predict(observation, state, mask, deterministic)
  File "c:\users\crrma\.virtualenvs\hungry_geese-ept5y6nv\lib\site-packages\stable_baselines3\common\policies.py", line 262, in predict
    observation = ObsDictWrapper.convert_dict(observation)
  File "c:\users\crrma\.virtualenvs\hungry_geese-ept5y6nv\lib\site-packages\stable_baselines3\common\vec_env\obs_dict_wrapper.py", line 68, in convert_dict
    return np.concatenate([observation_dict[observation_key], observation_dict[goal_key]], axis=-1)
KeyError: 'observation'
```
So I debugged further in VS Code. As the screenshot below shows, neither the `observation` nor the `desired_goal` key is present in `observation_dict`.
This is also how I stepped into the call above:
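From the trace, `predict()` seems to hit the goal-conditioned code path whenever it receives a raw dict (it then looks for an `observation` key that the Kaggle obs dict doesn't have). My guess is that the raw obs has to be encoded into the same flat array the model was trained on before calling `predict()`. Below is a minimal sketch of that idea; the `preprocess` function and its encoding are hypothetical and would need to match the notebook's actual training-time encoding:

```python
import numpy as np

def preprocess(obs, rows=7, columns=11):
    """Encode a raw hungry_geese observation dict as a flat float array.

    Illustrative encoding only (NOT the notebook's exact one): food cells
    are marked 1.0, and each goose's body cells are marked 2.0 + player index.
    """
    board = np.zeros(rows * columns, dtype=np.float32)
    for cell in obs.get('food', []):
        board[cell] = 1.0            # mark food cells
    for i, goose in enumerate(obs.get('geese', [])):
        for cell in goose:
            board[cell] = 2.0 + i    # mark goose bodies per player
    return board

# A raw observation shaped like kaggle_environments returns it:
obs = {'geese': [[3, 4], [40]], 'food': [10, 20], 'index': 0, 'step': 0}
state = preprocess(obs)
# state is a flat ndarray, so loaded_model.predict(state) would bypass the
# dict-observation branch that raised KeyError: 'observation'.
```

Inside `agent_ppo` this would become `loaded_model.predict(preprocess(obs))[0]`, assuming the encoding matches training.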
Am I using the API incorrectly in a way that causes this (I am new to it)? Or could this be a bug, which I think is unlikely?