I'm trying to use an environment similar to HandReach-v0 from OpenAI Gym. However, when I run the PPO algorithm from Stable Baselines 3, I get the error below.
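For reference, my training setup looks roughly like this (a minimal sketch; the gym.make id "HandReach-v0" and the "MlpPolicy" string are placeholders here, since my actual environment is a custom HandReach-style one):

import gym
from stable_baselines3 import PPO

# Placeholder: my real env is a custom HandReach-style env whose observation
# space is a Dict with achieved_goal / desired_goal / observation keys.
env = gym.make("HandReach-v0")

# Policy name is an assumption for this sketch; the error appears as soon as
# training starts and the first observation is converted to a tensor.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=25000)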
The traceback starts when I call model.learn(total_timesteps=25000):
File "/home/yb1025/.conda/envs/allegro_gym/lib/python3.6/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 158, in collect_rollouts
obs_tensor = th.as_tensor(self._last_obs).to(self.device)
RuntimeError: Could not infer dtype of collections.OrderedDict
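As far as I can tell, the failing line converts self._last_obs directly with th.as_tensor, and my environment returns observations as an OrderedDict (because the observation space is a Dict). A tiny standalone check seems to reproduce the same error message (a sketch, not taken from the actual rollout code):

import collections
import torch as th

# Observations from a Dict observation space come back as an OrderedDict.
obs = collections.OrderedDict(observation=[0.0, 1.0])

# This raises: RuntimeError: Could not infer dtype of collections.OrderedDict
th.as_tensor(obs)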
When I run:
print(env.observation_space.sample())
I get:
OrderedDict([('achieved_goal', array([ 0.4008276 , -0.0685866 , -0.22774519, 0.05827878, 0.47759697,
0.7327185 , 2.4765387 , -0.8607227 , 0.89627784, -0.3062557 ,
-0.60894597, -1.4110374 ], dtype=float32)), ('desired_goal', array([-1.005679 , 0.34147817, 0.9540531 , 1.1987132 , 0.37403303,
0.32209057, 0.31095287, -2.1119647 , 0.82215786, -0.6675792 ,
-1.5640837 , 0.7348459 ], dtype=float32)), ('observation', array([-0.39490733, -0.67843455, -0.43765455, 0.1409685 , -0.67161006,
1.3106273 , 0.04009145, -1.714885 , -1.7085567 , -0.44895488,
-0.6111999 , -1.9730839 , 0.93647414, 0.2714189 , -0.67204314,
0.8948596 , -0.14034131, 1.0312599 , -1.2369561 , -0.2345652 ,
-0.17095046, 0.36576194, 0.9939435 , -1.0381949 , -1.2953175 ,
1.4120669 , -0.23294891, 0.30627772, -1.2250876 , -0.35871807,
1.3074456 , -1.060916 , -2.451866 , 0.18679707, 0.609564 ,
-0.16821782, -0.8448521 , -1.0025802 , 0.6878543 , -2.1562986 ,
0.6426088 , 1.386251 , 1.0454125 , -2.2426984 ], dtype=float32))])
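So the observation space is a Dict with achieved_goal, desired_goal, and observation keys, which is presumably what the rollout collection step chokes on. A quick way to confirm the space type (a sketch, assuming env is the same environment object as above):

import gym

print(type(env.observation_space))                          # expected: <class 'gym.spaces.Dict'> (or dict-like)
print(isinstance(env.observation_space, gym.spaces.Dict))   # expected: True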