
I am trying to use an environment similar to HandReach-v0 from OpenAI Gym. However, when I run the PPO algorithm from Stable Baselines 3, I get an error as soon as training starts.
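The setup looks roughly like this (a minimal sketch: "HandReach-v0" stands in for my custom environment, and "MlpPolicy" is just the policy string I assume to be typical here):

import gym
from stable_baselines3 import PPO

# "HandReach-v0" is a stand-in for my custom environment, which exposes the
# same goal-based interface (observation / achieved_goal / desired_goal).
env = gym.make("HandReach-v0")

# "MlpPolicy" is an assumption about the policy string used here.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=25000)  # the error below is raised inside this call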

The error traceback begins when I call model.learn(total_timesteps=25000):

File "/home/yb1025/.conda/envs/allegro_gym/lib/python3.6/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 158, in collect_rollouts
    obs_tensor = th.as_tensor(self._last_obs).to(self.device)
RuntimeError: Could not infer dtype of collections.OrderedDict

When I run:

print(env.observation_space.sample())

I get:

OrderedDict([('achieved_goal', array([ 0.4008276 , -0.0685866 , -0.22774519,  0.05827878,  0.47759697,
        0.7327185 ,  2.4765387 , -0.8607227 ,  0.89627784, -0.3062557 ,
       -0.60894597, -1.4110374 ], dtype=float32)), ('desired_goal', array([-1.005679  ,  0.34147817,  0.9540531 ,  1.1987132 ,  0.37403303,
        0.32209057,  0.31095287, -2.1119647 ,  0.82215786, -0.6675792 ,
       -1.5640837 ,  0.7348459 ], dtype=float32)), ('observation', array([-0.39490733, -0.67843455, -0.43765455,  0.1409685 , -0.67161006,
        1.3106273 ,  0.04009145, -1.714885  , -1.7085567 , -0.44895488,
       -0.6111999 , -1.9730839 ,  0.93647414,  0.2714189 , -0.67204314,
        0.8948596 , -0.14034131,  1.0312599 , -1.2369561 , -0.2345652 ,
       -0.17095046,  0.36576194,  0.9939435 , -1.0381949 , -1.2953175 ,
        1.4120669 , -0.23294891,  0.30627772, -1.2250876 , -0.35871807,
        1.3074456 , -1.060916  , -2.451866  ,  0.18679707,  0.609564  ,
       -0.16821782, -0.8448521 , -1.0025802 ,  0.6878543 , -2.1562986 ,
        0.6426088 ,  1.386251  ,  1.0454125 , -2.2426984 ], dtype=float32))])
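
This suggests the observation space is a dictionary space rather than a flat Box. A quick check along these lines (reusing env from the sketch above; the printed result is what I would expect, not verified output) is:

import gym

# Expectation (not verified output): a goal-based env like HandReach-v0
# exposes a gym.spaces.Dict, which is why sample() returns an OrderedDict.
print(type(env.observation_space))
print(isinstance(env.observation_space, gym.spaces.Dict))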
