I'm running into a problem where tape.gradient(loss, model.trainable_variables) returns no gradients. I've looked through other questions and have been stuck on this for weeks.

cont = continuous_model()

def step_episode(env):
    obs = env.reset()
    log_probs = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
    rewards = []
    done = False
    i = 0
    while not done:
        obs = tf.expand_dims(obs, 0)
        actions = cont(obs)
        # Use the two network outputs as the mean and (positive) scale of a Normal
        normal = tfd.Normal(loc=actions[0][0][0], scale=tf.math.abs(actions[0][0][1]))
        torque = normal.sample(1)
        log_prob = normal.log_prob(torque)
        log_probs = log_probs.write(i, log_prob)
        # Squash the sampled action into [-1, 1] before stepping the environment
        torque = tf.keras.activations.tanh(torque)
        obs, reward, done, info = env.step(torque)
        rewards.append(reward)
        i += 1
    return log_probs, rewards

for episode in range(5000):
    with tf.GradientTape() as tape:
        log_probs, rewards = step_episode(env)
    loss = actor_loss(log_probs, rewards)
    grads = tape.gradient(loss, cont.trainable_variables)
    optimizer.apply_gradients(zip(grads, cont.trainable_variables))

loss is a scalar tensor. This code works when I use a discrete distribution from tfd, but when I switch to the Normal/continuous distribution, no gradients are returned.
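For reference, here is a minimal sketch (not my actual model) of the behavior I expected: when every op on the path from the trainable variables to the loss runs inside the tape context, tape.gradient returns gradients for a Normal log-prob. The hand-rolled Gaussian log-density is an assumption made to keep the sketch self-contained in core TF without tfd:

```python
import math
import tensorflow as tf

# Trainable distribution parameters (stand-ins for the network outputs).
loc = tf.Variable(0.5)
scale = tf.Variable(1.0)

with tf.GradientTape() as tape:
    # Treat the sample as a constant; gradients flow only through log_prob.
    sample = tf.stop_gradient(loc + scale * tf.random.normal([]))
    # Hand-rolled Normal log-density: -((x - loc)^2)/(2*scale^2) - log(scale) - log(2*pi)/2
    log_prob = (-0.5 * ((sample - loc) / scale) ** 2
                - tf.math.log(scale)
                - 0.5 * math.log(2.0 * math.pi))
    # The loss is built INSIDE the tape context, so it is recorded.
    loss = -log_prob

grads = tape.gradient(loss, [loc, scale])
# Both entries are tensors (not None) because loss depends on loc and scale
# through taped ops.
```

In my real code the loss is assembled from the TensorArray of log-probs returned by step_episode, which is where I suspect the difference lies.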
