I'm running into a problem where tape.gradient(loss, model.trainable_variables) returns no gradients. I've looked through other questions and have been stuck on this for weeks.
cont = continuous_model()

def step_episode(env):
    obs = env.reset()
    log_probs = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
    rewards = []
    done = False
    i = 0
    while not done:
        obs = tf.expand_dims(obs, 0)
        actions = cont(obs)
        normal = tfd.Normal(loc=actions[0][0][0], scale=tf.math.abs(actions[0][0][1]))
        torque = normal.sample(1)
        log_prob = normal.log_prob(torque)
        log_probs = log_probs.write(i, log_prob)
        torque = tf.keras.activations.tanh(torque)
        obs, reward, done, info = env.step(torque)
        rewards.append(reward)
        i += 1
    return log_probs, rewards

for episode in range(5000):
    with tf.GradientTape() as tape:
        log_probs, rewards = step_episode(env)
        loss = actor_loss(log_probs, rewards)
    grads = tape.gradient(loss, cont.trainable_variables)
    optimizer.apply_gradients(zip(grads, cont.trainable_variables))
loss comes back as a scalar tensor. This code works when I use a discrete distribution from tfd, but when I switch to the Normal (continuous) distribution, no gradients are returned.
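To narrow the problem down, here is a minimal sketch (my own, not from the code above) that checks whether a gradient flows from a Normal log-prob of a sampled action back to the variables that produced loc and scale. The log-density is written out by hand so the snippet only needs TensorFlow; tfd.Normal.log_prob computes the same quantity. The variables loc and raw_scale are stand-ins for the network outputs:

```python
import math
import tensorflow as tf

# Hypothetical stand-ins for the two network outputs.
loc = tf.Variable(0.5)
raw_scale = tf.Variable(1.0)

with tf.GradientTape() as tape:
    scale = tf.math.abs(raw_scale)
    # Draw a sample via the reparameterization trick.
    eps = tf.random.normal([1])
    sample = loc + scale * eps
    # Treat the sampled action as fixed data when scoring it,
    # as REINFORCE does.
    x = tf.stop_gradient(sample)
    # Gaussian log-density, written out by hand; this is what
    # tfd.Normal(loc, scale).log_prob(x) evaluates.
    log_prob = (-0.5 * ((x - loc) / scale) ** 2
                - tf.math.log(scale)
                - 0.5 * math.log(2.0 * math.pi))
    loss = -tf.reduce_sum(log_prob)

grads = tape.gradient(loss, [loc, raw_scale])
print(grads)  # both entries are tensors, not None
```

If this sketch produces gradients but the episode loop does not, the break is somewhere between the distribution and the loss (for example, in how the TensorArray of log-probs is consumed) rather than in the Normal distribution itself.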