My training loop gives me the following warning:
WARNING:tensorflow:Gradients do not exist for variables ['noise:0'] when minimizing the loss.
After some tinkering, I determined this only happens when the noise variable is passed as an argument to my loss function, which is a tf.function. The code below shows that there is no problem when the loss function is not a tf.function, or when the tf.function references the global noise variable. It also shows that trying to take a gradient with respect to the noise variable fails when the variable is used as an argument to a tf.function:
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow_probability import distributions as tfd
from tensorflow_probability import bijectors as tfb
constrain_positive = tfb.Shift(np.finfo(np.float64).tiny)(tfb.Exp())
noise = tfp.util.TransformedVariable(initial_value=.1, bijector=constrain_positive, dtype=np.float64, name="noise")
trainable_variables = [noise.variables[0]]
kernel = tfp.math.psd_kernels.ExponentiatedQuadratic()
optimizer = tf.keras.optimizers.Adam()
index_points = tf.constant([[0]], dtype=np.float64)
observations = tf.constant([0], dtype=np.float64)
# I can train noise when it is passed as an argument to a python function
def loss_function_1(index_points, observations, kernel, observation_noise_variance):
    gp = tfd.GaussianProcess(kernel, index_points, observation_noise_variance=observation_noise_variance)
    return -gp.log_prob(observations)
with tf.GradientTape() as tape:
    nll_1 = loss_function_1(index_points, observations, kernel, noise)
grad_1 = tape.gradient(nll_1, trainable_variables)
print(grad_1)
optimizer.apply_gradients(zip(grad_1, trainable_variables))
# I can train noise if it is used in a tf.function and not passed as an argument
@tf.function(autograph=False, experimental_compile=False)
def loss_function_2(index_points, observations, kernel):
    gp = tfd.GaussianProcess(kernel, index_points, observation_noise_variance=noise)
    return -gp.log_prob(observations)
with tf.GradientTape() as tape:
    nll_2 = loss_function_2(index_points, observations, kernel)
grad_2 = tape.gradient(nll_2, trainable_variables)
print(grad_2)
optimizer.apply_gradients(zip(grad_2, trainable_variables))
# I can train noise if it is passed as an argument to a tf.function, provided the
# tf.function uses the global variable
@tf.function(autograph=False, experimental_compile=False)
def loss_function_3(index_points, observations, kernel, observation_noise_variance):
    gp = tfd.GaussianProcess(kernel, index_points, observation_noise_variance=noise)
    return -gp.log_prob(observations)
with tf.GradientTape() as tape:
    nll_3 = loss_function_3(index_points, observations, kernel, noise)
grad_3 = tape.gradient(nll_3, trainable_variables)
print(grad_3)
optimizer.apply_gradients(zip(grad_3, trainable_variables))
# I cannot train noise if it is passed as an argument to a tf.function and the
# tf.function uses the local variable
@tf.function(autograph=False, experimental_compile=False)
def loss_function_4(index_points, observations, kernel, observation_noise_variance):
    gp = tfd.GaussianProcess(kernel, index_points, observation_noise_variance=observation_noise_variance)
    return -gp.log_prob(observations)
with tf.GradientTape() as tape:
    nll_4 = loss_function_4(index_points, observations, kernel, noise)
grad_4 = tape.gradient(nll_4, trainable_variables)
print(grad_4)
optimizer.apply_gradients(zip(grad_4, trainable_variables))
This code prints:
[<tf.Tensor: shape=(), dtype=float64, numpy=0.045454545454545456>]
[<tf.Tensor: shape=(), dtype=float64, numpy=0.045413242911911206>]
[<tf.Tensor: shape=(), dtype=float64, numpy=0.04537197429557289>]
[None]
It then raises the error:
ValueError: No gradients provided for any variable: ['noise:0'].
Ideally, I would get the performance boost of a tf.function, so I do not want to use loss_function_1. Also, I would like to be able to pass different noise variables to my loss function, so I do not want to use a global variable as in loss_function_2 or loss_function_3.
Why do I get None when I try to take a gradient with respect to a variable passed as an argument to a tf.function? How can I fix this?