In short: I have a custom loss layer in TensorFlow/Keras 2+ which implements a loss function involving two variables that are themselves minimized during training. It works, as shown below. I wish to track the gradients of the loss with respect to these two variables. Judging from the output of tf.print(), GradientTape.gradient() seems to work, but I don't know how to hold on to the actual values.
In detail:
Suppose this is my custom loss layer (yes, the loss function is silly; everything is over-simplified for the sake of reproducibility):
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Layer
from tensorflow.keras.callbacks import EarlyStopping, Callback
import tensorflow.keras.backend as K
from tensorflow.keras import Model
class MyLoss(Layer):
    def __init__(self, var1, var2):
        super(MyLoss, self).__init__()
        self.var1 = K.variable(var1)  # or tf.Variable(var1) etc.
        self.var2 = K.variable(var2)

    def get_vars(self):
        return self.var1, self.var2

    def get_gradients(self):
        return self.grads

    def custom_loss(self, y_true, y_pred):
        loss = self.var1 * K.mean(K.square(y_true - y_pred)) + self.var2 ** 2
        return loss

    def compute_gradients(self, y_true, y_pred):
        with tf.GradientTape() as g:
            loss = self.custom_loss(y_true, y_pred)
        return loss, g.gradient(loss, [self.var1, self.var2])

    def call(self, y_true, y_pred):
        loss, grads = self.compute_gradients(y_true, y_pred)
        self.grads = grads
        # tf.print(grads)
        self.add_loss(loss)
        return y_pred
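As a standalone sanity check (a minimal sketch of mine, not from the original code): the same loss evaluated in plain eager mode yields concrete gradient tensors, and the analytic gradients are the MSE for var1 and 2*var2 for var2.

```python
import tensorflow as tf

# Standalone eager-mode check of the same gradient math (illustrative values).
var1 = tf.Variable(0.5)
var2 = tf.Variable(0.5)
y_true = tf.constant([1.0, 2.0, 3.0])
y_pred = tf.constant([1.5, 2.5, 3.5])

with tf.GradientTape() as g:
    loss = var1 * tf.reduce_mean(tf.square(y_true - y_pred)) + var2 ** 2

# Outside fit(), these are concrete EagerTensors and .numpy() works.
g1, g2 = g.gradient(loss, [var1, var2])
print(g1.numpy(), g2.numpy())  # d(loss)/d(var1) = MSE = 0.25, d(loss)/d(var2) = 2 * var2 = 1.0
```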
Suppose these are my data and Model (yes, y enters the model as an additional input; this works and is not the issue here):
n_col = 10
n_row = 1000
X = np.random.normal(size=(n_row, n_col))
beta = np.arange(10)
y = X @ beta
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
inputs = Input(shape=(X_train.shape[1],))
y_input = Input(shape=(1,))
hidden1 = Dense(10)(inputs)
output = Dense(1)(hidden1)
my_loss = MyLoss(0.5, 0.5)(y_input, output) # here can also initialize those var1, var2
model = Model(inputs=[inputs, y_input], outputs=my_loss)
model.compile(optimizer='adam')
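(A side note of mine, not part of the original setup: Model.compile accepts run_eagerly=True, which disables tf.function tracing so that call() executes eagerly during fit(); the tensors created inside call() are then concrete EagerTensors, at the cost of slower training.)

```python
# Alternative compile call (assumption: trading training speed for debuggability).
model.compile(optimizer='adam', run_eagerly=True)
```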
Now the model and the loss work, as evidenced by the profiles of the variables, e.g. by keeping the variables after each epoch (their values also make sense if you examine the silly loss):
var1_list = []
var2_list = []
for i in range(100):
    if i % 10 == 0:
        print('step %d' % i)
    model.fit([X_train, y_train], None,
              batch_size=32, epochs=1, validation_split=0.1, verbose=0)
    var1, var2 = model.layers[-1].get_vars()
    var1_list.append(var1.numpy())
    var2_list.append(var2.numpy())

plt.plot(var1_list, label='var1')
plt.plot(var2_list, 'r', label='var2')
plt.legend()
plt.show()
But when I wish to observe/keep the gradients, I get a list of (empty?) tensors:
grads = model.layers[-1].get_gradients()
grads
ListWrapper([<tf.Tensor 'gradient_tape/model/my_loss/mul/Mul:0' shape=() dtype=float32>, <tf.Tensor 'gradient_tape/model/my_loss/pow/mul_1:0' shape=() dtype=float32>])
Naturally, calling numpy() on these is meaningless:
grads[0].numpy()
AttributeError: 'Tensor' object has no attribute 'numpy'
And yet. Apparently there is something there, because when I use tf.print(grads) during training (uncommenting the tf.print(grads) line inside call() above), the gradient values are printed, and they also make sense:
[226.651245, 1] [293.38916, 0.998] [263.979889, 0.996000171] [240.448029, 0.994000435] [337.309021, 0.992001] [286.644775, 0.990001857] [194.823975, 0.988003075] [173.756546, 0.98600477] [267.330505, 0.984007] [139.302826, 0.982009768] [310.315216, 0.980013192] [263.746216, 0.97801733] [267.713, 0.976022303] [291.754578, 0.974028111] [376.523895, 0.972034812] [474.974884, 0.970042467] [375.520294, 0.968051136] etc. etc.
Note that there is no need to add g.watch([self.var1, self.var2]); adding it doesn't change the problem either.
How do I keep track of these gradients (the way I track var1 and var2)? What does tf.print() "see" that I can't?
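One workaround sketch I am considering (my assumption, not verified against the exact model above): during fit(), call() is traced into a graph, so self.grads ends up holding symbolic tensors from the trace; persisting the values into non-trainable tf.Variables via assign() keeps concrete numbers that survive graph execution and can be read with .numpy() after each epoch. A minimal sketch, using tf.Variable in place of K.variable:

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyLossTracked(Layer):
    """Hypothetical variant of MyLoss that persists gradient values."""
    def __init__(self, var1, var2):
        super().__init__()
        self.var1 = tf.Variable(float(var1))
        self.var2 = tf.Variable(float(var2))
        # Non-trainable slots holding the most recent gradient values.
        self.grad1 = tf.Variable(0.0, trainable=False)
        self.grad2 = tf.Variable(0.0, trainable=False)

    def custom_loss(self, y_true, y_pred):
        return self.var1 * tf.reduce_mean(tf.square(y_true - y_pred)) + self.var2 ** 2

    def call(self, y_true, y_pred):
        with tf.GradientTape() as g:
            loss = self.custom_loss(y_true, y_pred)
        g1, g2 = g.gradient(loss, [self.var1, self.var2])
        self.grad1.assign(g1)  # assign() updates the slot in graph mode too
        self.grad2.assign(g2)
        self.add_loss(loss)
        return y_pred
```

With this, model.layers[-1].grad1.numpy() after each fit() epoch would give the gradient from the last processed batch (again, a sketch under my assumptions).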