I am trying to estimate the gradient of a function with the finite difference method.
TLDR:
grad f(x) ≈ [f(x+h) - f(x-h)] / (2h)
for sufficiently small h.
As you know, this is also used in the gradient-checking stage to verify your backpropagation in AI.
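To make the TLDR concrete, here is a minimal toy sketch of that kind of gradient check (my own illustration, not part of my model code): the central difference is applied to every coordinate of a small array and compared with the known analytic gradient.

import numpy as np

def f(x):
    # toy function with a known gradient: f(x) = sum(x**2), so grad f(x) = 2*x
    return np.sum(x ** 2)

def numerical_grad(f, x, h=1e-5):
    # central difference, perturbing one coordinate at a time
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    for _ in it:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        plus = f(x)
        x[idx] = old - h
        minus = f(x)
        x[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * h)
    return grad

x = np.random.randn(4, 4)
print(np.max(np.abs(numerical_grad(f, x) - 2 * x)))  # tiny, on the order of 1e-10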
This is my network:
import copy
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

# input_shape and num_classes are defined in the full code (MNIST: (28, 28, 1) and 10)
def defineModel():
    global model
    model = Sequential()
    model.add(keras.Input(shape=input_shape))
    model.add(layers.Conv2D(32, kernel_size=(3, 3), activation="relu"))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Conv2D(64, kernel_size=(3, 3), activation="relu"))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.build()
    model.summary()
This part is fine and raises no errors; I only include it so you have some idea of my model. I am working on MNIST, so everything is simple. With 1 epoch and a few lines of TF code I reach 98%+ accuracy, which is quite good for a throwaway model.
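The "1 epoch and a few lines of TF code" is nothing special; a minimal sketch of what I mean (illustrative preprocessing and hyperparameters, not my literal code):

num_classes = 10
input_shape = (28, 28, 1)

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = np.expand_dims(x_train.astype("float32") / 255.0, -1)  # (60000, 28, 28, 1)
x_test = np.expand_dims(x_test.astype("float32") / 255.0, -1)    # (10000, 28, 28, 1)
y_train = keras.utils.to_categorical(y_train, num_classes)       # one-hot labels
y_test = keras.utils.to_categorical(y_test, num_classes)

defineModel()
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=1, validation_split=0.1)
model.evaluate(x_test, y_test)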
Since I am doing adversarial training, I want the gradient of my loss with respect to the input data (see the sketch right after this paragraph for why).
By the way, I use a tiling idea: if you cover the input image with non-overlapping square tiles of size (tile*tile), you can assume the gradient of the image is almost constant inside a tile, so it is a reasonable approximation (it also cuts the number of forward passes from 2*784 down to 2*(28/tile)^2). But for debugging purposes, tile=1 in my code, so we are computing per-pixel gradients.
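Why I want d(loss)/d(input) at all: in adversarial training the input is perturbed in the direction that increases the loss. A minimal FGSM-style sketch (illustrative only, not my actual training loop; epsilon is an arbitrary value):

def fgsm_perturb(model, x, y_true, epsilon=0.1):
    # x: a batch of input images, y_true: one-hot labels
    cce = tf.keras.losses.CategoricalCrossentropy()
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)  # x is not a tf.Variable, so it must be watched explicitly
        loss = cce(y_true, model(x))
    grad = tape.gradient(loss, x)        # d(loss)/d(input), same shape as x
    return x + epsilon * tf.sign(grad)   # step in the direction that increases the loss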
And this is where the problem lies, but I don't know where exactly! I checked the loss values, and loss(x+h), loss(x-h) and loss(x) are all very close to each other, so I know that part is fine. My TF automatic backpropagation also works correctly; I have tested it. The problem must be in the way the manual gradient is computed:
tile = 1
h = 1e-4  # also tried 1e-5, 1e-6 but did not work

# A dummy function to wait
def w():
    print()
    ww = input('wait')
    print()

# This function works fine.
def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
# The problem is here. Given an index such as idx, I'm going to manipulate
# x_test[idx] and compute the loss gradient with respect to this input
def estimateGrad(idx):
    y = model(np.expand_dims(x_test[idx], axis=0))
    y = np.expand_dims(y_train[idx], axis=0)
    x = np.squeeze(x_test[idx])
    cce = tf.keras.losses.CategoricalCrossentropy()
    grad = np.zeros((28, 28))
    num = int(28 / tile)
    # MNIST pictures are 28*28 pixels. Now grad is a 28*28 nd.array
    # and x is the input image converted to a 28*28 nd.array
    for i in range(num):
        for j in range(num):
            plus = copy.deepcopy(x)
            minus = copy.deepcopy(x)
            # Now plus is x+h and minus is x-h (inside the current tile)
            plus[i*tile:(i+1)*tile, j*tile:(j+1)*tile] = plus[i*tile:(i+1)*tile, j*tile:(j+1)*tile] + h
            minus[i*tile:(i+1)*tile, j*tile:(j+1)*tile] = minus[i*tile:(i+1)*tile, j*tile:(j+1)*tile] - h
            plus = np.expand_dims(plus, axis=(0, -1))
            minus = np.expand_dims(minus, axis=(0, -1))
            # Now we pass plus and minus to the model in the next two lines
            plus = model(plus)
            minus = model(minus)
            # Since I want the gradient of the loss with respect to x, in the next
            # two lines I set plus=loss(x+h) and minus=loss(x-h).
            # In other words, the function f whose gradient we want to compute
            # is the loss function
            plus = cce(y, plus).numpy()
            minus = cce(y, minus).numpy()
            # Applying the formula: grad = (loss(x+h) - loss(x-h)) / (2h)
            grad[i*tile:(i+1)*tile, j*tile:(j+1)*tile] = (plus - minus) / 2 / h
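            # ((plus - minus) / 2 / h is a scalar here, so NumPy broadcasts the same
            # value to every pixel of the tile*tile block on the left-hand side)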
    # Ok, now let's check our grad against the TF autograd module
    x = tf.convert_to_tensor(np.expand_dims(x_test[idx], axis=0))
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = model(x)
        y_expanded = np.expand_dims(y_train[idx], axis=0)
        loss = cce(y_expanded, y)
    delta = tape.gradient(loss, x)
    delta = delta.numpy()
    delta = np.squeeze(delta)
    # delta is the gradient returned by TF via auto-differentiation.
    # We calculate the error and unfortunately it is large
    diff = rel_error(grad, delta)
    print('diff ', diff)
    w()
    # Problem: diff is very large. It should be less than 1e-4
You can refer to my full code here.