I recently implemented an autoencoder in numpy. I have checked all of the gradients numerically and they appear to be correct, and the cost function also decreases at every iteration as long as the learning rate is small enough.
The problem:
As you know, an autoencoder takes an input x and tries to return something as close to x as possible.
Whenever my x is a row vector, it works very well. The cost function decreases to 0 and we get very good results. For example: when x = [[ 0.95023264  1. ]], the output I get after 10000 iterations is xhat = [[ 0.94972973  0.99932479]] and the cost function is about 10^-7.
However, whenever my x is not a row vector, even if it is just a small 2 x 2 matrix, the output is not close to the original x, and the cost function does not decrease to 0 but plateaus instead.
Example:
When the input is x = [[ 0.37853141  1. ] [ 0.59747807  1. ]], the output is xhat = [[ 0.48882265  0.9985147 ] [ 0.48921648  0.99927143]]. You can see that the first column of xhat is not close to the first column of x; instead, it is close to the mean of the first column of x. This seems to happen in every test I run. In addition, the cost function plateaus at around 0.006 and does not reach 0.
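To make the "close to the column mean" observation concrete, here is a quick sanity check using the numbers copied from the run above (this is not part of the training code):

import numpy as np

x    = np.array([[0.37853141, 1.0],
                 [0.59747807, 1.0]])
xhat = np.array([[0.48882265, 0.9985147 ],
                 [0.48921648, 0.99927143]])

print(x[:, 0].mean())   #~0.488, and both entries of xhat[:, 0] sit right next to this value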
Why is this happening, and how can I fix it? Again, the derivatives are correct. I have no idea how to fix this.
My code:
import numpy as np
import matplotlib.pyplot as plt
def g(x): #sigmoid activation function
    return 1/(1+np.exp(-x)) #same shape as x!

def gGradient(x): #gradient of sigmoid
    rows, cols = x.shape
    grad = np.zeros((cols, cols))
    for i in range(0, cols):
        grad[i, i] = g(x[0, i])*(1-g(x[0, i]))
    return grad

def cost(x, xhat): #mean squared error between x the data and xhat the output of the machine
    return ((x - xhat)**2).sum()/(2 * m)
m, n = 2, 1
trXNoBias = np.random.rand(m, n)
trX = np.ones((m, n+1))
trX[:, :n] = trXNoBias #add the bias, column of ones
n = n+1
k = 1 #num of neurons in the hidden layer of the autoencoder, shouldn't matter too much
numIter = 10000
learnRate = 0.001
x = trX
w1 = np.random.rand(n, k) #weights from input layer to hidden layer, shape (n, k)
w2 = np.random.rand(k, n) #weights from hidden layer to output layer of the autoencoder, shape (k, n)
w3 = np.random.rand(n, n) #weights from output layer of autoencoder to entire output of the machine, shape (n, n)
costArray = np.zeros((numIter, ))
for i in range(0, numIter):
    #Feed-Forward
    z1 = np.dot(x, w1) #output of the input layer, shape (m, k)
    h1 = g(z1) #input of hidden layer, shape (m, k)
    z2 = np.dot(h1, w2) #output of the hidden layer, shape (m, n)
    h2 = g(z2) #Output of the entire autoencoder. The output layer of the autoencoder. shape (m, n)
    xhat = np.dot(h2, w3) #the output of the machine, which hopefully resembles the original data x, shape (m, n)

    print(cost(x, xhat))
    costArray[i] = cost(x, xhat)

    #Backprop
    dSdxhat = (1/float(m)) * (xhat-x)
    dSdw3 = np.dot(h2.T, dSdxhat)
    dSdh2 = np.dot(dSdxhat, w3.T)
    dSdz2 = np.dot(dSdh2, gGradient(z2))
    dSdw2 = np.dot(h1.T, dSdz2)
    dSdh1 = np.dot(dSdz2, w2.T)
    dSdz1 = np.dot(dSdh1, gGradient(z1))
    dSdw1 = np.dot(x.T, dSdz1)

    w3 = w3 - learnRate * dSdw3
    w2 = w2 - learnRate * dSdw2
    w1 = w1 - learnRate * dSdw1
plt.plot(costArray)
plt.show()
print(x)
print(xhat)
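For reference, this is roughly the kind of numerical check I mean when I say the gradients were verified: a central-difference sketch for w2 (the same idea applies to w1 and w3). It reuses x, w1, w2, w3, g and cost from the code above; eps, forwardCost and numGradW2 are names introduced only for this sketch, and the result should be compared against an analytic dSdw2 computed at the same weights (i.e. before the weights are updated again).

def forwardCost(w1, w2, w3): #recompute the full forward pass and return the cost
    h1 = g(np.dot(x, w1))
    h2 = g(np.dot(h1, w2))
    return cost(x, np.dot(h2, w3))

eps = 1e-5
numGradW2 = np.zeros(w2.shape)
for r in range(w2.shape[0]):
    for c in range(w2.shape[1]):
        w2Plus, w2Minus = w2.copy(), w2.copy()
        w2Plus[r, c] += eps   #perturb a single entry up...
        w2Minus[r, c] -= eps  #...and down
        numGradW2[r, c] = (forwardCost(w1, w2Plus, w3) - forwardCost(w1, w2Minus, w3)) / (2 * eps)
print(numGradW2) #compare entry by entry with dSdw2 evaluated at these same weights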