tensorflow - Tensorflow NN implementation does not converge

Question

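I'm trying to implement a simple feed-forward neural network using only TensorFlow, and it does not converge. I'm not sure whether the problem is in the network architecture or in my implementation of the training procedure. A simple 2-layer NN built with Keras seems to converge fine: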

import numpy as np
from keras.layers import Dense
from keras import Sequential

# Two hidden ReLU layers of 32 units, then a 21-way softmax output
model = Sequential()
model.add(Dense(32, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(21, activation='softmax'))
# Labels are integer class ids, hence sparse categorical cross-entropy
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(np.array(train_in), np.array(train_target), epochs=10, validation_split=0.1, batch_size=16)
Epoch 2/10
59717/59717 [==============================] - 4s 71us/sample - loss: 1.4021 - accuracy: 0.6812 - val_loss: 1.1049 - val_accuracy: 0.7066
Epoch 3/10
59717/59717 [==============================] - 4s 70us/sample - loss: 1.0942 - accuracy: 0.7321 - val_loss: 1.2269 - val_accuracy: 0.7015
Epoch 4/10
59717/59717 [==============================] - 4s 70us/sample - loss: 0.9096 - accuracy: 0.7654 - val_loss: 0.8207 - val_accuracy: 0.7905
Epoch 5/10
59717/59717 [==============================] - 4s 70us/sample - loss: 0.8373 - accuracy: 0.7790 - val_loss: 0.6863 - val_accuracy: 0.8267
Epoch 6/10
59717/59717 [==============================] - 4s 72us/sample - loss: 0.7925 - accuracy: 0.7918 - val_loss: 0.8132 - val_accuracy: 0.7929
Epoch 7/10
59717/59717 [==============================] - 4s 73us/sample - loss: 0.7916 - accuracy: 0.7925 - val_loss: 0.6749 - val_accuracy: 0.8210
Epoch 8/10
19600/59717 [========>.....................] - ETA: 2s - loss: 0.7475 - accuracy: 0.8011

Here is my TensorFlow implementation of the same network:

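import numpy as np
import tensorflow as tf

# TF1-style static graph: placeholders fed through a Session
tf.compat.v1.disable_eager_execution()
batch_size = 10
hid_dim = 32
output_dim = 21
features = train_x.shape[1]  # train_x: training inputs, loaded elsewhere

# Placeholders with a fixed batch dimension
x = tf.compat.v1.placeholder(tf.float32, (batch_size, features), name='x')
y = tf.compat.v1.placeholder(tf.int32, (batch_size, ), name='y')

# Weights and biases drawn from N(0, 1) (random_normal's default stddev is 1.0)
w1 = tf.Variable(tf.compat.v1.random_normal([features, hid_dim]), dtype=tf.float32)
b1 = tf.Variable(tf.compat.v1.random_normal([hid_dim]), dtype=tf.float32)

w2 = tf.Variable(tf.compat.v1.random_normal([hid_dim, output_dim]), dtype=tf.float32)
b2 = tf.Variable(tf.compat.v1.random_normal([output_dim]), dtype=tf.float32)

# One hidden ReLU layer, then unnormalized output logits
h1 = tf.nn.relu(tf.matmul(x, w1) + b1)
h2 = tf.matmul(h1, w2) + b2

# Mean sparse softmax cross-entropy over the batch, minimized with Adam (lr = 0.001)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=h2, labels=y))
optimizer = tf.compat.v1.train.AdamOptimizer(0.001).minimize(loss)
pred = tf.nn.softmax(h2)  # class probabilities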
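For reference, `tf.nn.sparse_softmax_cross_entropy_with_logits` gives, per example, the negative log of the softmax probability assigned to the true class, and `tf.reduce_mean` averages that over the batch. The pure-Python sketch below restates that formula (the function name is made up; this is not TensorFlow's implementation). Since each per-example loss lies between (largest logit minus true-class logit) and that same gap plus log(21), a mean loss like the 1370.9 in epoch 0 implies the logits were thousands apart at the start of training.

```python
import math


def sparse_softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for integer labels, computed from raw logits.

    logits: list of rows, each a list of unnormalized scores (one per class)
    labels: list of integer class indices, one per row
    """
    losses = []
    for row, label in zip(logits, labels):
        # Subtract the row maximum before exponentiating, for numerical stability
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        # -log softmax(row)[label] == logsumexp(row) - row[label]
        losses.append(log_sum_exp - row[label])
    # Equivalent of tf.reduce_mean over the per-example losses
    return sum(losses) / len(losses)


# With 21 classes and all-equal logits, the loss is log(21)
print(sparse_softmax_cross_entropy([[0.0] * 21, [5.0] * 21], [0, 20]))

# A single very confident wrong prediction makes the loss huge
print(sparse_softmax_cross_entropy([[0.0] * 20 + [1000.0]], [0]))
```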

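Here is my training procedure. In my case batch_size is fixed, so in each epoch I feed the whole dataset to the network batch by batch. I compute the loss after each batch and append it to a list. After each epoch, I take the mean of that epoch's batch losses to get the overall epoch loss: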

train_in = np.array(train_x)

train_target = np.array(train_y)
train_target = np.squeeze(train_target)  # labels as a 1-D array of class ids

num_of_train_batches = len(train_in)/batch_size  # a trailing partial batch, if any, is skipped below
init = tf.compat.v1.global_variables_initializer()
print('TRAIN BATCHES: ', num_of_train_batches)
epoch_list = []
epoch_losses = []
epochs = 50
with tf.compat.v1.Session() as sess:
    sess.run(init)
    print('TRAINING')
    for epoch in range(epochs):
        train_losses = []
        print('EPOCH: ', epoch)
        epoch_list.append(epoch)
        # Run over the whole training set, one fixed-size batch at a time, in the same order every epoch
        for it in range(int(num_of_train_batches)):
            start = it * batch_size
            end = start + batch_size
            # One Adam step; also fetch this batch's loss
            _, batch_loss = sess.run([optimizer, loss],
                                     feed_dict={x: train_in[start:end],
                                                y: train_target[start:end]})
            train_losses.append(batch_loss)

        # Epoch loss = mean of this epoch's batch losses
        epoch_losses.append(np.array(train_losses).mean())

        print('EPOCH: ', epoch)
        print('LOSS: ', np.array(train_losses).mean())

TRAIN BATCHES:  2200.0
TRAINING
EPOCH:  0
EPOCH:  0
LOSS:  1370.9271
EPOCH:  1
EPOCH:  1
LOSS:  64.23466
EPOCH:  2
EPOCH:  2
LOSS:  36.015495
EPOCH:  3
EPOCH:  3
LOSS:  30.292429
EPOCH:  4
EPOCH:  4
LOSS:  26.436918
EPOCH:  5
EPOCH:  5
LOSS:  25.689302
EPOCH:  6
EPOCH:  6
LOSS:  23.730627
EPOCH:  7
EPOCH:  7
LOSS:  22.356762
EPOCH:  8
EPOCH:  8
LOSS:  21.81124
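The training procedure described above slices the arrays in the same fixed order every epoch and reports the mean of the batch losses. One difference from the Keras run is that Keras' `Model.fit` reshuffles the training data each epoch by default (`shuffle=True`) and was run with `batch_size=16` rather than 10. The plain-Python sketch below shows the same epoch loop with an optional per-epoch reshuffle; `iterate_minibatches` and `epoch_mean_loss` are hypothetical helper names, and the batch loss in the usage example is a placeholder value rather than a real `sess.run` call.

```python
import random


def iterate_minibatches(inputs, targets, batch_size, shuffle=True, seed=None):
    """Yield (x_batch, y_batch) pairs covering the data once (one epoch).

    Only full batches are yielded, because the placeholders in the question
    have a fixed batch dimension (int(len / batch_size) does the same).
    """
    indices = list(range(len(inputs)))
    if shuffle:
        # New random order each epoch (pass a different seed per epoch)
        random.Random(seed).shuffle(indices)
    n_full = len(indices) // batch_size
    for b in range(n_full):
        batch_idx = indices[b * batch_size:(b + 1) * batch_size]
        yield [inputs[i] for i in batch_idx], [targets[i] for i in batch_idx]


def epoch_mean_loss(batch_losses):
    # Same quantity as np.array(train_losses).mean() in the question
    return sum(batch_losses) / len(batch_losses)


# Toy data: 25 samples, batch_size 10 -> 2 full batches per epoch
xs = list(range(25))
ys = [10 * v for v in xs]
for epoch in range(2):
    batch_losses = []
    for xb, yb in iterate_minibatches(xs, ys, batch_size=10, seed=epoch):
        # In the real loop this would be:
        # _, batch_loss = sess.run([optimizer, loss], feed_dict={x: xb, y: yb})
        batch_loss = float(len(xb))  # placeholder value for the sketch
        batch_losses.append(batch_loss)
    print(epoch, len(batch_losses), epoch_mean_loss(batch_losses))
```

Shuffling keeps each batch a random sample of the data; whether it matters for this particular dataset is not established by the logs above.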

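My Keras implementation reaches a loss of 0.75 after only 8 epochs, with the same number of hidden layers and the same hidden layer size, but my TF implementation still shows a loss greater than 10 even after 15 epochs.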
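Two differences between the two setups are visible in the code itself: the Keras model stacks two `Dense(32, relu)` layers before the softmax layer, while the TF graph has a single hidden layer (`h1`); and Keras' `Dense` uses Glorot-uniform kernels with zero biases by default, while the TF graph draws every weight and bias from N(0, 1). The pure-Python sketch below compares the spread of the output logits under the two initializations for the one-hidden-layer graph. It assumes a hypothetical 100 input features and unit-normal inputs (the real feature count and input scale are not given), so it only illustrates the trend, not the exact numbers in the logs.

```python
import math
import random


def glorot_uniform_limit(fan_in, fan_out):
    # Glorot uniform samples from U(-limit, limit) with this limit
    return math.sqrt(6.0 / (fan_in + fan_out))


def logit_std(features, hid_dim, output_dim, init, n_samples=100, seed=0):
    """Empirical std of the logits h2 = relu(x @ w1 + b1) @ w2 + b2.

    init="normal": all weights and biases ~ N(0, 1), as in the question's graph
    init="glorot": kernels ~ Glorot uniform, biases = 0 (Keras Dense defaults)
    Inputs x are drawn from N(0, 1), an assumption made for this sketch.
    """
    rng = random.Random(seed)

    def weight(fan_in, fan_out):
        if init == "normal":
            return rng.gauss(0.0, 1.0)
        lim = glorot_uniform_limit(fan_in, fan_out)
        return rng.uniform(-lim, lim)

    def bias():
        return rng.gauss(0.0, 1.0) if init == "normal" else 0.0

    w1 = [[weight(features, hid_dim) for _ in range(hid_dim)] for _ in range(features)]
    b1 = [bias() for _ in range(hid_dim)]
    w2 = [[weight(hid_dim, output_dim) for _ in range(output_dim)] for _ in range(hid_dim)]
    b2 = [bias() for _ in range(output_dim)]

    logits = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(features)]
        # Hidden layer: relu(x @ w1 + b1)
        h1 = [max(0.0, sum(x[i] * w1[i][j] for i in range(features)) + b1[j])
              for j in range(hid_dim)]
        # Output logits: h1 @ w2 + b2
        logits.extend(sum(h1[j] * w2[j][k] for j in range(hid_dim)) + b2[k]
                      for k in range(output_dim))
    mean = sum(logits) / len(logits)
    return math.sqrt(sum((v - mean) ** 2 for v in logits) / len(logits))


print("N(0,1) init, logit std:", logit_std(100, 32, 21, "normal"))
print("Glorot init, logit std:", logit_std(100, 32, 21, "glorot"))
```

With N(0, 1) weights the logit scale grows with the fan-in of each layer, so the softmax starts out saturated and the initial cross-entropy is large; this is one plausible contributor to the gap, not a confirmed diagnosis. In the graph code, a smaller `stddev` for `tf.compat.v1.random_normal`, or `tf.compat.v1.glorot_uniform_initializer()`, would be ways to try matching the Keras defaults.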

Can anyone point out why this is happening? My guess is that the problem lies in the training procedure rather than in the network itself.

All suggestions are welcome!
