I am trying to implement a simple feed-forward neural network using Tensorflow only, and it is not converging. I am not sure whether the problem is in the network architecture or in my implementation of the training procedure. A simple 2-layer NN built with Keras seems to converge just fine:
import numpy as np
from keras import Sequential
from keras.layers import Dense

# Two hidden layers of 32 ReLU units, 21-way softmax output
model = Sequential()
model.add(Dense(32, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(21, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(np.array(train_in), np.array(train_target), epochs=10, validation_split=0.1, batch_size=16)
Epoch 2/10
59717/59717 [==============================] - 4s 71us/sample - loss: 1.4021 - accuracy: 0.6812 - val_loss: 1.1049 - val_accuracy: 0.7066
Epoch 3/10
59717/59717 [==============================] - 4s 70us/sample - loss: 1.0942 - accuracy: 0.7321 - val_loss: 1.2269 - val_accuracy: 0.7015
Epoch 4/10
59717/59717 [==============================] - 4s 70us/sample - loss: 0.9096 - accuracy: 0.7654 - val_loss: 0.8207 - val_accuracy: 0.7905
Epoch 5/10
59717/59717 [==============================] - 4s 70us/sample - loss: 0.8373 - accuracy: 0.7790 - val_loss: 0.6863 - val_accuracy: 0.8267
Epoch 6/10
59717/59717 [==============================] - 4s 72us/sample - loss: 0.7925 - accuracy: 0.7918 - val_loss: 0.8132 - val_accuracy: 0.7929
Epoch 7/10
59717/59717 [==============================] - 4s 73us/sample - loss: 0.7916 - accuracy: 0.7925 - val_loss: 0.6749 - val_accuracy: 0.8210
Epoch 8/10
19600/59717 [========>.....................] - ETA: 2s - loss: 0.7475 - accuracy: 0.8011
Here is my implementation of the same network in Tensorflow:
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

batch_size = 10
hid_dim = 32
output_dim = 21
features = train_x.shape[1]

# Placeholders for one batch of inputs and integer class labels
x = tf.compat.v1.placeholder(tf.float32, (batch_size, features), name='x')
y = tf.compat.v1.placeholder(tf.int32, (batch_size, ), name='y')

# Two dense layers: features -> hid_dim -> output_dim
w1 = tf.Variable(tf.compat.v1.random_normal([features, hid_dim]), dtype=tf.float32)
b1 = tf.Variable(tf.compat.v1.random_normal([hid_dim]), dtype=tf.float32)
w2 = tf.Variable(tf.compat.v1.random_normal([hid_dim, output_dim]), dtype=tf.float32)
b2 = tf.Variable(tf.compat.v1.random_normal([output_dim]), dtype=tf.float32)

h1 = tf.nn.relu(tf.matmul(x, w1) + b1)
h2 = tf.matmul(h1, w2) + b2  # logits

loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=h2, labels=y))
optimizer = tf.compat.v1.train.AdamOptimizer(0.001).minimize(loss)
pred = tf.nn.softmax(h2)
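For parity with the accuracy metric that Keras reports, I assume an accuracy op along these lines could be added to the graph (an untested sketch, not something the rest of the post depends on):

correct = tf.equal(tf.cast(tf.argmax(pred, axis=1), tf.int32), y)  # compare predicted class to label
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))            # fraction of correct predictions in the batch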
Here is my implementation of the training procedure. In my case batch_size is fixed, so in every epoch I feed the whole dataset to the network batch by batch. I compute the loss after each batch and append it to an array, and after each epoch I take the mean of that array of per-batch losses to get my overall epoch loss:
train_in = np.array(train_x)
train_target = np.array(train_y)
train_target = np.squeeze(train_target)
y_t = train_target

num_of_train_batches = len(train_in)/batch_size
init = tf.compat.v1.global_variables_initializer()
print('TRAIN BATCHES: ', num_of_train_batches)

epoch_list = []
epoch_losses = []
epochs = 50

with tf.compat.v1.Session() as sess:
    sess.run(init)
    print('TRAINING')
    for epoch in range(epochs):
        ft = 0  # index of the batch currently being fed
        tt = 1  # index one past the current batch
        train_losses = []
        print('EPOCH: ', epoch)
        epoch_list.append(epoch)
        # RUN WHOLE SET: feed the dataset batch by batch
        for it in range(int(num_of_train_batches)):  # len(x_train)/batch_size
            # OPTIMIZE: one gradient step on the current batch
            _, batch_loss = sess.run([optimizer, loss],
                                     feed_dict={x: train_in[ft*batch_size:tt*batch_size],
                                                y: train_target[ft*batch_size:tt*batch_size]})
            train_losses.append(batch_loss)
            ft += 1
            tt += 1
        # Epoch loss = mean of the per-batch losses
        epoch_losses.append(np.array(train_losses).mean())
        print('EPOCH: ', epoch)
        print('LOSS: ', np.array(train_losses).mean())
TRAIN BATCHES: 2200.0
TRAINING
EPOCH: 0
EPOCH: 0
LOSS: 1370.9271
EPOCH: 1
EPOCH: 1
LOSS: 64.23466
EPOCH: 2
EPOCH: 2
LOSS: 36.015495
EPOCH: 3
EPOCH: 3
LOSS: 30.292429
EPOCH: 4
EPOCH: 4
LOSS: 26.436918
EPOCH: 5
EPOCH: 5
LOSS: 25.689302
EPOCH: 6
EPOCH: 6
LOSS: 23.730627
EPOCH: 7
EPOCH: 7
LOSS: 22.356762
EPOCH: 8
EPOCH: 8
LOSS: 21.81124
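One difference from model.fit that I have not tested: Keras shuffles the training data every epoch by default, while my loop feeds the batches in the same fixed order. Adding a per-epoch shuffle at the top of the epoch loop would look roughly like this (my assumption, untested):

perm = np.random.permutation(len(train_in))  # new random order each epoch, mirroring Keras' shuffle=True default
train_in, train_target = train_in[perm], train_target[perm]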
My Keras implementation reaches a loss of around 0.75 after only 8 epochs with the same number of hidden layers and the same hidden-layer size, yet my TF implementation still shows a loss greater than 10 even after 15 epochs.
Can someone point out why this happens? I am guessing the problem is related to the training procedure rather than to the network itself.
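One concrete difference I am aware of but have not ruled out is initialization: Keras Dense layers default to glorot_uniform kernels and zero biases, while my TF code draws every weight and bias from a standard normal, which might explain the enormous first-epoch loss (~1370). A sketch of what matching the Keras initialization could look like, assuming tf.compat.v1.glorot_uniform_initializer:

initializer = tf.compat.v1.glorot_uniform_initializer()  # Keras Dense default kernel initializer
w1 = tf.Variable(initializer([features, hid_dim]), dtype=tf.float32)
b1 = tf.Variable(tf.zeros([hid_dim]), dtype=tf.float32)   # Keras Dense default bias initializer is zeros
w2 = tf.Variable(initializer([hid_dim, output_dim]), dtype=tf.float32)
b2 = tf.Variable(tf.zeros([output_dim]), dtype=tf.float32)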
All suggestions are welcome!