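I am training a Siamese network with contrastive loss on a two-class MNIST dataset to decide whether two images are similar. The loss decreases at first, but then it plateaus while the accuracy stays around 0.5.
The model is trained on pairs of images, each with a label (0.0 for different, 1.0 for same). For simplicity I use only two classes (zeros and ones), and I prepared the dataset so that it contains every pair of images. I checked that the dataset is consistent by inspecting image pairs from it. I have also tried data normalization and different batch sizes, learning rates, initializations and regularization constants, all without success.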
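For anyone who wants to reproduce the setup, here is a minimal sketch of how such a pair dataset could be built. It is not my actual preprocessing code: `make_pairs` and the toy arrays are hypothetical, and the output layout `(n_pairs, 2, 28, 28, 1)` is inferred from how the training step indexes the batch.

```python
from itertools import combinations

import numpy as np


def make_pairs(images, digits):
    """Build every unordered pair of images with a same/different label.

    images: array of shape (n, 28, 28, 1)
    digits: array of shape (n,) holding the class of each image
    Returns pairs of shape (m, 2, 28, 28, 1) and labels of shape (m,),
    where m = n * (n - 1) / 2 and a label is 1.0 for same, 0.0 for different.
    """
    idx = np.array(list(combinations(range(len(images)), 2)))
    pairs = np.stack([images[idx[:, 0]], images[idx[:, 1]]], axis=1)
    labels = (digits[idx[:, 0]] == digits[idx[:, 1]]).astype(np.float32)
    return pairs, labels


# Toy data standing in for MNIST zeros and ones (random pixels, hypothetical).
rng = np.random.default_rng(0)
toy_images = rng.random((4, 28, 28, 1), dtype=np.float32)
toy_digits = np.array([0, 0, 1, 1])

pairs, labels = make_pairs(toy_images, toy_digits)
print(pairs.shape)  # (6, 2, 28, 28, 1)
print(labels)       # [1. 0. 0. 0. 0. 1.]
```

Note that labels built this way have shape `(n_pairs,)`, one scalar per pair.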
Here is the model:
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D


class Encoder(Model):
    """
    A network that finds a 50-dimensional representation of the input images
    so that the distances between them minimize the contrastive loss
    """

    def __init__(self):
        super(Encoder, self).__init__(name='encoder')
        self.cv = Conv2D(32, (3, 3), activation='relu', padding='same',
                         input_shape=(28, 28, 1),
                         kernel_regularizer=tf.keras.regularizers.l2(0.01))
        self.pool = MaxPooling2D((2, 2))
        self.flatten = Flatten()
        self.dense = Dense(50, activation=None,
                           kernel_regularizer=tf.keras.regularizers.l2(0.01))

    def call(self, inputs, training=None, mask=None):
        """ Forward pass for one image """
        x = self.cv(inputs)
        x = self.pool(x)
        x = self.flatten(x)
        x = self.dense(x)
        return x

    @staticmethod
    def distance(difference):
        """ The D function from the paper which is used in loss """
        # Euclidean norm of a single embedding difference (reduces over axis 0)
        distance = tf.sqrt(tf.reduce_sum(tf.pow(difference, 2), 0))
        return distance
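Because `distance` reduces over axis 0, it only makes sense when applied to one embedding difference at a time. As a rough NumPy illustration (not TensorFlow code, and the function names here are made up for the example), the per-row computation gives the same values as a single batched reduction over the embedding axis:

```python
import numpy as np


def distance_single(difference):
    # Same reduction as Encoder.distance: one embedding difference, axis 0.
    return np.sqrt(np.sum(np.power(difference, 2), axis=0))


def distance_batch(difference):
    # Batched form: reduce over the last (embedding) axis of a (batch, dim) array.
    return np.sqrt(np.sum(np.square(difference), axis=-1))


diff = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
per_row = np.array([distance_single(row) for row in diff])
batched = distance_batch(diff)
print(per_row)  # [5. 0. 1.]
print(batched)  # [5. 0. 1.]
```

In TensorFlow the batched form would presumably be the same expression with `tf.reduce_sum(..., axis=1)`, which would avoid the need to map over the batch.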
Loss and accuracy:
def simnet_loss(target, x1, x2):
    difference = x1 - x2
    distance_vector = tf.map_fn(lambda x: Encoder.distance(x), difference)
    loss = tf.map_fn(lambda distance: target * tf.square(distance) +
                     (1.0 - target) * tf.square(tf.maximum(0.0, 1.0 - distance)), distance_vector)
    average_loss = tf.reduce_mean(loss)
    return average_loss


def accuracy(y_true, y_pred):
    distance_vector = tf.map_fn(lambda x: Encoder.distance(x), y_pred)
    accuracy = tf.keras.metrics.binary_accuracy(y_true, distance_vector)
    return accuracy
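One thing that may be worth checking in `simnet_loss` is the shape of `loss` before the mean. The sketch below is a NumPy stand-in for the second `tf.map_fn` call, under the assumption that `target` is a 1-D vector with one label per pair: the lambda captures the whole `target` vector, so every scalar distance gets combined with every label. `loss_mapped` and `loss_elementwise` are hypothetical names used only for this comparison.

```python
import numpy as np


def loss_mapped(target, distances):
    # Mimics mapping over `distances` while the lambda captures all of `target`:
    # each scalar distance is combined with the entire label vector.
    return np.stack([
        target * d ** 2 + (1.0 - target) * np.maximum(0.0, 1.0 - d) ** 2
        for d in distances
    ])


def loss_elementwise(target, distances):
    # Pairs distance i with label i only.
    return (target * distances ** 2
            + (1.0 - target) * np.maximum(0.0, 1.0 - distances) ** 2)


target = np.array([1.0, 0.0, 1.0, 0.0])
perfect = np.array([0.0, 1.0, 0.0, 1.0])  # ideal distances for these labels
constant = np.full(4, 0.5)                # every pair at distance 0.5

mapped_perfect = loss_mapped(target, perfect)
mapped_constant = loss_mapped(target, constant)
elementwise_perfect = loss_elementwise(target, perfect)

print(mapped_perfect.shape, mapped_perfect.mean())            # (4, 4) 0.5
print(mapped_constant.shape, mapped_constant.mean())          # (4, 4) 0.25
print(elementwise_perfect.shape, elementwise_perfect.mean())  # (4,) 0.0
```

If the TensorFlow code behaves like `loss_mapped`, a constant distance of 0.5 scores better (0.25) than a perfect embedding (0.5), which would be consistent with a loss that settles near 0.25 while accuracy does not improve. Writing the expression directly on the two vectors, without the second `tf.map_fn`, should correspond to `loss_elementwise`.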
Training:
import sys

import numpy as np


def train_step(images, labels):
    with tf.GradientTape() as tape:
        # images has shape (batch, 2, 28, 28, 1): split into the two sides of each pair
        x1, x2 = images[:, 0, :, :, :], images[:, 1, :, :, :]
        x1 = model(x1)
        x2 = model(x2)
        loss = simnet_loss(labels, x1, x2)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss


model = Encoder()
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

for epoch in range(n_epoch):
    epoch_loss = 0
    n_batches = int(x_train.shape[0]/batch_size)
    for indices in np.array_split(np.arange(x_train.shape[0]), indices_or_sections=n_batches):
        x = np.take(x_train, indices, axis=0)
        y = np.take(y_train, indices, axis=0)
        epoch_loss += train_step(x, y)

    epoch_loss = epoch_loss / n_batches
    # test_step is defined elsewhere (not shown here)
    accuracy = test_step(x_train, y_train)
    val_accuracy = test_step(x_test, y_test)
    tf.print("epoch:", epoch, "loss:", epoch_loss, "accuracy:", accuracy,
             "val_accuracy:", val_accuracy, output_stream=sys.stdout)
The code above produces:
epoch: 0 loss: 0.755419433 accuracy: 0.318898171 val_accuracy: 0.310316473
epoch: 1 loss: 0.270610392 accuracy: 0.369466901 val_accuracy: 0.360871345
epoch: 2 loss: 0.262594223 accuracy: 0.430587918 val_accuracy: 0.418002456
epoch: 3 loss: 0.258690506 accuracy: 0.428258181 val_accuracy: 0.427044809
epoch: 4 loss: 0.25654456 accuracy: 0.43497327 val_accuracy: 0.44800657
epoch: 5 loss: 0.255373538 accuracy: 0.444840342 val_accuracy: 0.454993844
epoch: 6 loss: 0.254594624 accuracy: 0.453885168 val_accuracy: 0.454171807
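The accuracies start near 0.31 and stay below 0.5, which makes me suspect the metric itself. `tf.keras.metrics.binary_accuracy` thresholds its prediction at 0.5 and treats larger values as class 1, whereas here a label of 1.0 means "same" and should go with a small distance. The NumPy sketch below only illustrates that convention mismatch; the function names, the toy values and the 0.5 threshold are assumptions, and it does not reproduce `test_step`.

```python
import numpy as np


def accuracy_direct(y_true, distances, threshold=0.5):
    # Uses the distance itself as the prediction: distance > threshold -> 1.0.
    predicted = (distances > threshold).astype(np.float32)
    return float(np.mean(predicted == y_true))


def accuracy_inverted(y_true, distances, threshold=0.5):
    # Treats a small distance as "same": distance < threshold -> 1.0.
    predicted = (distances < threshold).astype(np.float32)
    return float(np.mean(predicted == y_true))


y_true = np.array([1.0, 1.0, 0.0, 0.0], dtype=np.float32)
distances = np.array([0.1, 0.2, 0.9, 0.6], dtype=np.float32)

direct = accuracy_direct(y_true, distances)
inverted = accuracy_inverted(y_true, distances)
print(direct)    # 0.0
print(inverted)  # 1.0
```

With well-separated distances the direct version reports 0.0 and the inverted one 1.0, so an accuracy computed the direct way would read as one minus the intended value, apart from distances that sit exactly on the threshold.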