
I am training a Siamese network with contrastive loss on two classes of MNIST to tell whether two images are similar. Although the loss decreases at first, it later freezes with the accuracy stuck around 0.5.

The model is trained on pairs of images with a label (0.0 for different, 1.0 for same). For simplicity I use only two classes (zeros and ones) and prepared the dataset so that it contains every pair of images. I have checked that the dataset is consistent (the pairs really do come from the dataset). I have also tried data normalization, different batch sizes, learning rates, initializations and regularization constants, all without success.
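For reference, the pair construction is roughly the following (a minimal sketch for illustration only, not my exact preprocessing; make_pairs and n_per_class are just names I use here):

import numpy as np

def make_pairs(x, y, n_per_class=100):
    """All pairs between the first n_per_class zeros and ones.
    Label 1.0 = same class, 0.0 = different class."""
    idx = [np.where(y == c)[0][:n_per_class] for c in (0, 1)]
    pairs, labels = [], []
    for ca in (0, 1):
        for cb in (0, 1):
            for i in idx[ca]:
                for j in idx[cb]:
                    pairs.append([x[i], x[j]])
                    labels.append(1.0 if ca == cb else 0.0)
    # shapes: (n_pairs, 2, 28, 28, 1) and (n_pairs,)
    return np.array(pairs, dtype=np.float32), np.array(labels, dtype=np.float32)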

Here is the model:

import sys

import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D


class Encoder(Model):
    """
    A network that finds a 50-dimensional representation of the input images
    so that the distances between them minimize the contrastive loss
    """

    def __init__(self):
        super(Encoder, self).__init__(name='encoder')

        self.cv = Conv2D(32, (3, 3), activation='relu', padding='same',
                         input_shape=(28, 28, 1),
                         kernel_regularizer=tf.keras.regularizers.l2(0.01))
        self.pool = MaxPooling2D((2, 2))
        self.flatten = Flatten()
        self.dense = Dense(50, activation=None,
                           kernel_regularizer=tf.keras.regularizers.l2(0.01))

    def call(self, inputs, training=None, mask=None):
        """ Forward pass for one image """
        x = self.cv(inputs)
        x = self.pool(x)
        x = self.flatten(x)
        x = self.dense(x)
        return x

    @staticmethod
    def distance(difference):
        """ The D function from the paper which is used in the loss """
        # Euclidean norm of a single 50-d difference vector (called once per
        # example via tf.map_fn below, hence the reduction over axis 0)
        distance = tf.sqrt(tf.reduce_sum(tf.pow(difference, 2), 0))
        return distance
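Note that distance is applied one example at a time via tf.map_fn below, so the reduction runs over axis 0 of a single 50-d difference vector. An equivalent batched form (my reading of the code, not something I actually ran) would be:

# x1, x2 are the (batch, 50) embeddings as in train_step below;
# one Euclidean norm per row of the difference matrix
distances = tf.norm(x1 - x2, axis=1)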

The loss and the accuracy:

def simnet_loss(target, x1, x2):
    difference = x1 - x2
    # per-example Euclidean distances between the two embeddings
    distance_vector = tf.map_fn(lambda x: Encoder.distance(x), difference)
    # intended contrastive loss: pull similar pairs together, push
    # dissimilar pairs apart up to a margin of 1
    loss = tf.map_fn(lambda distance: target * tf.square(distance) +
                     (1.0 - target) * tf.square(tf.maximum(0.0, 1.0 - distance)),
                     distance_vector)
    average_loss = tf.reduce_mean(loss)
    return average_loss

def accuracy(y_true, y_pred):
    # y_pred holds the embedding differences; convert them to distances
    # and compare against the 0/1 labels with binary accuracy
    distance_vector = tf.map_fn(lambda x: Encoder.distance(x), y_pred)
    accuracy = tf.keras.metrics.binary_accuracy(y_true, distance_vector)
    return accuracy
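For context, simnet_loss is meant to implement the contrastive loss from the paper (Hadsell et al. 2006, if I have the reference right): L(y, D) = y · D² + (1 − y) · max(0, m − D)², with margin m = 1 and y = 1.0 marking a similar pair.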

Training:

def train_step(images, labels):
    with tf.GradientTape() as tape:
        # images has shape (batch, 2, 28, 28, 1): one image pair per example
        x1, x2 = images[:, 0, :, :, :], images[:, 1, :, :, :]
        x1 = model(x1)
        x2 = model(x2)
        loss = simnet_loss(labels, x1, x2)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    return loss

model = Encoder()
# learning_rate, n_epoch, batch_size and the paired x_train/y_train,
# x_test/y_test are defined earlier (omitted here)
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

for epoch in range(n_epoch):
    epoch_loss = 0
    n_batches = int(x_train.shape[0]/batch_size)
    for indices in np.array_split(np.arange(x_train.shape[0]), indices_or_sections=n_batches):
        x = np.take(x_train, indices, axis=0)
        y = np.take(y_train, indices, axis=0)
        epoch_loss += train_step(x, y)

    epoch_loss = epoch_loss / n_batches
    accuracy = test_step(x_train, y_train)
    val_accuracy = test_step(x_test, y_test)
    tf.print("epoch:", epoch, "loss:", epoch_loss, "accuracy:", accuracy,
             "val_accuracy:", val_accuracy, output_stream=sys.stdout)

The code above produces:

epoch: 0 loss: 0.755419433 accuracy: 0.318898171 val_accuracy: 0.310316473
epoch: 1 loss: 0.270610392 accuracy: 0.369466901 val_accuracy: 0.360871345
epoch: 2 loss: 0.262594223 accuracy: 0.430587918 val_accuracy: 0.418002456
epoch: 3 loss: 0.258690506 accuracy: 0.428258181 val_accuracy: 0.427044809
epoch: 4 loss: 0.25654456 accuracy: 0.43497327 val_accuracy: 0.44800657
epoch: 5 loss: 0.255373538 accuracy: 0.444840342 val_accuracy: 0.454993844
epoch: 6 loss: 0.254594624 accuracy: 0.453885168 val_accuracy: 0.454171807
