python - TensorFlow model.evaluate() gives different results every time it runs on the same dataset

Question

When I run model.evaluate in TensorFlow several times on the same validation set, I get different results each time.

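The model consists of data augmentation layers, an EfficientNetB0 base model, and a GlobalAveragePooling layer (see below). I load the validation dataset with a tf.data pipeline built from tensor slices of a DataFrame, and it is not shuffled, so the order is always the same.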

from tensorflow.keras import Input, Model
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.layers import (Dense, GlobalAveragePooling2D, RandomCrop,
                                     RandomFlip, RandomRotation, RandomZoom)

def get_custom_model(input_shape, saved_model_path=None, training_base_model=True):
    input_layer = Input(shape=input_shape)

    # Augmentation layers are called with training=False, so they apply no random transform here
    data_augmentation = RandomFlip('horizontal')(input_layer, training=False)
    data_augmentation = RandomRotation(factor=(-0.2, 0.2))(data_augmentation, training=False)
    data_augmentation = RandomZoom(height_factor=(-0.2, 0.2))(data_augmentation, training=False)
    data_augmentation = RandomCrop(width=input_shape[0], height=input_shape[1])(data_augmentation, training=False)

    baseline_model = EfficientNetB0(include_top=False, weights='imagenet')
    baseline_model.trainable = training_base_model  # Added for bsg hypertuning

    # The same flag is also passed as the base model's call-time `training` argument
    baseline_output = baseline_model(data_augmentation, training=training_base_model)
    baseline_output = GlobalAveragePooling2D()(baseline_output)
    attributes_output = Dense(units=228, activation='sigmoid', name='attributes_output')(baseline_output)

    model = Model(inputs=[input_layer], outputs=[attributes_output])

    # Load weights
    if saved_model_path is not None:
        model.load_weights(saved_model_path)  # .expect_partial()

    return model

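I understand that retraining the model could give different results, since some layers are initialized with random weights, but I expect evaluating the same model to give identical results. I call get_custom_model with the same saved_model_path every time, so the model loads the same (previously saved) weights.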
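To put a number on "different results", the repeated evaluation can be wrapped in a small helper that reports the spread (max minus min) of each metric across runs. This is only a sketch: `metric_spread` and `evaluate_once` are hypothetical names that do not appear in the code above, and the stand-in callables below replace a real model so the snippet runs without TensorFlow.

```python
import random
from typing import Callable, Dict, List


def metric_spread(evaluate_once: Callable[[], Dict[str, float]],
                  runs: int = 3) -> Dict[str, float]:
    """Call evaluate_once `runs` times and return max - min for each metric."""
    results: List[Dict[str, float]] = [evaluate_once() for _ in range(runs)]
    return {
        name: max(r[name] for r in results) - min(r[name] for r in results)
        for name in results[0]
    }


# Stand-ins for a real evaluation (assumed values, not measured results).
rng = random.Random(0)
stable = lambda: {"loss": 0.5, "precision": 0.8}
noisy = lambda: {"loss": 0.5 + 0.1 * rng.random(), "precision": 0.8}

stable_spread = metric_spread(stable)  # every spread is exactly 0.0
noisy_spread = metric_spread(noisy)    # the loss spread is greater than 0
```

With a compiled Keras model, `evaluate_once` could be something like `lambda: model.evaluate(val_ds, return_dict=True, verbose=0)`, where `val_ds` stands for the validation dataset. A spread of zero for every metric would mean the evaluation is repeatable.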

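The metrics I compare are loss, precision, and recall, in case that is relevant. The optimizer is rmsprop and the loss is BinaryCrossentropy. I also tried setting training_base_model to False, and the metrics were much worse (almost as if the weights were random).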

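PS: During training I use the same validation set to compute validation metrics and save the best weights, but when I load those best weights again the results do not match. For example, I can get 81.28% precision during the validation step of a training epoch, and then 57% precision after loading those weights and running model.evaluate().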
