For my first machine learning experiment, I have a basic classification task to do.

I have 3 different folders:

train_path = './dataset/pneumonia/train/'
test_path = './dataset/pneumonia/test/'
val_path = './dataset/pneumonia/val/'

For each folder:

os.listdir(train_path)

returns

['NORMAL', 'PNEUMONIA']

In each set:

  • Training set:
    • NORMAL: 949
    • PNEUMONIA: 949
  • Test set:
    • NORMAL: 317
    • PNEUMONIA: 317
  • Validation set:
    • NORMAL: 317
    • PNEUMONIA: 317

I am using TensorFlow:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
image_gen = ImageDataGenerator(
    rotation_range=10,        # Rotate randomly by up to 10 degrees
    width_shift_range=0.10,   # Shift horizontally by up to 10% of the width
    height_shift_range=0.10,  # Shift vertically by up to 10% of the height
    rescale=1/255,            # Rescale pixel values to the range [0, 1]
    shear_range=0.1,          # Apply a random shear (slant) transformation
    zoom_range=0.1,           # Zoom in or out by up to 10%
    horizontal_flip=True,     # Allow horizontal flipping
    fill_mode='nearest'       # Fill in missing pixels with the nearest filled value
)


image_gen.flow_from_directory(train_path)
image_gen.flow_from_directory(test_path)

I created a model (a basic one):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()

# Only the first layer needs input_shape; later layers infer it.
model.add(Conv2D(32, (3, 3), input_shape=(image_width, image_height, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(512, activation='relu'))

# Dropouts help reduce overfitting by randomly turning neurons off during training.
# Here we say randomly turn off 50% of neurons.
model.add(Dropout(0.5))

# Last layer: the task is binary, so we use a single sigmoid unit.
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

Then I train it:

train_image_gen = image_gen.flow_from_directory(
    train_path,
    target_size=image_shape[:2],
    color_mode='rgb',
    batch_size=batch_size,
    class_mode='binary'
)

results = model.fit_generator(
    train_image_gen,
    epochs=20,
    validation_data=test_image_gen,
    callbacks=[early_stop, board]
)

So far, so good; the results look correct:

pred_probabilities = model.predict_generator(test_image_gen)
predictions = pred_probabilities > 0.5
confusion_matrix(test_image_gen.classes,predictions)

I get fairly good results on the test set (shown as two screenshots in the original post).

My problem is that when I try to predict on the validation images, the results are far from correct:

val_image_gen = image_gen.flow_from_directory(
    val_path,
    target_size=image_shape[:2],
    color_mode='rgb',
    class_mode='binary',
)

pred_probabilities = model.predict_generator(val_image_gen)
predictions = pred_probabilities > 0.5

Here is some of the output I get:

              precision    recall  f1-score   support

           0       0.51      0.57      0.53       317
           1       0.51      0.44      0.47       317

    accuracy                           0.51       634
   macro avg       0.51      0.51      0.50       634
weighted avg       0.51      0.51      0.50       634

The confusion matrix for this dataset is:

[[180 137]
 [176 141]]
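
As a quick sanity check, the report above can be recomputed from this confusion matrix. The sketch below assumes the scikit-learn convention (rows are true classes, columns are predicted classes); the helper names `precision` and `recall` are illustrative and not from the original post. With two balanced classes, an accuracy near 0.5 is what random guessing would give.

```python
# Confusion matrix from the question.
# Assumed convention: rows are true classes, columns are predicted classes.
matrix = [[180, 137],
          [176, 141]]

total = sum(sum(row) for row in matrix)
accuracy = (matrix[0][0] + matrix[1][1]) / total


def precision(cls):
    """Correct predictions of cls divided by all predictions of cls (column sum)."""
    predicted = matrix[0][cls] + matrix[1][cls]
    return matrix[cls][cls] / predicted


def recall(cls):
    """Correct predictions of cls divided by all true members of cls (row sum)."""
    actual = sum(matrix[cls])
    return matrix[cls][cls] / actual


for cls in (0, 1):
    print(cls, round(precision(cls), 2), round(recall(cls), 2))
print('accuracy', round(accuracy, 2))
```

The rounded values agree with the classification report: 0.51 and 0.57 for class 0, 0.51 and 0.44 for class 1, and 0.51 accuracy.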

2 Answers

I replaced:

val_image_gen = image_gen.flow_from_directory(
    val_path,
    target_size=image_shape[:2],
    color_mode='rgb',
    class_mode='binary',
)

with:

val_image_gen = image_gen.flow_from_directory(
    val_path,
    target_size=image_shape[:2],
    color_mode='rgb',
    batch_size=batch_size,
    class_mode='binary',
    shuffle=False
)

and I got good results:

[[269  48]
 [  3 314]]
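
A likely explanation for why shuffle=False makes the difference: flow_from_directory shuffles by default, so the predictions come back in shuffled order, while the generator's classes attribute lists the labels in directory order. Comparing the two then pairs each prediction with the wrong label. The following is a minimal pure-Python simulation of that effect, not the Keras code itself; the "perfect classifier" and the helper names are hypothetical, and only the class counts (317 per class) come from the question.

```python
import random


def confusion_matrix_2x2(y_true, y_pred):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    return (matrix[0][0] + matrix[1][1]) / total


# Labels in directory order, the way generator.classes reports them:
# 317 NORMAL (0) followed by 317 PNEUMONIA (1).
labels = [0] * 317 + [1] * 317

# Hypothetical perfect classifier: the prediction for image i equals labels[i].
rng = random.Random(0)
order = list(range(len(labels)))
rng.shuffle(order)  # order in which a shuffling generator would yield the images

shuffled_predictions = [labels[i] for i in order]  # shuffle=True: order is lost
ordered_predictions = list(labels)                 # shuffle=False: order is kept

misaligned = confusion_matrix_2x2(labels, shuffled_predictions)
aligned = confusion_matrix_2x2(labels, ordered_predictions)

print('shuffled:', misaligned, round(accuracy(misaligned), 2))
print('ordered: ', aligned, round(accuracy(aligned), 2))
```

Even with a classifier that is never wrong, the shuffled comparison scores close to chance, which resembles the 0.51 accuracy in the question. The ordered comparison recovers the true performance.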

Answered 2020-05-04T19:57:43.147

There are a few problems with your code:

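  • You are using the test set for validation and the validation set for testing. This may or may not be a problem, depending on your data and how it was split.
  • Augmentation should only be applied to the training set. Use a separate instance, ImageDataGenerator(rescale=1/255), for testing and validation.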
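
The second point could be sketched roughly as follows. This is one possible arrangement, not the only one: the names TRAIN_AUGMENTATION, EVAL_AUGMENTATION and make_generators are made up for illustration, the augmentation values are copied from the question, and shuffle=False on the evaluation iterators is an added assumption so that predictions stay aligned with each iterator's classes attribute.

```python
# Augmentation settings copied from the question; intended for training only.
TRAIN_AUGMENTATION = dict(
    rotation_range=10,
    width_shift_range=0.10,
    height_shift_range=0.10,
    rescale=1 / 255,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest',
)

# Evaluation images are only rescaled, so metrics reflect the unmodified data.
EVAL_AUGMENTATION = dict(rescale=1 / 255)


def make_generators(train_path, val_path, test_path, target_size, batch_size):
    """Return (train, val, test) directory iterators (hypothetical helper)."""
    # Imported here so the settings above can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_gen = ImageDataGenerator(**TRAIN_AUGMENTATION)
    eval_gen = ImageDataGenerator(**EVAL_AUGMENTATION)

    common = dict(target_size=target_size, color_mode='rgb',
                  batch_size=batch_size, class_mode='binary')

    # Shuffle the training data; keep evaluation data in directory order.
    train = train_gen.flow_from_directory(train_path, shuffle=True, **common)
    val = eval_gen.flow_from_directory(val_path, shuffle=False, **common)
    test = eval_gen.flow_from_directory(test_path, shuffle=False, **common)
    return train, val, test
```

With the question's variables, a call might look like make_generators(train_path, val_path, test_path, image_shape[:2], batch_size).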

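Your test results look like they come from an untrained model. Check that the model object you are running the test on is the same one you trained. You may want to use the model.save() and load_model() functions to persist the model weights after training.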

Answered 2020-05-04T19:40:59.640