对于我的第一次机器学习体验,我有一个基本的分类要做。
我有 3 个不同的文件夹:
train_path = './dataset/pneumonia/train/'
test_path = './dataset/pneumonia/test/'
val_path = './dataset/pneumonia/val/
每个文件夹:
os.listdir(train_path)
返回
['NORMAL', 'PNEUMONIA']
在每组中:
- 训练集:
- 正常:949
- 肺炎:949
- 测试集:
- 正常:317
- 肺炎:317
- 验证集:
- 正常:317
- 肺炎:317
我使用张量流:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
image_gen = ImageDataGenerator(
rotation_range=10, # rotate the image 10 degrees
width_shift_range=0.10, # Shift the pic width by a max of 5%
height_shift_range=0.10, # Shift the pic height by a max of 5%
rescale=1/255, # Rescale the image by normalzing it.
shear_range=0.1, # Shear means cutting away part of the image (max 10%)
zoom_range=0.1, # Zoom in by 10% max
horizontal_flip=True, # Allow horizontal flipping
fill_mode='nearest' # Fill in missing pixels with the nearest filled value
)
image_gen.flow_from_directory(train_path)
image_gen.flow_from_directory(test_path)
我创建了一个模型(基本模型):
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(image_width, image_height, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), input_shape=(image_width, image_height, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), input_shape=(image_width, image_height, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(256, (3, 3), input_shape=(image_width, image_height, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(512, (3, 3), input_shape=(image_width, image_height, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
# Dropouts help reduce overfitting by randomly turning neurons off during training.
# Here we say randomly turn off 50% of neurons.
model.add(Dropout(0.5))
# Last layer, remember its binary so we use sigmoid
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
然后我训练它:
train_image_gen = image_gen.flow_from_directory(
train_path,
target_size=image_shape[:2],
color_mode='rgb',
batch_size=batch_size,
class_mode='binary'
)
results = model.fit_generator(train_image_gen,epochs=20,
validation_data=test_image_gen,
callbacks=[early_stop, board])
到目前为止,好的结果是正确的:
pred_probabilities = model.predict_generator(test_image_gen)
predictions = pred_probabilities > 0.5
confusion_matrix(test_image_gen.classes,predictions)
我得到了相当好的结果:
我的问题是当我想预测图像时,它返回的结果远非正确:
val_image_gen = image_gen.flow_from_directory(
val_path,
target_size=image_shape[:2],
color_mode='rgb',
class_mode='binary',
)
pred_probabilities = model.predict_generator(val_image_gen)
predictions = pred_probabilities > 0.5
这是我获得的一些输出:
precision recall f1-score support
0 0.51 0.57 0.53 317
1 0.51 0.44 0.47 317
accuracy 0.51 634
macro avg 0.51 0.51 0.50 634
weighted avg 0.51 0.51 0.50 634
该数据集的混淆矩阵如下:
[[180 137]
[176 141]]

