I'm new to TensorFlow and am trying to build a model that classifies images into two classes.

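Validation accuracy reaches 98% after 12 epochs, which seems abnormally high. At prediction time, the model always outputs [[1.]] regardless of the input image.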

Loading the data:

import numpy as np
import os
import cv2
from tqdm import tqdm
import random
import pickle

dataDir = "C:/optimised_dataset"

categories = ["demented", "healthy"]

IMG_WIDTH = 44
IMG_HEIGHT = 52
lim = 0

training_data = []

def create_training_data():
    for category in categories:
        path = os.path.join(dataDir, category)  # path to demented or healthy dir
        class_num = categories.index(category)
        lim = 0
        for img in tqdm(os.listdir(path)):
            if lim < 3000:
                try:
                    img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
                    new_array = cv2.resize(img_array, (IMG_WIDTH, IMG_HEIGHT))
                    training_data.append([new_array, class_num])
                    lim+=1
                except Exception:
                    # skip files that cannot be read or resized
                    pass
            else:
                break

create_training_data()

random.shuffle(training_data)

X = []
Y = []

for features, label in training_data:
    X.append(features)
    Y.append(label)

X = np.array(X).reshape(-1, IMG_WIDTH, IMG_HEIGHT, 1)
Y = np.array(Y)

with open("X.pickle", "wb") as pickle_out:
    pickle.dump(X, pickle_out)

with open("Y.pickle", "wb") as pickle_out:
    pickle.dump(Y, pickle_out)

The model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten, Conv2D, MaxPool2D
import pickle
import numpy as np

X = pickle.load(open("X.pickle", "rb"))
Y = pickle.load(open("Y.pickle", "rb"))

X = np.array(X)
X = X/255.0
Y = np.array(Y)

model = Sequential()

model.add(Conv2D(64, (3,3), input_shape=X.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPool2D(pool_size=(2,2)))

model.add(Conv2D(64, (3,3)))
model.add(Activation("relu"))
model.add(MaxPool2D(pool_size=(2,2)))

model.add(Flatten())

model.add(Dense(64))
model.add(Activation("relu"))

model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss="binary_crossentropy",
              optimizer="adam",
              metrics=['accuracy'])

model.fit(X, Y, batch_size=32, epochs=18, validation_split=0.1)

model.save('DD1.model')

Prediction:

import cv2
import tensorflow as tf

categories = ["demented", "healthy"]


def prepare(filepath):
    IMG_WIDTH = 44
    IMG_HEIGHT = 52
    img_array = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)
    img_array = img_array / 255.0
    new_array = cv2.resize(img_array, (IMG_WIDTH, IMG_HEIGHT))
    return new_array.reshape(-1, IMG_WIDTH, IMG_HEIGHT, 1)


model = tf.keras.models.load_model("DD1.model")

prediction = model.predict([prepare('D:/test.png')])

print(prediction)

When I remove the line img_array = img_array / 255.0, the output becomes a seemingly random decimal between 0 and 1.

1 Answer

As I already suggested, the cause of this is, in most cases, class imbalance.

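Suppose you have two classes, with 96 samples in class A and 4 samples in class B. In this extreme case, a model that always predicts class A reaches 96% accuracy without having learned anything about class B.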
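
The arithmetic can be checked with a short sketch. The labels and the always-predict-A "model" here are made up for illustration and are not the asker's data:

```python
# Hypothetical labels: 96 samples of class "A" and 4 of class "B".
y_true = ["A"] * 96 + ["B"] * 4

# A degenerate "model" that ignores its input and always predicts "A".
y_pred = ["A"] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall for class B: fraction of true B samples that were predicted as B.
b_total = sum(t == "B" for t in y_true)
b_found = sum(t == "B" and p == "B" for t, p in zip(y_true, y_pred))
recall_b = b_found / b_total

print(f"accuracy: {accuracy:.2f}")
print(f"recall for B: {recall_b:.2f}")
```

The accuracy looks excellent even though not a single class B sample is detected, which is why a high validation accuracy alone does not rule out a model that outputs one class for every input.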

To address this, you can try the following:

  1. Assign class weights.
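import numpy as np
from sklearn.utils import class_weight

# 'balanced' weights each class inversely to its frequency in y_train
weights = class_weight.compute_class_weight(class_weight='balanced',
                                            classes=np.unique(y_train),
                                            y=y_train)

# Keras expects a dict mapping class index to weight, not an array
# (this assumes the labels are the integers 0..n_classes-1)
class_weights = dict(enumerate(weights))

model.fit(X_train, y_train, class_weight=class_weights)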

  2. Try data augmentation to increase the number of samples in the minority class.

  3. Use the F1 score instead of accuracy to evaluate your model.
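
To illustrate the last point, here is a minimal sketch of the F1 score computed by hand. The helper name f1_for_class and both sets of predictions are hypothetical, chosen only to show how F1 separates two models whose accuracies are nearly identical:

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: report 0 rather than dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced labels: 96 of class 0, 4 of class 1.
y_true = [0] * 96 + [1] * 4

# Model 1 always predicts the majority class.
always_majority = [0] * 100

# Model 2 raises 2 false alarms but finds 3 of the 4 minority samples.
better = [0] * 94 + [1] * 2 + [1] * 3 + [0] * 1

for name, y_pred in [("always majority", always_majority), ("better", better)]:
    print(name, accuracy(y_true, y_pred), f1_for_class(y_true, y_pred, positive=1))
```

Accuracy barely distinguishes the two models (0.96 versus 0.97), while the minority-class F1 goes from 0 to about 0.67. In practice, f1_score from sklearn.metrics can be used instead of a hand-written helper.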

Answered 2020-12-09T11:12:28.227