Using Python 3.6, TF 1.15, imblearn 0.0

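I have an imbalanced dataset with 3 classes: two are roughly even in size and one is under-represented. I am trying to apply SMOTE to the dataset. However, I am loading the data with flow_from_directory, and I found that I can get X_train and y_train from the data generator using next(train_generator).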

The problem is that my generator seems to output only one class in y_train. If I use ravel, it gives me the following error:

Found 22089 images belonging to 3 classes.
Found 2136 images belonging to 3 classes.
Found 792 images belonging to 3 classes.
Traceback (most recent call last):
  File ".py", line 93, in <module>
    X_train_smote, y_train_smote = smote.fit_sample(X_train.reshape(X_train.shape[0], -1), y_train.ravel())
  File ".virtualenvs\TF15_Environment-fh5Z3l1i\lib\site-packages\imblearn\base.py", line 77, in fit_resample
    X, y, binarize_y = self._check_X_y(X, y)
  File ".virtualenvs\TF15_Environment-fh5Z3l1i\lib\site-packages\imblearn\base.py", line 135, in _check_X_y
    X, y, reset=True, accept_sparse=accept_sparse
  File ".virtualenvs\TF15_Environment-fh5Z3l1i\lib\site-packages\sklearn\base.py", line 432, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File ".virtualenvs\TF15_Environment-fh5Z3l1i\lib\site-packages\sklearn\utils\validation.py", line 72, in inner_f
    return f(**kwargs)
  File ".virtualenvs\TF15_Environment-fh5Z3l1i\lib\site-packages\sklearn\utils\validation.py", line 812, in check_X_y
    check_consistent_length(X, y)
  File ".virtualenvs\TF15_Environment-fh5Z3l1i\lib\site-packages\sklearn\utils\validation.py", line 256, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [2, 6]
(2, 224, 224, 3)
(2, 3)

Process finished with exit code 1

If I just pass y_train without .ravel(), I get this:

Found 22089 images belonging to 3 classes.
Found 2136 images belonging to 3 classes.
Found 792 images belonging to 3 classes.
(2, 224, 224, 3)
(2, 3)
Traceback (most recent call last):
  File ".py", line 93, in <module>
    X_train_smote, y_train_smote = smote.fit_sample(X_train.reshape(X_train.shape[0], -1), y_train)
  File ".virtualenvs\TF15_Environment-fh5Z3l1i\lib\site-packages\imblearn\base.py", line 80, in fit_resample
    self.sampling_strategy, y, self._sampling_type
  File ".virtualenvs\TF15_Environment-fh5Z3l1i\lib\site-packages\imblearn\utils\_validation.py", line 533, in check_sampling_strategy
    " Got {} class instead".format(np.unique(y).size)
ValueError: The target 'y' needs to have more than 1 class. Got 1 class instead

Here is my code. Any suggestions are appreciated! Thanks :)

import datetime
import numpy as np
import cv2
import tensorflow as tf
from tensorflow.keras import backend as k
from tensorflow.keras.applications.mobilenet import MobileNet
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.mobilenet import preprocess_input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dropout, Dense
from tensorflow.keras.callbacks import TensorBoard, EarlyStopping
from imblearn.over_sampling import SMOTE

smote = SMOTE()

k.clear_session()
tf.set_random_seed(42)
np.random.seed(42)

currentDay = datetime.date.today()
now = datetime.datetime.now()
t = now.strftime("%H-%M-%S")

NAME = f'{currentDay}_{t}new_model_001.h5'

tboard = TensorBoard(log_dir=f'logs\\{NAME}',
                     update_freq="epoch",
                     histogram_freq=1,
                     write_grads=True,
                     write_graph=True,
                     )

# config
img_width = 224
img_height = 224
INPUT_DEPTH = 3
input_shape = (img_height, img_width, INPUT_DEPTH)
TRAIN_DATA_DIR = 'dataset/train/'
VALIDATION_DATA_DIR = 'dataset/validation/'
TESTING_DATA_DIR = 'dataset/test/'
MODEL_DIR = 'h5_Models/'
EPOCHS = 500
PATIENCE = 25
BATCH_SIZE = 2

MODEL_NAME = 'new_model_001.h5'

train_datagen = ImageDataGenerator(
    rescale=1/255,
    # zca_whitening=True,
    # zca_epsilon=0.1,
    # rotation_range=5,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=(0.95, 0.95),
    # data_format='channels_last',
    horizontal_flip=True,
    # vertical_flip=True,
    fill_mode='nearest'
    )

validation_datagen = ImageDataGenerator(rescale=1/255)
test_datagen = ImageDataGenerator(rescale=1/255)

train_generator = train_datagen.flow_from_directory(
    TRAIN_DATA_DIR,
    # color_mode='grayscale',
    target_size=(img_height, img_width),
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    shuffle=True)

validation_generator = validation_datagen.flow_from_directory(
    VALIDATION_DATA_DIR,
    # color_mode='grayscale',
    target_size=(img_height, img_width),
    batch_size=BATCH_SIZE,
    class_mode='categorical')

testing_generator = test_datagen.flow_from_directory(
    TESTING_DATA_DIR,
    # color_mode='grayscale',
    target_size=(img_height, img_width),
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    )

X_train, y_train = next(train_generator)

print(X_train.shape)
print(y_train.shape)
X_train_smote, y_train_smote = smote.fit_sample(X_train.reshape(X_train.shape[0], -1), y_train.ravel())
print(X_train_smote.count)
X_train_smote = X_train_smote.reshape(X_train_smote.shape[0], 224, 224, 3)

1 Answer

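When you use next(train_generator), you are only looking at a single batch of the training dataset, and some batches may contain images from only one class. To apply SMOTE correctly, you should use the whole dataset, or a sample that represents all classes and matches their distribution.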
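
As a rough illustration of that idea, the sketch below stacks several batches before resampling and converts the one-hot labels to integer class ids with np.argmax. Calling ravel() on a (2, 3) one-hot batch produces 6 values for 2 images, which matches the "[2, 6]" mismatch in the first traceback. The helpers collect_dataset, smote_resample and fake_batches are hypothetical names introduced here, not part of Keras or imblearn, and fake_batches is only a stand-in for train_generator so the example runs without image files.

```python
import numpy as np


def collect_dataset(batches, n_batches):
    """Stack n_batches of (images, one_hot_labels) into two arrays."""
    images, labels = [], []
    for _ in range(n_batches):
        X_batch, y_batch = next(batches)
        images.append(X_batch)
        # One-hot rows -> one integer class id per image. ravel() would
        # instead flatten a (2, 3) label batch into 6 values, giving more
        # labels than images.
        labels.append(np.argmax(y_batch, axis=1))
    X = np.concatenate(images)
    y = np.concatenate(labels)
    # SMOTE works on 2-D input: one flattened row per image.
    return X.reshape(X.shape[0], -1), y


def smote_resample(X_flat, y, image_shape):
    """Oversample with SMOTE, then restore the image shape (needs imblearn)."""
    from imblearn.over_sampling import SMOTE  # imported only when called
    X_res, y_res = SMOTE().fit_resample(X_flat, y)
    return X_res.reshape((-1,) + tuple(image_shape)), y_res


def fake_batches(batch_size=2, n_classes=3, shape=(4, 4, 3), seed=0):
    """Hypothetical stand-in for train_generator, for demonstration only."""
    rng = np.random.RandomState(seed)
    while True:
        X = rng.rand(batch_size, *shape).astype("float32")
        class_ids = rng.randint(n_classes, size=batch_size)
        y = np.eye(n_classes, dtype="float32")[class_ids]
        yield X, y


X_flat, y = collect_dataset(fake_batches(), n_batches=10)
print(X_flat.shape, y.shape)
```

With the generator from the question, something like collect_dataset(train_generator, len(train_generator)) followed by smote_resample(X_flat, y, (224, 224, 3)) should cover roughly one pass over the training set, assuming the generator reports its batch count through len(). Two caveats: 22089 images at 224x224x3 in float32 come to about 13 GB before SMOTE adds anything, so a representative subset may be more practical, and since train_datagen applies augmentation, a rescale-only generator may be a better source for the resampling step.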

Answered 2021-03-19T10:38:34.013