python - Image data augmentation using the image data generator in Keras

Question

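I'm currently working on a computer vision project, and I want to use an image data generator to load my images according to the classes in their respective directories.

I want to apply feature_std_normalization.

I declared feature_std_normalization=True when creating the data generator object, but during training it gives this error:

local/lib/python3.6/dist-packages/keras_preprocessing/image/image_data_generator.py:716: UserWarning: This ImageDataGenerator specifies featurewise_center, but it hasn't been fit on any training data. Fit it first by calling .fit(numpy_data).
  warnings.warn('This ImageDataGenerator specifies '

How do I use datagen.fit() when the images come from generator.flow_from_directory()? As far as I can tell, datagen.fit() takes X_train, which I don't have.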

score 0 · Accepted Answer

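If you're using TensorFlow 2, I'd suggest trying one of two approaches:

Using .flow_from_directory(): As the documentation says, you can pass the path of the directory that holds your images, and the iterator it returns can then be passed to model.fit(). Here is the example given in the TensorFlow documentation (with a few extra comments added for clarity):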

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Set the augmentations the data generators will do
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
# Instantiate a DirectoryIterator - this yields the batches of data samples + their labels
train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
# Train a Sequential model (`model` is assumed to be built and compiled already)
model.fit(
        train_generator,
        steps_per_epoch=2000,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)

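Using tf.data.Dataset.from_generator: This approach may be more convenient if you want to take advantage of the tf.data API and your dataset has not yet been split into training and test sets. Here is an example of how it works (from a different page of the documentation):

import tensorflow as tf

# This example uses an image dataset that has NOT been split into train/test yet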
flowers = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)
# Like before, set the data augmentations
img_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255, rotation_range=20)
# (Optional) Double-check the dimensions of a single batch
images, labels = next(img_gen.flow_from_directory(flowers))
print(images.dtype, images.shape)  # float32 (32, 256, 256, 3)
print(labels.dtype, labels.shape)  # float32 (32, 5)
# Now, you can make a dataset with the augmentations
ds = tf.data.Dataset.from_generator(
    lambda: img_gen.flow_from_directory(flowers), 
    output_types=(tf.float32, tf.float32), 
    output_shapes=([32,256,256,3], [32,5])
)
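One thing to watch for with the fixed output_shapes above: when the number of images is not a multiple of the batch size, the final batch from the iterator is smaller than 32, which may not match a hard-coded batch dimension. Below is a minimal sketch of one way around that, using the output_signature argument of from_generator with a batch dimension of None. The fake_batches generator, IMG_SIZE and NUM_CLASSES are stand-ins invented for this illustration so the sketch runs without downloading any images; in practice the lambda from the example above would take their place.

```python
import numpy as np
import tensorflow as tf

IMG_SIZE = 64      # assumption: kept small so the sketch runs quickly
NUM_CLASSES = 5    # matches the five flower classes in the example above


def fake_batches():
    """Stand-in for img_gen.flow_from_directory(...): two full batches, then a partial one."""
    for n in (32, 32, 22):
        images = np.zeros((n, IMG_SIZE, IMG_SIZE, 3), dtype=np.float32)
        labels = np.zeros((n, NUM_CLASSES), dtype=np.float32)
        yield images, labels


# A batch dimension of None lets batches of any size pass through.
ds = tf.data.Dataset.from_generator(
    fake_batches,
    output_signature=(
        tf.TensorSpec(shape=(None, IMG_SIZE, IMG_SIZE, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(None, NUM_CLASSES), dtype=tf.float32),
    ),
)

# Collect the size of each batch as it comes out of the dataset.
batch_sizes = [int(images.shape[0]) for images, _ in ds]
print(batch_sizes)
```

With a real directory iterator, the spatial size and class count in the signature would need to match target_size and the number of class subdirectories.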

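Of course, you may still be wondering, "How do we split this ds variable into training and test sets?"

Fortunately, Angel Igareta has written a great blog post on this topic. Below I include only the code snippet that solves our problem: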

def get_dataset_partitions_tf(ds, ds_size, train_split=0.8, val_split=0.1, test_split=0.1, shuffle=True, shuffle_size=10000):
    """Credit to Angel Igareta at https://towardsdatascience.com/how-to-split-a-tensorflow-dataset-into-train-validation-and-test-sets-526c8dd29438 for this code."""
    # Compare with a tolerance, since a sum of floats may not equal 1 exactly
    assert abs(train_split + val_split + test_split - 1.0) < 1e-6
    
    if shuffle:
        # Specify seed to always have the same split distribution between runs
        ds = ds.shuffle(shuffle_size, seed=12)
    
    train_size = int(train_split * ds_size)
    val_size = int(val_split * ds_size)
    
    train_ds = ds.take(train_size)    
    val_ds = ds.skip(train_size).take(val_size)
    test_ds = ds.skip(train_size).skip(val_size)
    
    return train_ds, val_ds, test_ds

This way, you can pass your dataset to model.fit(), and TensorFlow will essentially perform the data augmentation for you during training.
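To see how the take/skip split behaves, here is a small sketch on a toy dataset. It is an illustration only, not Angel Igareta's code: the 100-element range dataset and the 80/10/10 sizes are made up, and reshuffle_each_iteration=False is an addition of mine. By default, shuffle() can produce a different order on each pass over the data, so elements may drift between the splits from one epoch to the next; fixing the order keeps the three splits disjoint. Note also that the ds built earlier yields whole batches, so ds_size there would count batches rather than images, and since the Keras directory iterator appears to loop indefinitely, that dataset may need to be capped with take() before splitting.

```python
import tensorflow as tf

# Toy stand-in for a real dataset: the integers 0..99.
ds = tf.data.Dataset.range(100)

# Shuffle once with a fixed order so every pass sees the same permutation.
ds = ds.shuffle(100, seed=12, reshuffle_each_iteration=False)

# Same take/skip pattern as in the helper above, with an 80/10/10 split.
train_ds = ds.take(80)
val_ds = ds.skip(80).take(10)
test_ds = ds.skip(80).skip(10)

train = [int(x) for x in train_ds]
val = [int(x) for x in val_ds]
test = [int(x) for x in test_ds]

print(len(train), len(val), len(test))
print("overlap between train and val:", set(train) & set(val))
```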

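Last but not least: in your case, I believe you'll want to pass featurewise_std_normalization=True to the ImageDataGenerator constructor. Let me know if I missed something in your question, but I don't think there is actually a parameter named feature_std_normalization.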
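As for the warning itself: the featurewise options need dataset statistics, which the generator computes when .fit() is called on a NumPy array of images. When the images live in a directory, one possible approach is to read a sample of them into memory with a plain generator and fit on that sample. The sketch below assumes this is acceptable for your dataset size; fit_on_directory_sample is a hypothetical helper written for this answer, not part of Keras, and its default target_size, batch_size and max_batches are arbitrary.

```python
import numpy as np
import tensorflow as tf

ImageDataGenerator = tf.keras.preprocessing.image.ImageDataGenerator


def fit_on_directory_sample(datagen, directory, target_size=(150, 150),
                            batch_size=32, max_batches=10):
    """Fit the featurewise statistics of `datagen` on images sampled from `directory`.

    Hypothetical helper (not a Keras API). Returns the number of images used.
    """
    # A plain generator with no augmentation, used only to read pixels from disk.
    raw_iter = ImageDataGenerator().flow_from_directory(
        directory,
        target_size=target_size,
        batch_size=batch_size,
        class_mode=None,  # yield image batches only, without labels
        shuffle=True,
        seed=0)

    # The iterator loops forever, so stop after at most one pass over the data.
    n_batches = min(max_batches, len(raw_iter))
    sample = np.concatenate([next(raw_iter) for _ in range(n_batches)], axis=0)

    # Computes the per-channel mean and std and stores them on the generator.
    datagen.fit(sample)
    return sample.shape[0]
```

For instance, with the directory layout from the first example, calling fit_on_directory_sample(train_datagen, 'data/train') before train_datagen.flow_from_directory(...) should give the generator the statistics it needs. The target_size passed to the helper would normally match the one used for training, and the same fitted statistics would usually be reused for the validation generator.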
