
I am working on a small project based on the cifar10 dataset. I have loaded the data with tfds.load(...) and am practicing image augmentation techniques.

Since my dataset is a tf.data.Dataset object, real-time data augmentation seems hard to achieve, so I want to pass all the features into tf.keras.preprocessing.image.ImageDataGenerator.flow(...) to get real-time augmentation.

But this flow(...) method accepts NumPy arrays, which have nothing to do with a tf.data.Dataset object.

Can anyone guide me here (or suggest an alternative)? How should I proceed?

Are tf.image transformations applied in real time? If not, what is the best approach apart from ImageDataGenerator.flow(...)?

My code:

import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.preprocessing.image import ImageDataGenerator

splitting = tfds.Split.ALL.subsplit(weighted=(70, 20, 10))
dataset_cifar10, dataset_info = tfds.load(name='cifar10', 
                                          split=splitting, 
                                          as_supervised=True, 
                                          with_info=True)

train_dataset, valid_dataset, test_dataset = dataset_cifar10

BATCH_SIZE = 32

train_dataset = train_dataset.batch(batch_size=BATCH_SIZE)
train_dataset = train_dataset.prefetch(buffer_size=1)

image_generator = ImageDataGenerator(rotation_range=45, 
                                     width_shift_range=0.15, 
                                     height_shift_range=0.15, 
                                     zoom_range=0.2, 
                                     horizontal_flip=True, 
                                     vertical_flip=True, 
                                     rescale=1./255)

train_dataset_generator = image_generator.flow(...)

...

2 Answers


After splitting into train and test datasets, you can iterate over the dataset and append each sample to a list, which you can then use with ImageDataGenerator. A complete use case follows:

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

cifar10_data, cifar10_info = tfds.load("cifar10", with_info=True, as_supervised=True)
train_data, test_data = cifar10_data['train'], cifar10_data['test']
NUM_CLASSES = 10

train_x = []
train_y = []
for sample in train_data:
    train_x.append(sample[0].numpy())
    train_y.append(tf.keras.utils.to_categorical(sample[1].numpy(), num_classes=NUM_CLASSES))

train_x = np.asarray(train_x)
train_y = np.asarray(train_y)

# DataGenerator
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    horizontal_flip=True)

# Fitting train_x data
datagen.fit(train_x)

# Testing
EPOCHS = 1
BATCH_SIZE = 16
for e in range(EPOCHS):
    for batch_x, batch_y in datagen.flow(train_x, train_y, batch_size=BATCH_SIZE):
        print(batch_x, batch_y)
        # flow() yields batches indefinitely, so this loop must be broken manually
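
If you prefer not to loop manually, the iterator returned by datagen.flow(...) can also be passed directly to model.fit. A minimal sketch, assuming a compiled Keras model named model (not part of the original answer; the labels are one-hot, so it uses categorical_crossentropy):

# Sketch (assumption): feeding the augmented generator straight to model.fit
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# steps_per_epoch is set because flow(...) yields batches indefinitely
model.fit(datagen.flow(train_x, train_y, batch_size=BATCH_SIZE),
          steps_per_epoch=len(train_x) // BATCH_SIZE,
          epochs=EPOCHS)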
answered 2019-11-29T07:50:17.337
import tensorflow as tf
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
from tensorflow.keras.preprocessing.image import ImageDataGenerator

splits = ['train[:70%]', 'train[70%:90%]', 'train[90%:]']
BATCH_SIZE = 64
dataset_cifar10, dataset_info = tfds.load(name='cifar10', 
                                          split=splits, 
                                          as_supervised=True, 
                                          with_info=True,
                                          batch_size=BATCH_SIZE)

train_dataset, valid_dataset, test_dataset = dataset_cifar10

image_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=45, 
    width_shift_range=0.15, 
    height_shift_range=0.15, 
    zoom_range=0.2, 
    horizontal_flip=True, 
    vertical_flip=True, 
    rescale=1./255)

# custom function to wrap the ImageDataGenerator around the tf.data.Dataset
def tfds_imgen(ds, imgen, batch_size, num_batches):
    # ds already yields batches of BATCH_SIZE images (set via tfds.load above),
    # so each batch is converted to NumPy and re-batched through the generator
    for images, labels in ds.prefetch(buffer_size=1).as_numpy_iterator():
        flow = imgen.flow(images, labels, batch_size=batch_size)
        for _ in range(num_batches):
            yield next(flow)

# call the custom function to get the augmented data generator
train_dataset_generator = tfds_imgen(
    train_dataset, 
    image_generator,
    batch_size=32,
    num_batches=BATCH_SIZE // 32
)
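
One possible way to consume this generator (an assumption, not part of the original answer) is to hand it to model.fit with an explicit steps_per_epoch, since a plain Python generator has no length. Here model is a placeholder for any compiled Keras model; because as_supervised=True returns integer labels, it would be compiled with a sparse_categorical_crossentropy loss:

# Sketch (assumption): training from the wrapped generator
# ~70% of the cifar10 train split, re-batched into mini-batches of 32
steps_per_epoch = dataset_info.splits['train'].num_examples * 70 // 100 // 32
# the generator is exhausted after one pass over the data, hence epochs=1
model.fit(train_dataset_generator, steps_per_epoch=steps_per_epoch, epochs=1)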
answered 2020-08-05T05:07:28.433