I am trying to augment my image data using the Keras ImageDataGenerator. My task is a regression task, where an input image is mapped to another, transformed image. So far so good; this works quite well.
Here I wanted to apply data augmentation using the ImageDataGenerator. In order to transform both images the same way, I used the approach described in the Keras docs, where the joint transformation of an image and its corresponding mask is shown. My case is slightly different, as my images are already loaded and don't need to be fetched from a directory. This procedure was already described in another StackOverflow post.
To verify my implementation, I first ran it without augmentation, using an ImageDataGenerator without any parameters specified. According to the class reference in the Keras docs, this should not alter the images. See this snippet:
# hold out the first split_separator samples for validation
img_val = img[0:split_separator]
img_train = img[split_separator:]
target_val = target[0:split_separator]
target_train = target[split_separator:]

# define data preparation (empty dict: no augmentation parameters)
data_gen_args = dict()
src_datagen = ImageDataGenerator(**data_gen_args)
target_datagen = ImageDataGenerator(**data_gen_args)

# fit computes featurewise statistics; with no featurewise_center,
# featurewise_std_normalization or zca_whitening set, it has no effect
seed = 1
src_datagen.fit(img_train, augment=False, seed=seed)
target_datagen.fit(target_train, augment=False, seed=seed)

# identical seeds keep the two flows synchronized
training_generator = zip(
    src_datagen.flow(img_train, batch_size=batch_size_training, seed=seed),
    target_datagen.flow(target_train, batch_size=batch_size_training, seed=seed))

_ = model.fit_generator(
    generator=training_generator,
    steps_per_epoch=img_train.shape[0] // batch_size_training,
    epochs=num_epochs, verbose=1,
    validation_data=(img_val, target_val), callbacks=callbacks)
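One detail about the zip above: in Python 3, zip is lazy and works fine over the two infinite flow iterators, but in Python 2 it is eager and would try to exhaust them. A small wrapper generator (a sketch; the paired_generator name is my own, not a Keras API) avoids that and makes the pairing explicit:

```python
import itertools

def paired_generator(src_flow, target_flow):
    # Yield (input_batch, target_batch) tuples forever, as expected
    # by fit_generator. Both flows are assumed to share the same seed,
    # so matching augmentations are applied to inputs and targets.
    while True:
        yield next(src_flow), next(target_flow)

# Stand-in flows for illustration: in the real code these would be
# src_datagen.flow(...) and target_datagen.flow(...).
src = itertools.count(0)     # 0, 1, 2, ...
tgt = itertools.count(100)   # 100, 101, 102, ...

gen = paired_generator(src, tgt)
print(next(gen))  # (0, 100)
print(next(gen))  # (1, 101)
```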
Unfortunately, my implementation seems to have some issues: I do not get the expected performance. The validation loss stabilizes around a certain value and decreases only slightly (see the image below). Since I did not use any augmentation, I would expect the same loss as for the non-augmented baseline.
In comparison, my training without the ImageDataGenerator looks like this:
_ = model.fit(img, target,
batch_size=batch_size_training,
epochs=num_epochs, verbose=1,
validation_split=0.2, callbacks=cb)
I guess I somehow got mixed up with the usage of the ImageDataGenerator and its flow and fit functions. So my questions are:
- Is one of the applied functions, fit or flow, redundant, and does it cause this behavior?
- Do I have an implementation problem?
- Does this implementation make sense in general?
- Does it make sense to keep the validation set fixed, or should it be augmented too?
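Regarding the last question: if the validation data should also go through a generator, it could be built with the same paired-flow pattern as the training data. A minimal numpy-only sketch (flow_stub is my own placeholder mimicking an un-parameterized datagen.flow, not the Keras implementation):

```python
import numpy as np

def flow_stub(data, batch_size):
    # Minimal stand-in for datagen.flow(...): yields batches forever,
    # without augmentation or shuffling.
    while True:
        for i in range(0, len(data), batch_size):
            yield data[i:i + batch_size]

img_val = np.zeros((8, 4, 4, 1))
target_val = np.ones((8, 4, 4, 1))

# Pair the two flows, exactly as for the training generator.
validation_generator = zip(flow_stub(img_val, 2), flow_stub(target_val, 2))

# fit_generator would then take the generator plus an explicit step count:
# model.fit_generator(..., validation_data=validation_generator,
#                     validation_steps=len(img_val) // 2)
x, y = next(validation_generator)
print(x.shape, y.shape)  # (2, 4, 4, 1) (2, 4, 4, 1)
```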
Update (2019-01-23 & cont.): What I have already tried so far (in response to the answers):
- creating a generator for the validation data as well
- removing the applied fit function
- setting shuffle=True in the flow function (the data is already shuffled)
None of these approaches improved the results.