python - TensorFlow: training on my own images

Question

I am new to TensorFlow. I am looking for help with image recognition, where I can train on my own image dataset.

Is there any example of training on a new dataset?

score 94 · Accepted Answer

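If you are interested in how to input your own data in TensorFlow, you can look at this tutorial.
I have also written a guide with best practices for CS230 at Stanford here.

New answer (with `tf.data`) and with labels

With the introduction of tf.data in r1.4, we can create a batch of images without placeholders and without queues. The steps are the following:

1. Create a list containing the filenames of the images and a corresponding list of labels
2. Create a tf.data.Dataset reading these filenames and labels
3. Preprocess the data
4. Create an iterator from the tf.data.Dataset which will yield the next batch

The code is: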

# step 1
filenames = tf.constant(['im_01.jpg', 'im_02.jpg', 'im_03.jpg', 'im_04.jpg'])
labels = tf.constant([0, 1, 0, 1])

# step 2: create a dataset returning slices of `filenames`
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))

# step 3: parse every image in the dataset using `map`
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image = tf.cast(image_decoded, tf.float32)
    return image, label

dataset = dataset.map(_parse_function)
dataset = dataset.batch(2)

# step 4: create iterator and final input tensor
iterator = dataset.make_one_shot_iterator()
images, labels = iterator.get_next()

Now we can run sess.run([images, labels]) directly, without feeding any data through placeholders.
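The four steps above can also be sketched for TensorFlow 2.x, where eager execution replaces the session and the iterator. This is a minimal sketch under stated assumptions, not the answer's own code: the JPEG files are generated on the fly so the example runs anywhere, and the 32x32 target size (`IMAGE_SIZE`) is a hypothetical choice. The resize step is added because `batch` can only stack images that share one shape.

```python
import os
import tempfile

import tensorflow as tf

# Assumption: write four small random JPEGs of different sizes into a
# temporary directory so that the sketch does not depend on real files.
tmp_dir = tempfile.mkdtemp()
filenames = []
for i, (height, width) in enumerate([(20, 30), (25, 25), (40, 10), (16, 16)]):
    pixels = tf.cast(
        tf.random.uniform([height, width, 3], maxval=256, dtype=tf.int32),
        tf.uint8)
    path = os.path.join(tmp_dir, "im_%02d.jpg" % (i + 1))
    tf.io.write_file(path, tf.io.encode_jpeg(pixels))
    filenames.append(path)
labels = [0, 1, 0, 1]

IMAGE_SIZE = (32, 32)  # hypothetical target size


def parse_function(filename, label):
    # Read and decode one file, then bring it to a fixed size for batching.
    image_string = tf.io.read_file(filename)
    image = tf.io.decode_jpeg(image_string, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)  # scales to [0, 1]
    image = tf.image.resize(image, IMAGE_SIZE)
    return image, label


dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.map(parse_function).batch(2)

# In TF 2.x a dataset is a Python iterable, so no iterator or session is needed.
batches = list(dataset)
for images, batch_labels in batches:
    print(images.shape, batch_labels.numpy())
```

With your own data, only the `filenames` and `labels` lists need to change; the rest of the pipeline stays the same.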

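Old answer (with TensorFlow queues)

To sum it up, you have multiple steps:

1. Create a list of filenames (e.g. the paths to your images)
2. Create a TensorFlow filename queue
3. Read and decode each image, and resize them to a fixed size (necessary for batching)
4. Output a batch of these images

The simplest code would be: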

# step 1
filenames = ['im_01.jpg', 'im_02.jpg', 'im_03.jpg', 'im_04.jpg']

# step 2
filename_queue = tf.train.string_input_producer(filenames)

# step 3: read, decode and resize images
reader = tf.WholeFileReader()
filename, content = reader.read(filename_queue)
image = tf.image.decode_jpeg(content, channels=3)
image = tf.cast(image, tf.float32)
resized_image = tf.image.resize_images(image, [224, 224])

# step 4: Batching
image_batch = tf.train.batch([resized_image], batch_size=8)

score 7 · Accepted Answer

Based on @olivier-moindrot's answer, but for TensorFlow 2.0+:

# step 1
filenames = tf.constant(['im_01.jpg', 'im_02.jpg', 'im_03.jpg', 'im_04.jpg'])
labels = tf.constant([0, 1, 0, 1])

# step 2: create a dataset returning slices of `filenames`
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))

def im_file_to_tensor(file, label):
    def _im_file_to_tensor(file, label):
        path = f"../foo/bar/{file.numpy().decode()}"
        im = tf.image.decode_jpeg(tf.io.read_file(path), channels=3)
        im = tf.cast(im, tf.float32) / 255.0
        return im, label
    return tf.py_function(_im_file_to_tensor, 
                          inp=(file, label), 
                          Tout=(tf.float32, tf.uint8))

dataset = dataset.map(im_file_to_tensor)

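If you are hitting an issue similar to:

ValueError: Cannot take the length of Shape with unknown rank

when passing tf.data.Dataset tensors to model.fit, take a look at https://github.com/tensorflow/tensorflow/issues/24520. A fix for the code snippet above would be: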

def im_file_to_tensor(file, label):
    def _im_file_to_tensor(file, label):
        path = f"../foo/bar/{file.numpy().decode()}"
        im = tf.image.decode_jpeg(tf.io.read_file(path), channels=3)
        im = tf.cast(im, tf.float32) / 255.0
        return im, label

    file, label = tf.py_function(_im_file_to_tensor, 
                                 inp=(file, label), 
                                 Tout=(tf.float32, tf.uint8))
    file.set_shape([192, 192, 3])
    label.set_shape([])
    return (file, label)
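The shape problem behind that error can be reproduced without any image files. The sketch below is an illustration only: `_load` is a hypothetical stand-in for reading and decoding an image, and the 4x4x3 shape is arbitrary. It shows that a tensor returned by `tf.py_function` has unknown rank inside `map`, and that `set_shape` restores the static shape.

```python
import tensorflow as tf


def load_unknown_shape(index):
    def _load(index):
        # Hypothetical stand-in for reading and decoding an image file.
        return tf.ones([4, 4, 3], tf.float32) * tf.cast(index, tf.float32)

    # tf.py_function cannot infer the static shape of what _load returns.
    return tf.py_function(_load, inp=[index], Tout=tf.float32)


def load_known_shape(index):
    image = load_unknown_shape(index)
    image.set_shape([4, 4, 3])  # declare the shape we know _load produces
    return image


unknown = tf.data.Dataset.range(3).map(load_unknown_shape)
known = tf.data.Dataset.range(3).map(load_known_shape)

print(unknown.element_spec.shape.rank)  # rank is not known statically
print(known.element_spec.shape)
```

Note that `set_shape` only asserts a shape; if the real images vary in size, they still have to be resized to that shape before the call.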

score 0 · Accepted Answer

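An answer compatible with TensorFlow 2.0, using TensorFlow Hub: TensorFlow Hub is a TensorFlow offering that comprises models developed by Google for text and image datasets.

It can save a great deal of training time and computational effort, because it reuses an existing pre-trained model.

If we have an image dataset, we can take an existing pre-trained model from TF Hub and adapt it to our dataset.

The code for retraining on our image dataset with the pre-trained MobileNet model is shown below: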

import itertools
import os

import matplotlib.pylab as plt
import numpy as np

import tensorflow as tf
import tensorflow_hub as hub

module_selection = ("mobilenet_v2_100_224", 224)  # alternative: ("inception_v3", 299)
handle_base, pixels = module_selection
MODULE_HANDLE ="https://tfhub.dev/google/imagenet/{}/feature_vector/4".format(handle_base)
IMAGE_SIZE = (pixels, pixels)
print("Using {} with input size {}".format(MODULE_HANDLE, IMAGE_SIZE))

BATCH_SIZE = 32

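# Point data_dir at your own dataset; the flower photos are only a stand-in.
data_dir = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)

# The two names below are used by the model but were not defined in this
# snippet; these definitions are assumptions.
do_fine_tuning = False  # False keeps the pre-trained weights frozen
train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1. / 255).flow_from_directory(
        data_dir, target_size=IMAGE_SIZE, batch_size=BATCH_SIZE)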

model = tf.keras.Sequential([
    hub.KerasLayer(MODULE_HANDLE, trainable=do_fine_tuning),
    tf.keras.layers.Dropout(rate=0.2),
    tf.keras.layers.Dense(train_generator.num_classes, activation='softmax',
                          kernel_regularizer=tf.keras.regularizers.l2(0.0001))
])
model.build((None,)+IMAGE_SIZE+(3,))
model.summary()

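The complete code for the image retraining tutorial can be found in this GitHub link.

More information about TensorFlow Hub can be found in this TF blog.

The pre-trained modules related to images can be found in this TF Hub link.

All the pre-trained modules, related to images, text, videos, etc., can be found in this TF Hub modules link.

Finally, this is the basic page for TensorFlow Hub.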

score 0 · Accepted Answer

If your dataset consists of subfolders, you can use ImageDataGenerator; its flow_from_directory method helps load data from a directory:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_batches = ImageDataGenerator().flow_from_directory(
    directory=train_path, target_size=(img_height, img_width), batch_size=32, color_mode="grayscale")

The folder hierarchy can be as follows:

train 
    -- cat
    -- dog
    -- monkey
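The same one-folder-per-class layout can also feed the filename and label lists that a tf.data pipeline starts from. The helper below is a minimal sketch using only the standard library; `list_images_and_labels` and the accepted extensions are hypothetical choices, not part of any TensorFlow API, and the demo tree is made of empty placeholder files.

```python
import tempfile
from pathlib import Path


def list_images_and_labels(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return (filenames, labels, class_names) for a one-folder-per-class tree."""
    root = Path(root)
    # Sorting the subfolder names makes the label indices deterministic.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    filenames, labels = [], []
    for index, name in enumerate(class_names):
        for path in sorted((root / name).iterdir()):
            if path.suffix.lower() in extensions:
                filenames.append(str(path))
                labels.append(index)
    return filenames, labels, class_names


# Demo on a throwaway tree with empty placeholder files.
root = Path(tempfile.mkdtemp()) / "train"
for name, count in [("cat", 2), ("dog", 1), ("monkey", 1)]:
    (root / name).mkdir(parents=True)
    for i in range(count):
        (root / name / f"im_{i:02d}.jpg").touch()

filenames, labels, class_names = list_images_and_labels(root)
print(class_names)  # ['cat', 'dog', 'monkey']
print(labels)       # [0, 0, 1, 2]
```

The returned `filenames` and `labels` can be passed to `tf.data.Dataset.from_tensor_slices` when more control over preprocessing is needed than `flow_from_directory` offers.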
