我目前正在尝试使用 GTSDB 数据集训练 Faster RCNN Inception V2 模型(使用 COCO 预训练)。我有 FullIJCNN 数据集,我将数据集分为训练、验证和测试三部分。最后,我分别创建了 3 个不同的 csv 文件,然后为train和validation创建了 TFRecord 文件。另一方面,我有一个代码块,可以读取每个图像的真实框坐标,并在图像上的交通标志周围绘制框。它也正确地写了类标签。这里有几个例子。同样,这些框不是由网络预测的。他们通过一个函数手动绘制。
然后我使用数据集文件夹中包含的 README 文件创建了一个标签文件,并将0 背景行添加到 labels.txt 的第一行以使其与我的代码一起使用(我认为这是一件愚蠢的事情),因为它是抛出索引错误。但是,我的 .pbtxt 文件中没有“背景”的键以使其从 1 开始。最后我配置了faster_rcnn_inception_v2_coco.config文件,更改num_classes: 90
为num_classes: 43
因为数据集有 43 个类,num_examples: 5000
因为num_examples: 186
我已经将数据集划分为186 个测试示例。num_steps: 200000
照原样使用。最后,我通过运行开始了培训工作
python object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${MODEL_DIR} \
--num_train_steps=50000 \
--num_eval_steps=2000 \
--alsologtostderr
命令,这是回溯(对不起代码块,我不知道如何专门添加日志):
import matplotlib; matplotlib.use('Agg') # pylint: disable=multiple-statements
WARNING:tensorflow:Estimator's model_fn (<function model_fn at 0x7fc4cd6a4938>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/models/research/object_detection/core/box_predictor.py:407: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py:2037: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
WARNING:tensorflow:From /home/models/research/object_detection/core/losses.py:317: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See @{tf.nn.softmax_cross_entropy_with_logits_v2}.
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-07-26 09:48:21.785041: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-07-26 09:48:21.923329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 9b2f:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-07-26 09:48:21.923382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-26 09:48:22.153991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-26 09:48:22.154053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-07-26 09:48:22.154075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-07-26 09:48:22.154333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 9b2f:00:00.0, compute capability: 3.7)
2018-07-26 09:58:31.794649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-26 09:58:31.794723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-26 09:58:31.794747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-07-26 09:58:31.794765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-07-26 09:58:31.794884: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 9b2f:00:00.0, compute capability: 3.7)
WARNING:tensorflow:Ignoring ground truth with image id 2066941970 since it was previously added
WARNING:tensorflow:Ignoring detection with image id 2066941970 since it was previously added
WARNING:tensorflow:Ignoring ground truth with image id 2013299735 since it was previously added
WARNING:tensorflow:Ignoring detection with image id 2013299735 since it was previously added
WARNING:tensorflow:Ignoring ground truth with image id 1416415107 since it was previously added
它产生了很多这样的警告:
WARNING:tensorflow:Ignoring ground truth with image id 2013299735 since it was previously added
WARNING:tensorflow:Ignoring detection with image id 2013299735 since it was previously added
这些消息的原因num_examples
已设置为2000
尽管我的原始配置文件有num_examples: 186
. 我不明白为什么它要创建一个具有不同参数的新配置文件。然而,在充满这些消息的整个日志之后,它会给出一个报告,但我不能确定这到底是想告诉我什么。这是报告:
creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.07s).
Accumulating evaluation results...
DONE (t=0.02s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
最后,我检查了 Tensorboard 以确保它训练正确,但我看到的结果令人沮丧。这是我的模型(损失)的 Tensorboard 图的屏幕截图:
我觉得我做错了什么。我不知道这是否是一个具体的问题,但我试图尽可能详细地提供。
我的问题是:我应该在这些步骤中进行哪些更改?为什么我的函数绘制了真正的盒子,但我的模型无法弄清楚发生了什么?提前致谢!