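This is a follow-up to this GitHub issue. Long story short, I am trying to use the TensorFlow Object Detection API with my own dataset. Everything works fine until training suddenly crashes with the following error message: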

...
INFO:tensorflow:global step 10635: loss = 0.3392 (0.822 sec/step)
INFO:tensorflow:global step 10636: loss = 0.3529 (0.823 sec/step)
INFO:tensorflow:global step 10637: loss = 0.3305 (0.831 sec/step)
2017-09-14 20:02:02.545415: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,240,127,4]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,240,127,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape_5, SparseToDense_1, Shape_2, Merge_1, Shape, Merge_2, Shape_3, SparseToDense_5, Shape_8, SparseToDense_3, Shape_6, Cast_1, Shape_1, Cast_2, Shape_7, ExpandDims_5, Shape_4, Reshape_5, Shape_10, Reshape_6, Shape_9)]]
INFO:tensorflow:global step 10638: loss = 0.3599 (0.858 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    tf.app.run()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 194, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py", line 296, in train
    saver=saver)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 767, in train
    sv.stop(threads, close_summary_writer=True)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\six.py", line 686, in reraise
    raise value
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\client\session.py", line 1235, in _single_operation_run
    target_list_as_strings, status, None)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\Lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,240,127,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape_5, SparseToDense_1, Shape_2, Merge_1, Shape, Merge_2, Shape_3, SparseToDense_5, Shape_8, SparseToDense_3, Shape_6, Cast_1, Shape_1, Cast_2, Shape_7, ExpandDims_5, Shape_4, Reshape_5, Shape_10, Reshape_6, Shape_9)]]

G:\Tensorflow_section\models-master\object_detection>

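At first I thought my dataset might contain some inconsistent images, i.e. some PNGs saved with a .jpg extension and vice versa, so I scanned every image in the dataset and fixed them. I used the following method for this task: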

private string CheckImagetype(Stream stream)
{
    string jpg = "FFD8";
    string bmp = "424D" ;
    string gif = "474946" ;
    string png = "89504E470D0A1A0A" ;
    string sig = "";

    stream.Seek(0, SeekOrigin.Begin);
    for (int i = 0; i < 8; i++)
    {
        sig += stream.ReadByte().ToString("X2");
        if (sig.Length == 4 && sig == jpg)
        {
            sig = "jpg";
            break;
        }
        else if(sig.Length == 4 && sig == bmp)
        {
            sig = "bmp";
            break;
        }
        else if (sig.Length == 6 && sig == gif)
        {
            sig = "gif";
            break;
        }
        else if (sig.Length == 16 && sig == png)
        {
            sig = "png";
            break;
        }
    }
    return sig;
}

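Then I used EmguCV to retrieve each image's depth and number of channels, to rule out any further problems caused by a wrong depth. After that I annotated the images, created a new TFRecord, and started a new training session.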

This is what I got again:

INFO:tensorflow:global step 1286: loss = 0.3639 (0.721 sec/step)
INFO:tensorflow:global step 1287: loss = 0.3752 (0.735 sec/step)
INFO:tensorflow:global step 1288: loss = 0.5850 (0.720 sec/step)
2017-09-16 00:11:15.037646: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,150,178,4]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,150,178,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape, SparseToDense, Shape_1, Merge_1, Shape_10, Merge_2, Shape_2, SparseToDense_5, Shape_8, SparseToDense_2, Shape_7, Cast_1, Shape_6, Cast_2, Shape_4, ExpandDims_5, Shape_3, Reshape_5, Shape_5, Reshape_6, Shape_9)]]
INFO:tensorflow:global step 1289: loss = 0.4018 (0.781 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    tf.app.run()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 194, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py", line 296, in train
    saver=saver)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 767, in train
    sv.stop(threads, close_summary_writer=True)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\six.py", line 686, in reraise
    raise value
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\client\session.py", line 1235, in _single_operation_run
    target_list_as_strings, status, None)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\Lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,150,178,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape, SparseToDense, Shape_1, Merge_1, Shape_10, Merge_2, Shape_2, SparseToDense_5, Shape_8, SparseToDense_2, Shape_7, Cast_1, Shape_6, Cast_2, Shape_4, ExpandDims_5, Shape_3, Reshape_5, Shape_5, Reshape_6, Shape_9)]]

G:\Tensorflow_section\models-master\object_detection>

I then used a random subset of my images (10K images instead of 300K) and got the same error again:

INFO:tensorflow:global step 2316: loss = 0.6428 (2.192 sec/step)
INFO:tensorflow:Recording summary at step 2316.
INFO:tensorflow:global step 2317: loss = 0.4036 (1.444 sec/step)
INFO:tensorflow:global step 2318: loss = 0.4111 (1.343 sec/step)
INFO:tensorflow:global step 2319: loss = 0.3914 (1.351 sec/step)
INFO:tensorflow:global step 2320: loss = 0.3794 (1.376 sec/step)
INFO:tensorflow:global step 2321: loss = 0.4056 (1.340 sec/step)
2017-09-16 20:03:42.148318: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,182,322,4]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,182,322,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape_1, SparseToDense_2, Shape_7, Merge_1, Shape_2, Merge_2, Shape_8, SparseToDense, Shape_6, SparseToDense_5, Shape_10, Cast_1, Shape_4, Cast_2, Shape_9, ExpandDims_5, Shape_5, Reshape_5, Shape, Reshape_6, Shape_3)]]
INFO:tensorflow:global step 2322: loss = 0.4787 (1.391 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    tf.app.run()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 194, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py", line 296, in train
    saver=saver)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 767, in train
    sv.stop(threads, close_summary_writer=True)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\six.py", line 686, in reraise
    raise value
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\client\session.py", line 1235, in _single_operation_run
    target_list_as_strings, status, None)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\Lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,182,322,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape_1, SparseToDense_2, Shape_7, Merge_1, Shape_2, Merge_2, Shape_8, SparseToDense, Shape_6, SparseToDense_5, Shape_10, Cast_1, Shape_4, Cast_2, Shape_9, ExpandDims_5, Shape_5, Reshape_5, Shape, Reshape_6, Shape_3)]]

G:\Tensorflow_section\models-master\object_detection>

The thing is, there are no images in my dataset with the shapes reported in the error messages.
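For reference, the last number in each reported shape is the channel count: the queue expects 3 and receives 4. One way to check the channel count of a file independently of any image library is to read it from the file header. The following is only a minimal sketch, assuming the files in question are PNGs; the function name `png_header_info` is made up for illustration and is not part of TensorFlow or EmguCV.

```python
import struct

# The 8-byte signature every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Samples stored per pixel for each PNG color type.
# Color type 3 (palette) stores one index per pixel; a decoder may
# expand it to more channels when the image is loaded.
CHANNELS_BY_COLOR_TYPE = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def png_header_info(data):
    """Return (width, height, channels) read from the IHDR chunk of a PNG."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR layout: width (4 bytes), height (4 bytes), bit depth (1), color type (1)
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    channels = CHANNELS_BY_COLOR_TYPE.get(color_type)
    if channels is None:
        raise ValueError("unknown PNG color type: %d" % color_type)
    return width, height, channels
```

Calling it on the first bytes of each file, for example `png_header_info(open(path, "rb").read(32))`, would flag any image whose third value is 4, i.e. one that carries an alpha channel.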

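Here is some additional information:

  • OS platform and distribution: Windows 10 x64 1703, Build 15063.540
  • TensorFlow installed from (source or binary): binary (used pip install)
  • TensorFlow version (use command below): 1.3.0
  • Python version: 3.5.3
  • CUDA/cuDNN version: CUDA 8.0 / cuDNN v6.0
  • GPU model and memory: GTX 1080, 8 GB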
1 Answer

TL;DR:
Use JPEG images only.

Longer explanation:
It seems that only JPEG images are supported when creating the TFRecords, and this is not stated anywhere in the documentation!

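Moreover, when you try to use other image types, it neither issues a warning nor raises an exception, so people like me end up wasting a lot of time debugging something that could easily have been spotted and fixed. In any case, converting all the images to JPEG solved this strange problem.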
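Since nothing complains at record-creation time, it may help to run a check of your own before building the TFRecords. The following is a rough sketch, not part of the Object Detection API: it walks a directory and lists every file that does not start with the JPEG start-of-image marker (`FF D8`), whatever its extension says. The helper names `is_jpeg` and `find_non_jpeg` are made up for illustration.

```python
import os

# JPEG files begin with the two-byte start-of-image (SOI) marker.
JPEG_MAGIC = b"\xff\xd8"


def is_jpeg(path):
    """Return True if the file at `path` starts with the JPEG SOI marker."""
    with open(path, "rb") as f:
        return f.read(len(JPEG_MAGIC)) == JPEG_MAGIC


def find_non_jpeg(root):
    """Return a sorted list of files under `root` that are not JPEG by content."""
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not is_jpeg(path):
                offenders.append(path)
    return sorted(offenders)
```

Anything this returns would then need to be converted to JPEG (with whatever image library you already use) before it goes into the TFRecord.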

Answered 2017-09-18T03:19:46.447