我正在使用 google colab 来训练我的 ssd 模型。这是我的错误的堆栈跟踪:
Traceback (most recent call last):
File "train_ssd_network.py", line 394, in <module>
tf.app.run()
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "train_ssd_network.py", line 390, in main
sync_optimizer=None)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 753, in train
master, start_standard_services=False, config=session_config) as sess:
File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 1014, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 839, in stop
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python3.7/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 1003, in managed_session
start_standard_services=start_standard_services)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/supervisor.py", line 734, in prepare_or_wait_for_session
init_fn=self._init_fn)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/session_manager.py", line 298, in prepare_session
init_fn(sess)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/framework/python/ops/variables.py", line 761, in callback
saver.restore(session, model_path)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 1277, in restore
raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/GatherV2:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 325, in run
raise File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv)) File "train_ssd_network.py", line 390, in main
sync_optimizer=None) File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 796, in train
should_retry = True File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/tf_should_use.py", line 198, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
==================================
E1002 15:10:16.652289 140269098841984 tf_should_use.py:76] ==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/GatherV2:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 325, in run
raise File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv)) File "train_ssd_network.py", line 390, in main
sync_optimizer=None) File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 796, in train
should_retry = True File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/tf_should_use.py", line 198, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
==================================
我知道该train_ssd_network.py
文件存在问题,但这里的确切问题是什么?
我阅读了 StackOverflow 问题,他们提到这可能是与检查点相关的问题。但是,我确实有一个检查点文件夹,其中包含解ssd_300_vgg.ckpt
压缩的文件,其中还包含两个文件。该文件直接从作者的存储库中下载。
其他答案如下:
该错误只是意味着
tf.train.latest_checkpoin
t 没有找到任何东西。它返回None
,然后 Saver 抱怨,因为它已通过None
。所以该目录中没有检查点。
tf.app.flags.DEFINE_string(
'checkpoint_path', '/content/gdrive/MyDrive/SSD-custom/checkpoint/ssd_300_vgg.ckpt',
'The path to a checkpoint from which to fine-tune.')
tf.app.flags.DEFINE_string(
'checkpoint_model_scope', None,
'Model scope in the checkpoint. None if the same as the trained model.')
tf.app.flags.DEFINE_string(
'checkpoint_exclude_scopes', None,
'Comma-separated list of scopes of variables to exclude when restoring '
'from a checkpoint.')
tf.app.flags.DEFINE_string(
'trainable_scopes', None,
'Comma-separated list of scopes to filter the set of variables to train.'
'By default, None would train all the variables.')
tf.app.flags.DEFINE_boolean(
'ignore_missing_vars', False,
'When restoring a checkpoint would ignore missing variables.')
FLAGS = tf.app.flags.FLAGS
我该如何解决这个问题?