
I've been trying to train my own deeplab model following https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/pascal.md.

I'm running everything on Google Colab.

Training the model works fine:

%%shell
export PYTHONPATH=$PYTHONPATH:"/content/models/research":"/content/models/research/slim"
NUM_ITERATIONS=50
python3 train.py \
  --logtostderr \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size=200,200 \
  --train_batch_size=12 \
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --fine_tune_batch_norm=true \
  --tf_initial_checkpoint="/content/deeplabv3_pascal_train_aug/model.ckpt.index" \
  --train_logdir="/content/output" \
  --dataset_dir="/content/drive/My Drive/Colab Notebooks/Background Removal/tfrecord"

and so does creating visualizations:

%%shell
export PYTHONPATH=$PYTHONPATH:"/content/models/research":"/content/models/research/slim"
python3 vis.py \
  --logtostderr \
  --vis_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --vis_crop_size=200,200 \
  --checkpoint_dir=/content/output \
  --vis_logdir=/content/output/vis \
  --dataset_dir="/content/drive/My Drive/Colab Notebooks/Background Removal/tfrecord" \
  --max_number_of_iterations=1

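But running export_model.py doesn't work. I thought the problem might be the model I trained, so I tried exporting the initial checkpoint I was training from, and that didn't work either.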

%%shell
export PYTHONPATH=$PYTHONPATH:"/content/models/research":"/content/models/research/slim"
NUM_ITERATIONS=50
python3 export_model.py \
  --logtostderr \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --crop_size=200 \
  --crop_size=200 \
  --checkpoint_path='/content/output/model.ckpt-50.index' \
  --export_path='/content/output'

Full output of running export_model.py:

WARNING:tensorflow:From /content/models/research/deeplab/core/conv2d_ws.py:40: The name tf.layers.Layer is deprecated. Please use tf.compat.v1.layers.Layer instead.

WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From export_model.py:201: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

WARNING:tensorflow:From export_model.py:117: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

W0329 17:24:00.753659 139709292058496 module_wrapper.py:139] From export_model.py:117: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From export_model.py:117: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

W0329 17:24:00.753914 139709292058496 module_wrapper.py:139] From export_model.py:117: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From export_model.py:118: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

W0329 17:24:00.754124 139709292058496 module_wrapper.py:139] From export_model.py:118: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

INFO:tensorflow:Prepare to export model to: /content/output
I0329 17:24:00.754279 139709292058496 export_model.py:118] Prepare to export model to: /content/output
WARNING:tensorflow:From export_model.py:91: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0329 17:24:00.755340 139709292058496 module_wrapper.py:139] From export_model.py:91: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

INFO:tensorflow:Exported model performs single-scale inference.
I0329 17:24:00.817728 139709292058496 export_model.py:130] Exported model performs single-scale inference.
WARNING:tensorflow:From /content/models/research/deeplab/model.py:320: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

W0329 17:24:00.818036 139709292058496 module_wrapper.py:139] From /content/models/research/deeplab/model.py:320: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

WARNING:tensorflow:From /content/models/research/deeplab/core/feature_extractor.py:461: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0329 17:24:00.818522 139709292058496 deprecation.py:323] From /content/models/research/deeplab/core/feature_extractor.py:461: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From /content/models/research/deeplab/core/feature_extractor.py:75: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

W0329 17:24:00.821603 139709292058496 module_wrapper.py:139] From /content/models/research/deeplab/core/feature_extractor.py:75: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/contrib/layers/python/layers/layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0329 17:24:00.825009 139709292058496 deprecation.py:323] From /tensorflow-1.15.2/python3.6/tensorflow_core/contrib/layers/python/layers/layers.py:1057: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /content/models/research/deeplab/core/utils.py:41: The name tf.image.resize_bilinear is deprecated. Please use tf.compat.v1.image.resize_bilinear instead.

W0329 17:24:02.636440 139709292058496 module_wrapper.py:139] From /content/models/research/deeplab/core/utils.py:41: The name tf.image.resize_bilinear is deprecated. Please use tf.compat.v1.image.resize_bilinear instead.

WARNING:tensorflow:From export_model.py:162: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

W0329 17:24:02.986706 139709292058496 module_wrapper.py:139] From export_model.py:162: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

WARNING:tensorflow:From export_model.py:178: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

W0329 17:24:02.991279 139709292058496 module_wrapper.py:139] From export_model.py:178: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From export_model.py:178: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
W0329 17:24:02.991502 139709292058496 deprecation.py:323] From export_model.py:178: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
WARNING:tensorflow:From export_model.py:181: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

W0329 17:24:03.295938 139709292058496 module_wrapper.py:139] From export_model.py:181: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

WARNING:tensorflow:From export_model.py:182: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

W0329 17:24:03.296255 139709292058496 module_wrapper.py:139] From export_model.py:182: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
W0329 17:24:03.419735 139709292058496 deprecation.py:323] From /tensorflow-1.15.2/python3.6/tensorflow_core/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-03-29 17:24:03.901045: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-03-29 17:24:03.919472: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.920276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
2020-03-29 17:24:03.920544: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-29 17:24:03.922225: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-29 17:24:03.923832: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-29 17:24:03.924132: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-29 17:24:03.926131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-29 17:24:03.927020: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-29 17:24:03.930883: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-29 17:24:03.931017: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.931838: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.932481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-03-29 17:24:03.937940: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2020-03-29 17:24:03.938159: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1a83480 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-29 17:24:03.938192: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-03-29 17:24:03.993090: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.993934: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1a83640 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-03-29 17:24:03.993966: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2020-03-29 17:24:03.994138: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.994819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
2020-03-29 17:24:03.994883: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-29 17:24:03.994912: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-29 17:24:03.994937: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-29 17:24:03.994960: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-29 17:24:03.994984: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-29 17:24:03.995007: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-29 17:24:03.995031: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-29 17:24:03.995121: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.995850: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.996477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-03-29 17:24:03.996539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-29 17:24:03.998097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-29 17:24:03.998127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-03-29 17:24:03.998140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2020-03-29 17:24:03.998307: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.999000: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-29 17:24:03.999707: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2020-03-29 17:24:03.999752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
INFO:tensorflow:Restoring parameters from /content/output/model.ckpt-50.index
I0329 17:24:04.002565 139709292058496 saver.py:1284] Restoring parameters from /content/output/model.ckpt-50.index
Traceback (most recent call last):
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" not found in checkpoint files /content/output/model.ckpt-50.index
     [[{{node save/RestoreV2}}]]
  (1) Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" not found in checkpoint files /content/output/model.ckpt-50.index
     [[{{node save/RestoreV2}}]]
     [[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 1290, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" not found in checkpoint files /content/output/model.ckpt-50.index
     [[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" not found in checkpoint files /content/output/model.ckpt-50.index
     [[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
     [[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/RestoreV2':
  File "export_model.py", line 201, in <module>
    tf.app.run()
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "export_model.py", line 178, in main
    saver = tf.train.Saver(tf.all_variables())
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 1300, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 1618, in object_graph_key_mapping
    object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "export_model.py", line 201, in <module>
    tf.app.run()
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "export_model.py", line 192, in main
    initializer_nodes=None)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/tools/freeze_graph.py", line 151, in freeze_graph_with_def_protos
    saver.restore(sess, input_checkpoint)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 1306, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

2 root error(s) found.
  (0) Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" not found in checkpoint files /content/output/model.ckpt-50.index
     [[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" not found in checkpoint files /content/output/model.ckpt-50.index
     [[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
     [[save/RestoreV2/_301]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'save/RestoreV2':
  File "export_model.py", line 201, in <module>
    tf.app.run()
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "export_model.py", line 178, in main
    saver = tf.train.Saver(tf.all_variables())
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-14-46a5ede3bd50> in <module>()
----> 1 get_ipython().run_cell_magic('shell', '', 'export PYTHONPATH=$PYTHONPATH:"/content/models/research":"/content/models/research/slim"\nNUM_ITERATIONS=50\npython3 export_model.py \\\n  --logtostderr \\\n  --atrous_rates=6 \\\n  --atrous_rates=12 \\\n  --atrous_rates=18 \\\n  --output_stride=16 \\\n  --crop_size=200 \\\n  --crop_size=200 \\\n  --checkpoint_path=\'/content/output/model.ckpt-50.index\' \\\n  --export_path=\'/content/output\'')

2 frames
/usr/local/lib/python3.6/dist-packages/google/colab/_system_commands.py in check_returncode(self)
    136     if self.returncode:
    137       raise subprocess.CalledProcessError(
--> 138           returncode=self.returncode, cmd=self.args, output=self.output)
    139 
    140   def _repr_pretty_(self, p, cycle):  # pylint:disable=unused-argument

CalledProcessError: Command 'export PYTHONPATH=$PYTHONPATH:"/content/models/research":"/content/models/research/slim"
NUM_ITERATIONS=50
python3 export_model.py \
  --logtostderr \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --crop_size=200 \
  --crop_size=200 \
  --checkpoint_path='/content/output/model.ckpt-50.index' \
  --export_path='/content/output'' returned non-zero exit status 1.

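I'm aware of similar GitHub issues (https://github.com/tensorflow/models/issues/6212 and https://github.com/tensorflow/models/issues/3992), but neither seems to have been resolved. I also tried digging through the export_model.py code in deeplab, but I don't know enough about TF code to know where to look.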
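A minimal sketch for reading this kind of error. The tensor named in "not found in checkpoint" is one the graph being restored asked for, so its first path component tells you which backbone export_model.py actually built, and comparing that with the scopes stored in the checkpoint shows any mismatch. Assumptions: the scope-to-variant table mirrors the name_scope mapping in deeplab/core/feature_extractor.py for the two backbones relevant here only; the helper names are hypothetical; checkpoint_var_names needs TensorFlow and uses tf.train.list_variables, which accepts a checkpoint prefix.

```python
import re

# Assumption: top-level variable scopes of the two DeepLab backbones involved
# here (mirroring the name_scope mapping in deeplab/core/feature_extractor.py).
SCOPE_TO_VARIANT = {"MobilenetV2": "mobilenet_v2", "xception_65": "xception_65"}

_MISSING = re.compile(r'Tensor name "([^"]+)" not found in checkpoint')


def _variant_of(var_name):
    """Map a name such as 'xception_65/entry_flow/...' to a model_variant, or None."""
    return SCOPE_TO_VARIANT.get(var_name.split("/", 1)[0])


def variant_requested_by_graph(error_text):
    """The missing tensor is one the *graph* wanted, so its scope reveals the
    --model_variant that export_model.py was really run with."""
    match = _MISSING.search(error_text)
    return _variant_of(match.group(1)) if match else None


def variants_in_checkpoint(var_names):
    """Backbones whose variables appear in the checkpoint (sorted, deduplicated)."""
    return sorted({v for v in map(_variant_of, var_names) if v is not None})


def checkpoint_var_names(checkpoint_prefix):
    """List variable names stored in a checkpoint (requires TensorFlow).

    Pass the prefix, e.g. /content/output/model.ckpt-50, without '.index'.
    """
    import tensorflow as tf  # imported lazily so the helpers above work without TF
    return [name for name, _shape in tf.train.list_variables(checkpoint_prefix)]


if __name__ == "__main__":
    error = ('Not found: Tensor name "MobilenetV2/Conv/BatchNorm/beta" '
             'not found in checkpoint files /content/output/model.ckpt-50.index')
    print("graph built as:", variant_requested_by_graph(error))
    # In Colab you would use:
    #   variants_in_checkpoint(checkpoint_var_names("/content/output/model.ckpt-50"))
    # The names below are hypothetical, for illustration only.
    print("checkpoint has:", variants_in_checkpoint(
        ["xception_65/entry_flow/conv1_1/weights", "global_step"]))
```

Applied to the log above, this points at mobilenet_v2 for the graph, while a model trained with --model_variant="xception_65" would store xception_65/... variables.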


1 Answer


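By default, export_model.py builds the graph for a model checkpoint trained on the MobileNet-v2 backbone, which is why it looks for MobilenetV2/... tensors. However, you trained your model on the Xception backbone, so add --model_variant="xception_65" to your export_model.py command.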
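To make the fix concrete, here is a minimal Python sketch that assembles an export_model.py command from the same graph-defining flags used for train.py in the question. It also carries over --decoder_output_stride=4, which the question's export command leaves out; as far as I understand, omitting it would export a graph without the decoder. Several details are assumptions taken from my reading of DeepLab's local_test.sh and flag definitions, not from this thread: --checkpoint_path takes the checkpoint prefix without the .index suffix, --export_path is the path of the frozen .pb file to write (not a directory), --crop_size is given as height then width, and --num_classes defaults to PASCAL's 21, so a dataset with a different class count needs it set. The function name build_export_args is hypothetical.

```python
import shlex

# Flags that shape the network's variables. Assumption: for DeepLab these must
# match between train.py and export_model.py, otherwise the exported graph asks
# for variables the checkpoint does not contain.
GRAPH_FLAGS = ("model_variant", "atrous_rates", "output_stride",
               "decoder_output_stride")


def build_export_args(train_flags, checkpoint_prefix, export_pb, crop_size,
                      num_classes=None):
    """Build an export_model.py argument list reusing the training graph flags.

    train_flags maps flag name -> value, or a list of values for repeated flags.
    """
    args = ["python3", "export_model.py", "--logtostderr"]
    for name in GRAPH_FLAGS:
        value = train_flags.get(name)
        if value is None:
            continue  # flag was not used for training, so leave it at its default
        values = value if isinstance(value, (list, tuple)) else [value]
        args += ["--%s=%s" % (name, v) for v in values]
    # crop_size is a repeated flag: height first, then width (assumption).
    args += ["--crop_size=%d" % crop_size[0], "--crop_size=%d" % crop_size[1]]
    if num_classes is not None:
        args.append("--num_classes=%d" % num_classes)
    # Checkpoint prefix (no .index) and the frozen graph file to write.
    args += ["--checkpoint_path=%s" % checkpoint_prefix,
             "--export_path=%s" % export_pb]
    return args


if __name__ == "__main__":
    train_flags = {
        "model_variant": "xception_65",
        "atrous_rates": [6, 12, 18],
        "output_stride": 16,
        "decoder_output_stride": 4,
    }
    args = build_export_args(train_flags,
                             "/content/output/model.ckpt-50",
                             "/content/output/frozen_inference_graph.pb",
                             crop_size=(200, 200))
    # Print as a multi-line shell command that can be pasted into a %%shell cell.
    print(" \\\n  ".join(shlex.quote(a) for a in args))
```

The printed command can be pasted into a %%shell cell with the same PYTHONPATH export as in the question, or args can be passed to subprocess.run(args, check=True) from the deeplab directory. Deriving the export flags from the same dict used for training is a design choice meant to keep the two commands from drifting apart.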

Answered 2020-07-30T13:08:39.507