object-detection-api - TF2 Object Detection API problem when resuming training from a saved checkpoint

Question

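I'm facing a TF2 Object Detection API problem that seems to have appeared overnight. I'm trying to resume training from a saved checkpoint. As usual, I changed the path in the config file to point to where the checkpoint is before resuming training, which has always worked until now.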

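Today it threw the error below. For some reason, the checkpoint directory and the model directory can no longer be the same. The big problem is that if I change the model directory, training starts from scratch instead of from the last epoch, so I'm stuck. This only happens in TF2; I also tried TF1 and it works fine.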

File "/usr/local/lib/python3.7/dist-packages/object_detection/utils/variables_helper.py", line 230, in ensure_checkpoint_supported
    (' Please set model_dir to a different path.')))
RuntimeError: Checkpoint dir (/content/drive/MyDrive/Object_detection/training) and model_dir (/content/drive/MyDrive/Object_detection/training) cannot be same. Please set model_dir to a different path.

score 2 · Accepted Answer

"fine_tune_checkpoint" should point to the checkpoint in the "pre_trained_model" folder;
"model_dir" is the directory where your new checkpoints are saved.

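There is no need to change folders manually. If "model_dir" already contains checkpoints, training resumes from the latest one. If it contains none, training starts from the checkpoint in the "pre_trained_model" folder.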
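To illustrate that rule, here is a minimal plain-Python sketch of how the starting checkpoint gets chosen. It is not the library's code: pick_start_checkpoint is a hypothetical helper, and it assumes the training checkpoints are named ckpt-N.index (the default "ckpt" prefix of tf.train.CheckpointManager). In real TensorFlow code, tf.train.latest_checkpoint(model_dir) is the usual way to find the newest checkpoint.

```python
import glob
import os
import re
import tempfile
from typing import Optional


def pick_start_checkpoint(model_dir: str, fine_tune_checkpoint: str) -> str:
    """Hypothetical helper: where would training start?

    Returns the newest ckpt-N prefix found in model_dir (resume), or
    fine_tune_checkpoint if model_dir has no checkpoints yet (fresh start
    from the pre-trained model). Assumes the ckpt-N.index naming scheme.
    """
    latest: Optional[str] = None
    latest_step = -1
    for index_file in glob.glob(os.path.join(model_dir, "ckpt-*.index")):
        match = re.fullmatch(r"ckpt-(\d+)\.index", os.path.basename(index_file))
        if match and int(match.group(1)) > latest_step:
            latest_step = int(match.group(1))
            # The checkpoint prefix is the index file path without ".index"
            latest = index_file[: -len(".index")]
    return latest if latest is not None else fine_tune_checkpoint


# Example with a throwaway directory standing in for model_dir
# (the pre-trained path is a made-up example).
pretrained = "pre_trained_model/checkpoint/ckpt-0"
with tempfile.TemporaryDirectory() as model_dir:
    print(pick_start_checkpoint(model_dir, pretrained))  # pre_trained_model/checkpoint/ckpt-0
    open(os.path.join(model_dir, "ckpt-3.index"), "w").close()
    print(os.path.basename(pick_start_checkpoint(model_dir, pretrained)))  # ckpt-3
```

The point is that the two settings never need to share a directory: fine_tune_checkpoint stays on the pre-trained model, and resuming is driven entirely by what is already in model_dir.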

score 0 · Accepted Answer

I ran into the same problem. It says model_dir and checkpoint_dir cannot be the same; however, if they are different, training starts from scratch.

This is caused by a check that was recently (May 7) added at the end of the file "research/object_detection/utils/variables_helper.py":

  if model_dir == checkpoint_path_dir:
    raise RuntimeError(
        ('Checkpoint dir ({}) and model_dir ({}) cannot be same.'.format(
            checkpoint_path_dir, model_dir) +
         (' Please set model_dir to a different path.')))
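The sketch below shows why the setup in the question trips this check. It assumes, based on the error message, that checkpoint_path_dir is simply the directory part of fine_tune_checkpoint; dirs_collide is a hypothetical helper, normalising both paths is my own addition, and the ckpt-5 / ckpt-0 paths are made-up examples.

```python
import os


def dirs_collide(fine_tune_checkpoint: str, model_dir: str) -> bool:
    """Hypothetical re-creation of the same-directory check.

    Assumes the check compares the directory part of the checkpoint prefix
    with model_dir; normpath (so a trailing slash doesn't matter) is an
    extra assumption, not necessarily what the library does.
    """
    checkpoint_path_dir = os.path.normpath(os.path.dirname(fine_tune_checkpoint))
    return checkpoint_path_dir == os.path.normpath(model_dir)


training = "/content/drive/MyDrive/Object_detection/training"

# Resuming by pointing fine_tune_checkpoint at a checkpoint inside model_dir
# (the setup in the question) collides:
print(dirs_collide(training + "/ckpt-5", training))  # True

# Pointing it at the pre-trained model's checkpoint does not:
print(dirs_collide(
    "/content/drive/MyDrive/Object_detection/pre_trained_model/checkpoint/ckpt-0",
    training,
))  # False
```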

I managed to fix it by changing it to the following:

  if model_dir == checkpoint_path_dir:
    pass
    # raise RuntimeError(
    #     ('Checkpoint dir ({}) and model_dir ({}) cannot be same.'.format(
    #         checkpoint_path_dir, model_dir) +
    #      (' Please set model_dir to a different path.')))

I made this change after cloning the GitHub repository and before installing the object_detection package.
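If you'd rather script this than edit the file by hand, something like the following sketch could work. Note that it uses a slightly different technique from the manual edit: it turns the condition into "if False:" so the raise becomes unreachable, which avoids re-indenting the commented-out lines. The path assumes the repository was cloned into ./models; if the package is already installed, the file to patch would be the installed copy instead (e.g. the dist-packages path in the question's traceback). disable_same_dir_check is a hypothetical helper.

```python
import pathlib

# Assumed location: repository cloned into ./models
HELPER = pathlib.Path("models/research/object_detection/utils/variables_helper.py")
CHECK = "if model_dir == checkpoint_path_dir:"
DISABLED = "if False:  # same-dir check disabled"


def disable_same_dir_check(source: str) -> str:
    """Return `source` with the same-directory check made unreachable.

    Only the condition is replaced, so the indented raise block below it
    stays syntactically valid and no re-indentation is needed.
    """
    if DISABLED in source:
        return source  # already patched; running twice is harmless
    if CHECK not in source:
        raise ValueError("check not found; the file may differ from the version quoted above")
    return source.replace(CHECK, DISABLED, 1)


# Patch the cloned file in place (run before installing the package).
if HELPER.exists():
    HELPER.write_text(disable_same_dir_check(HELPER.read_text()))
```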

I believe you could also check out an older version of the cloned repository instead, for example (it may need some editing to work):

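import os
import pathlib

# Clone the tensorflow models repository if it doesn't already exist
if "models" in pathlib.Path.cwd().parts:
  while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')
elif not pathlib.Path('models').exists():
  # Full clone (no --depth 1) so that older commits are available
  !git clone https://github.com/tensorflow/models
  # Check out the last commit before the check was added (run inside models/)
  !cd models && git checkout $(git rev-list -n 1 --before="2021-05-06 00:00:00" master)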
