I am training a model based on the Faster R-CNN architecture. For the first session I used the following configuration:
from detectron2 import model_zoo
from detectron2.config import get_cfg

def get_train_cfg(config_file_path, checkpoint_url, train_dataset_name, test_dataset_name, num_classes, device, output_dir):
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(config_file_path))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(checkpoint_url)
    cfg.DATASETS.TRAIN = (train_dataset_name,)
    cfg.DATASETS.TEST = (train_dataset_name,)
    cfg.DATALOADER.NUM_WORKERS = 2
    cfg.SOLVER.IMS_PER_BATCH = 1
    cfg.SOLVER.BASE_LR = 0.0001
    cfg.SOLVER.MAX_ITER = 16000
    cfg.SOLVER.STEPS = []
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = num_classes
    cfg.MODEL.DEVICE = device
    cfg.OUTPUT_DIR = output_dir
    return cfg
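A note on the schedule this config produces: with cfg.SOLVER.STEPS = [] a MultiStep-style schedule has no decay milestones, so the learning rate sitting at 0.0001 for the whole run is the expected behavior, not a bug. A minimal sketch of that logic (hypothetical helper lr_at, modeled loosely on detectron2's WarmupMultiStepLR with warmup omitted):

```python
import bisect

def lr_at(iteration, base_lr=1e-4, steps=(), gamma=0.1):
    # MultiStep-style schedule: multiply base_lr by gamma once for
    # every milestone in `steps` that has already been passed.
    # (Hypothetical helper; warmup omitted for brevity.)
    return base_lr * gamma ** bisect.bisect_right(list(steps), iteration)

# With STEPS = [] there are no milestones, so the lr never changes:
print(lr_at(0), lr_at(15999))    # both print 0.0001
# With a milestone, the lr would drop from that iteration onward:
print(lr_at(7999, steps=(8000,)) == 1e-4)        # True
print(lr_at(8000, steps=(8000,)) < 1e-4)         # True
```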
I want to continue my training. I have last_checkpoint, metrics.json, cfg.pickle, and model_final.pth.
Here is the link to my notebook.
Training should resume from iteration 16001; at iteration 16000 the total loss was about 0.8. However, from iteration 0 to 16000 the learning rate never changed from 0.0001. When I continue training via resume_or_load(resume=True), the following output appears:
[12/04 05:54:36 d2.data.datasets.coco]: Loaded 381 images in COCO format from ../input/cascade-rcnn/train.json
[12/04 05:54:36 d2.data.build]: Removed 1 images with no usable annotations. 380 images left.
[12/04 05:54:36 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[12/04 05:54:36 d2.data.build]: Using training sampler TrainingSampler
[12/04 05:54:36 d2.data.common]: Serializing 380 elements to byte tensors and concatenating them all ...
[12/04 05:54:36 d2.data.common]: Serialized dataset takes 2.23 MiB
[12/04 05:54:37 d2.engine.hooks]: Loading scheduler from state_dict ...
[12/04 05:54:37 d2.engine.train_loop]: Starting training from iteration 16000
[12/04 05:54:37 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
[12/04 05:54:37 d2.data.datasets.coco]: Loaded 381 images in COCO format from ../input/cascade-rcnn/train.json
[12/04 05:54:38 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[12/04 05:54:38 d2.data.common]: Serializing 381 elements to byte tensors and concatenating them all ...
[12/04 05:54:38 d2.data.common]: Serialized dataset takes 2.23 MiB
WARNING [12/04 05:54:38 d2.engine.defaults]: No evaluator found. Use DefaultTrainer.test(evaluators=), or implement its build_evaluator method.
[12/04 05:54:38 d2.utils.events]: iter: 16001 lr: N/A max_mem: 1627M
It reports lr: N/A. Why is that?
I am using:
- Python: 3.7.10
- Detectron2: 0.6
- PyTorch: 1.9.1+cu101
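My current mental model of why nothing runs (a minimal sketch using the numbers from my run, not detectron2's actual trainer code): resume_or_load(resume=True) restores the iteration counter to 16000 from last_checkpoint, and since MAX_ITER is also still 16000, the training loop body never executes, which would match the 0:00:00 total training time in the log above:

```python
# Hypothetical sketch with the numbers from my run.
start_iter = 16000   # restored from last_checkpoint by resume_or_load(resume=True)
max_iter = 16000     # cfg.SOLVER.MAX_ITER left unchanged from the first session

iters_run = 0
for it in range(start_iter, max_iter):   # range(16000, 16000) is empty
    iters_run += 1

print(iters_run)   # 0 -- no iteration executes, so no lr value is ever logged
```

Is this why the event logger prints lr: N/A, and is raising cfg.SOLVER.MAX_ITER before resuming the right way to continue training?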