I'm new to Gluon, and I decided to run the examples to get familiar with the coding style (I used Keras a few years ago, so this hybrid style confuses me a bit).
My problem is that I can run the examples, but after successfully executing every cell in this example (it's a Jupyter notebook), I uploaded an external image and the network doesn't seem to detect any objects. I pasted the same cells into "02. Predict with pre-trained Faster RCNN models", and there the pretrained network has no trouble detecting every person in the image, so it looks to me like the model in this example is not being trained correctly.
Has this happened to anyone else?
Am I missing something?
Thanks in advance!
(By the way, I tried uncommenting line 32 of the training loop (the line with autograd.backward) and changing the break limit of the same loop, but no luck.)
Links
I ran into this problem both when executing the original example and with the cell below.
02) https://gluon-cv.mxnet.io/build/examples_detection/demo_faster_rcnn.html
06) https://gluon-cv.mxnet.io/build/examples_detection/train_faster_rcnn_voc.html
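For reference, the sanity check I ran against the pretrained network is essentially the code from demo 02; roughly this (net_pre is just my own name for the model-zoo network):

from gluoncv import model_zoo, data, utils
from matplotlib import pyplot as plt

# pretrained Faster RCNN from the model zoo, as loaded in demo 02
net_pre = model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)
x, img = data.transforms.presets.rcnn.load_test('unnamed.jpg')
box_ids, scores, bboxes = net_pre(x)
utils.viz.plot_bbox(img, bboxes[0], scores[0], box_ids[0], class_names=net_pre.classes)
plt.show()

This version detects every person in my test image without problems; the network trained with tutorial 06 does not.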
My test image
Cell that detects objects in the image
from gluoncv import data, utils
from gluoncv.data.transforms import presets
from matplotlib import pyplot as plt

short, max_size = 600, 800
# note: this transform is created but never actually applied below
RCNN_transform = presets.rcnn.FasterRCNNDefaultTrainTransform(short, max_size)
# net is the Faster RCNN trained in the cells above
myImg = 'unnamed.jpg'
# load_test resizes/normalizes the image and returns (batch tensor, display image)
x, img = data.transforms.presets.rcnn.load_test(myImg)
box_ids, scores, bboxes = net(x)
ax = utils.viz.plot_bbox(img, bboxes[0], scores[0], box_ids[0], class_names=net.classes)
plt.show()
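One thing I'm unsure about: load_test returns a CPU array, while the network in tutorial 06 is trained on the GPU, so I believe the input has to be moved to the same context before the forward pass. A minimal sketch of what I mean (assuming the net's parameters live on mx.gpu(0)):

import mxnet as mx

# copy the input to the same context as the network's parameters
x = x.as_in_context(mx.gpu(0))
box_ids, scores, bboxes = net(x)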
System info (if relevant)
I'm running this on my personal computer, and I also tried Google Colab with the same result, but just in case...
OS: Ubuntu 18.04
Hardware
$ hwinfo --short
cpu:
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 2700 MHz
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 2700 MHz
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 2700 MHz
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 2700 MHz
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 2700 MHz
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 2700 MHz
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 2700 MHz
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 2700 MHz
graphics card:
nVidia GM107M [GeForce GTX 960M]
Intel HD Graphics 530
NVIDIA driver
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 960M Off | 00000000:02:00.0 Off | N/A |
| N/A 41C P5 N/A / N/A | 665MiB / 4046MiB | 23% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2560 G /usr/lib/xorg/Xorg 308MiB |
| 0 2921 G /usr/bin/gnome-shell 132MiB |
| 0 3741 G ...quest-channel-token=7390050445218241480 31MiB |
| 0 5455 G ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files 176MiB |
+-----------------------------------------------------------------------------+
CUDA
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
MXNet and Gluon installation
$ pip install mxnet-cu102mkl
$ pip install --upgrade mxnet-cu102mkl gluoncv
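To double-check which versions actually got installed, I print them from inside Python:

import mxnet, gluoncv
# confirm the versions the notebook is actually running against
print(mxnet.__version__, gluoncv.__version__)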
EDIT: I've been making changes to the training loop; this is what I have so far. The first line of code after the third for loop just moves the data onto the GPU.
#net.hybridize()
epochs = 50
for epoch in range(epochs):
    print("epoch: ", epoch, "---------------------------------")
    batch_size = 10
    for ib, batch in enumerate(train_loader):
        #print(ib)
        if ib > 500:
            break
        for dataa, label, rpn_cls_targets, rpn_box_targets, rpn_box_masks in zip(*batch):
            # move the data onto the GPU
            dataa = dataa.as_in_context(mx.gpu(0))
            label = label.as_in_context(mx.gpu(0)).expand_dims(0)
            rpn_cls_targets = rpn_cls_targets.as_in_context(mx.gpu(0))
            rpn_box_targets = rpn_box_targets.as_in_context(mx.gpu(0))
            rpn_box_masks = rpn_box_masks.as_in_context(mx.gpu(0))
            gt_label = label[:, :, 4:5]
            gt_box = label[:, :, :4]
            with autograd.record():
                # network forward
                cls_preds, box_preds, roi, samples, matches, rpn_score, rpn_box, anchors, cls_targets, box_targets, box_masks, _ = net(dataa.expand_dims(0), gt_box, gt_label)
                # losses of rpn
                rpn_score = rpn_score.squeeze(axis=-1)
                num_rpn_pos = (rpn_cls_targets >= 0).sum()
                rpn_loss1 = rpn_cls_loss(rpn_score, rpn_cls_targets, rpn_cls_targets >= 0) * rpn_cls_targets.size / num_rpn_pos
                rpn_loss2 = rpn_box_loss(rpn_box, rpn_box_targets, rpn_box_masks) * rpn_box.size / num_rpn_pos
                # losses of rcnn
                num_rcnn_pos = (cls_targets >= 0).sum()
                rcnn_loss1 = rcnn_cls_loss(cls_preds, cls_targets, cls_targets >= 0) * cls_targets.size / cls_targets.shape[0] / num_rcnn_pos
                rcnn_loss2 = rcnn_box_loss(box_preds, box_targets, box_masks) * box_preds.size / box_preds.shape[0] / num_rcnn_pos
            # some standard gluon training steps:
            autograd.backward([rpn_loss1, rpn_loss2, rcnn_loss1, rcnn_loss2])
            trainer.step(batch_size)
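For completeness, the loop above uses the four loss functions defined earlier in tutorial 06; if I copied them over correctly, they are:

rpn_cls_loss = mx.gluon.loss.SigmoidBinaryCrossEntropyLoss(from_sigmoid=False)
rpn_box_loss = mx.gluon.loss.HuberLoss(rho=1/9.)  # smooth L1
rcnn_cls_loss = mx.gluon.loss.SoftmaxCrossEntropyLoss()
rcnn_box_loss = mx.gluon.loss.HuberLoss()  # smooth L1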
I have a doubt about the trainer; I found this one in other examples, but I'm not sure it applies in this case.
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01, 'wd': 0.05, 'momentum': 0.9})
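For comparison, the trainer in tutorial 06 itself uses (if I'm reading it right) a smaller learning rate and a much smaller weight decay:

# hyperparameters as I believe they appear in tutorial 06
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.001, 'wd': 0.0005, 'momentum': 0.9})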
EDIT
Here is a copy of the .ipynb file I've been working on (the Google Colab version): https://drive.google.com/file/d/1WevimDyTP1lvq_A0OBRMgC-PH8pK4iBv/view?usp=sharing