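I am using the following:

I took this repo and converted it to use Kitti data. To do so, I added a new Kitti class to the datasets and made the necessary conversions. Both testing and evaluation use the following class set from PASCAL VOC: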

self._classes = (
    '__background__',  # always index 0
    'aeroplane',
    'bicycle',
    'bird',
    'boat',
    'bottle',
    'bus',
    'car',
    'cat',
    'chair',
    'cow',
    'diningtable',
    'dog',
    'horse',
    'motorbike',
    'person',
    'pottedplant',
    'sheep',
    'sofa',
    'train',
    'tvmonitor')

I changed the class set to:

self._classes = (
    'dontcare',  # always index 0
    'pedestrian',
    'car',
    'truck',
    'cyclist')

#-----------------------------
N.B.: Classes should NOT matter here, as the result out of the backbone is simply a featureset, not a classification
#-----------------------------

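On seemingly random images (removing these "problem" images from the training set only seems to change which image the program fails on), the training code produces NaNs from the region proposal network. I'm at a bit of a loss as to why.

  • Tried changing the normalization to Kitti-specific values
  • Tried resizing the images to 224x224
  • Tried dividing the normalization values by the mean standard deviation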
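One check not listed above is whether the ground-truth boxes themselves are valid, since a failure that moves from image to image can come from the annotations rather than the pixels. The following is a minimal sketch, not part of the repo: `find_invalid_boxes` and the sample boxes are hypothetical, and it assumes boxes are (x1, y1, x2, y2) tuples in 0-based pixel coordinates.

```python
def find_invalid_boxes(boxes, width, height):
    """Return (index, reason) pairs for boxes that look suspicious.

    Assumes 0-based pixel coordinates, so valid values are
    0..width-1 horizontally and 0..height-1 vertically.
    """
    problems = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if min(x1, y1, x2, y2) < 0:
            problems.append((i, "negative coordinate"))
        elif x2 >= width or y2 >= height:
            problems.append((i, "outside image"))
        elif x2 <= x1 or y2 <= y1:
            problems.append((i, "zero or negative size"))
    return problems


# Hypothetical boxes for a 1242x375 image.
boxes = [(10, 20, 50, 80), (-1, 5, 30, 40), (100, 100, 100, 150)]
print(find_invalid_boxes(boxes, width=1242, height=375))
# prints [(1, 'negative coordinate'), (2, 'zero or negative size')]
```

Running something like this over every annotation before training would show whether the failing images share a common kind of bad box.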

    -----------------

    Network definition

    -----------------

    self.conv1 = conv3x3(inplanes, planes, stride)
    self.bn1 = norm_layer(planes)
    self.relu = nn.ReLU(inplace=True)
    self.conv2 = conv3x3(planes, planes)
    self.bn2 = norm_layer(planes)
    self.downsample = downsample
    self.stride = stride

    self._layers['head'] = nn.Sequential(self.resnet.conv1, self.resnet.bn1, self.resnet.relu,
                                         self.resnet.maxpool, self.resnet.layer1, self.resnet.layer2,
                                         self.resnet.layer3)

    self.rpn_net = nn.Conv2d(self._net_conv_channels, cfg.RPN_CHANNELS, [3, 3], padding=1)

    -----------------

    Preparing the image

    -----------------

    self._image = torch.from_numpy(image.transpose([0, 3, 1, 2])).to(self._device)
    self.net.train_step(blob, self.optimizer)

    -----------------

    Computation graph

    -----------------

    (1) self.forward(blobs['data'], blobs['im_info'], blobs['gt_boxes'])
    (2) rois, cls_prob, bbox_pred = self._predict()
    (3) net_conv = self._image_to_head()
    (4) net_conv = self._layers['head'](self._image)
    (5) rpn = F.relu(self.rpn_net(net_conv))

    ------------------

    Functions useful for solving the problem

    ------------------

    def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
        """3x3 convolution with padding"""
        return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                         padding=dilation, groups=groups, bias=False, dilation=dilation)

    def conv1x1(in_planes, out_planes, stride=1):
        """1x1 convolution"""
        return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)

I don't know why this happens, but obviously I expect real numbers out of the ResNet101 backbone. I may have to switch to vgg16.

Output of (3):

tensor([[[[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
...,
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan]],

...,

[[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
...,
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan]],

[[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
...,
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan],
[nan, nan, nan,  ..., nan, nan, nan]]]], device='cuda:0'
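
To narrow down which step of the computation graph first goes non-finite, one option is to run the stages one at a time and check each output. The sketch below only illustrates that idea in plain Python: `all_finite`, `first_nonfinite_stage`, and the toy stages are hypothetical stand-ins, not code from the repo. With PyTorch tensors, the `is_finite` argument could instead be something like `lambda t: bool(torch.isfinite(t).all())`.

```python
import math


def all_finite(value):
    """True if a number, or every number in a nested list/tuple, is finite."""
    if isinstance(value, (list, tuple)):
        return all(all_finite(v) for v in value)
    return math.isfinite(value)


def first_nonfinite_stage(stages, x, is_finite=all_finite):
    """Run (name, fn) stages in order and return the name of the first
    stage whose output is not finite, or None if all outputs are finite."""
    for name, fn in stages:
        x = fn(x)
        if not is_finite(x):
            return name
    return None


# Toy stand-ins for real layers (hypothetical, for illustration only).
stages = [
    ("conv1", lambda v: [a * 10.0 for a in v]),
    ("layer1", lambda v: [a * 1e308 for a in v]),  # overflows to inf
    ("layer2", lambda v: [a - a for a in v]),      # inf - inf would give nan
]
print(first_nonfinite_stage(stages, [1.0, 2.0]))  # prints layer1
```

If the very first stage already reports a problem, the input image or its annotations are the more likely culprit than the network weights.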

Does anyone know what is going on here?

1 Answer

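Solved. Pascal VOC (the original dataset used with this GitHub repo) uses pixel coordinates that start at 1 [1 to ymax], whereas Kitti pixel coordinates start at 0 [0 to ymax-1].

The -1 needs to be removed from the bounding-box target generation. Otherwise a Kitti box that touches the top or left edge of the image ends up with a coordinate of -1.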
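
As a rough illustration of the fix, the offset can be made explicit per dataset instead of being hard-coded. This is a sketch under assumptions: `to_zero_based` is a hypothetical helper, not the repo's loader, and the boxes are made up. If the loader stores boxes in an unsigned integer array (an assumption about the repo that is not confirmed here), a -1 could wrap around to a very large value, which would be consistent with NaNs appearing only on some images.

```python
def to_zero_based(box, one_based):
    """Convert an (x1, y1, x2, y2) box to 0-based pixel coordinates.

    one_based=True  -> annotations count pixels from 1 (Pascal VOC style)
    one_based=False -> annotations already count from 0 (Kitti style)
    """
    offset = 1 if one_based else 0
    converted = tuple(c - offset for c in box)
    if min(converted) < 0:
        # Fail loudly instead of letting a negative value reach training.
        raise ValueError(f"negative coordinate after conversion: {converted}")
    return converted


print(to_zero_based((1, 1, 100, 50), one_based=True))   # VOC-style box
print(to_zero_based((0, 0, 99, 49), one_based=False))   # Kitti-style box
# both print (0, 0, 99, 49)
```

Applying the VOC offset to the Kitti-style box, `to_zero_based((0, 0, 99, 49), one_based=True)`, raises `ValueError`, which is the situation described above.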

Answered 2019-08-29T01:45:10.523