I am trying to train a MaskRCNN image segmentation model on a custom dataset in MS-COCO format.




"segmentation": [[140.0,352.5,131.0,351.5,118.0,344.5,101.500000000001,323.0,94.5,303.0,86.5,86.5,292.0 22.0, 179.5, 53.99999999999999, 170.5, 76.0, 158.5, 88.5, 129.0, 100.5, 111.0, 152.0, 70.5, 175.0, 65.5, 217.0, 64.5, 272.0, 48.5, 296.0, 56.49999999999999, 320.5, 82.0, 350.5, 135.0, 374.5, 163.0, 382.5, 190.0, 381.5, 205.99999999999997, 376.5, 217.0, 371.0, 221.5, 330.0, 229.50000000000003, 312.5, 240.0, 310.5, 291.0, 302.5, 310.0, 288.0, 326.5, 259.0, 337.5, 208.0, 339.5, 171.0, 349.5] ],


"bbox": [11.5, 11.5, 341.0, 371.0],


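There is one object in this image, so there is one entry each for the segmentation and the bbox. The segmentation values are the pixel coordinates of the polygon, so their length differs from object to object.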
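To make the two fields concrete: in COCO annotations a polygon is stored as a flat list of alternating x and y values, and a bbox is stored as [x, y, width, height]. The snippet below is a minimal sketch of reading both; the helper names `polygon_points` and `xywh_to_xyxy` and the triangle are made up for illustration, while the bbox is the one from the annotation above.

```python
def polygon_points(flat):
    """Group a flat [x0, y0, x1, y1, ...] list into (x, y) pairs."""
    if len(flat) % 2 != 0:
        raise ValueError("a polygon needs an even number of values")
    return list(zip(flat[0::2], flat[1::2]))


def xywh_to_xyxy(bbox):
    """Convert a COCO-style [x, y, w, h] box to [x0, y0, x1, y1] corners."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# Hypothetical triangle, not taken from the annotation above.
triangle = [10.0, 20.0, 50.0, 20.0, 30.0, 60.0]
points = polygon_points(triangle)

# The bbox from the annotation above, converted to corner format.
corners = xywh_to_xyxy([11.5, 11.5, 341.0, 371.0])
```

Because each polygon can have a different number of vertices, the flat lists cannot be stacked directly into one rectangular array; they are usually rasterised into fixed-size masks first.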



1 Answer


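To manage a dataset in COCO format, you can use this repo. It provides classes that can be instantiated from the annotation file, which makes the data very easy to use and access.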
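As a rough illustration of what such a class does with the annotation file, the sketch below groups annotations by image id using only the standard library. It is not the repo's implementation; the dictionary contents and the `index_annotations` helper are hypothetical, and only the top-level keys (`images`, `categories`, `annotations`) and the per-annotation fields follow the COCO format.

```python
from collections import defaultdict

# Hypothetical contents of a minimal annotation file with one image and one object.
coco_dict = {
    "images": [{"id": 1, "file_name": "example.jpg", "width": 400, "height": 400}],
    "categories": [{"id": 1, "name": "object"}],
    "annotations": [
        {
            "id": 10,
            "image_id": 1,
            "category_id": 1,
            "segmentation": [[10.0, 20.0, 50.0, 20.0, 30.0, 60.0]],
            "bbox": [10.0, 20.0, 40.0, 40.0],
            "area": 800.0,
            "iscrowd": 0,
        }
    ],
}


def index_annotations(data):
    """Map each image id to the list of annotations that belong to it."""
    by_image = defaultdict(list)
    for ann in data["annotations"]:
        by_image[ann["image_id"]].append(ann)
    return dict(by_image)


anns_by_image = index_annotations(coco_dict)
```

With an index like this, looking up every object for one image is a single dictionary access, which is the kind of query the dataset class below issues once per sample.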


import os

import numpy as np
import torch
from PIL import Image
# Assumption: COCO is the class from pycocotools, whose API matches the calls below.
from pycocotools.coco import COCO


class CocoDataset(torch.utils.data.Dataset):
    def __init__(self, dataset_dir, subset, transforms):
        dataset_path = os.path.join(dataset_dir, subset)
        ann_file = os.path.join(dataset_path, "annotation.json")
        self.imgs_dir = os.path.join(dataset_path, "images")
        self.coco = COCO(ann_file)
        self.img_ids = self.coco.getImgIds()
        self.transforms = transforms

    def __getitem__(self, idx):
        """
        Args:
            idx: index of sample to be fed
        Returns:
            tuple (img, target):
            - img: PIL Image of shape (H, W)
            - target (dict) containing:
                - boxes:    FloatTensor[N, 4], N being the number of instances and their
                  bounding box coordinates in [x0, y0, x1, y1] format, ranging from
                  0 to W and 0 to H;
                - labels:   Int64Tensor[N], class label (0 is background);
                - image_id: Int64Tensor[1], unique id for each image;
                - area:     FloatTensor[N], taken from each annotation's 'area' field;
                - iscrowd:  Int64Tensor[N], 0 or 1 (always 0 here);
                - masks:    UInt8Tensor[N, H, W], segmentation maps;
        """
        img_id = self.img_ids[idx]
        img_obj = self.coco.loadImgs(img_id)[0]
        anns_obj = self.coco.loadAnns(self.coco.getAnnIds(imgIds=img_id))

        img = Image.open(os.path.join(self.imgs_dir, img_obj['file_name']))

        # list comprehensions may be slow here; might be worth changing
        # convert COCO boxes from [x, y, w, h] to [x0, y0, x1, y1]
        bboxes = [[x, y, x + w, y + h]
                  for x, y, w, h in (ann['bbox'] for ann in anns_obj)]
        # annToMask rasterises each polygon into an (H, W) binary mask
        masks = [self.coco.annToMask(ann) for ann in anns_obj]
        areas = [ann['area'] for ann in anns_obj]

        # reshape keeps the [N, 4] layout even when there are no annotations
        boxes = torch.as_tensor(bboxes, dtype=torch.float32).reshape(-1, 4)
        # every instance gets label 1, i.e. a single foreground class
        labels = torch.ones(len(anns_obj), dtype=torch.int64)
        # stack into one array first; converting a list of arrays is slow
        masks = torch.as_tensor(np.array(masks), dtype=torch.uint8)
        image_id = torch.tensor([idx])
        area = torch.as_tensor(areas, dtype=torch.float32)
        iscrowd = torch.zeros(len(anns_obj), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

    def __len__(self):
        return len(self.img_ids)
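Since each sample can contain a different number of instances, the targets of several samples cannot be stacked into a single tensor when batching. One common way around this is to give `torch.utils.data.DataLoader` a custom `collate_fn` that keeps images and targets as tuples. The sketch below assumes the model accepts a list of images and a list of target dicts, as torchvision's detection models do; the stand-in batch uses plain strings and dicts so the snippet runs without PyTorch.

```python
def collate_fn(batch):
    """Turn [(img, target), ...] into ((img, ...), (target, ...)) without stacking."""
    return tuple(zip(*batch))


# Stand-in samples: strings instead of images, targets with differing instance counts.
batch = [
    ("img_a", {"labels": [1]}),
    ("img_b", {"labels": [1, 1, 1]}),
]
images, targets = collate_fn(batch)
```

In a training script this function would be passed as `DataLoader(dataset, batch_size=2, collate_fn=collate_fn)`, where `dataset` is an instance of the class above.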


answered 2021-08-04T03:17:41.297