I want to convert my COCO JSON file as follows:

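The CSV file with annotations should contain one annotation per line. Images with multiple bounding boxes should use one row per bounding box. Note that indexing for pixel values starts at 0. The expected format of each line is: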

path/to/image.jpg,x1,y1,x2,y2,class_name

A full example:

/data/imgs/img_001.jpg,837,346,981,456,cow
/data/imgs/img_002.jpg,215,312,279,391,cat
/data/imgs/img_002.jpg,22,5,89,84,bird
/data/imgs/img_003.jpg,,,,,

This defines a dataset with 3 images: img_001.jpg contains a cow, img_002.jpg contains a cat and a bird, and img_003.jpg contains no interesting objects/animals.

How can I do that?

2 Answers

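I have a function like this. Note that it writes the numeric image id and category id rather than the image path and class name.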

import json
import os

import pandas as pd


def convert_coco_json_to_csv(filename):
    # e.g. COCO2017/annotations/instances_val2017.json
    with open(filename, 'r') as f:
        s = json.load(f)
    out_file = os.path.splitext(filename)[0] + '.csv'

    all_ids = {im['id'] for im in s['images']}
    all_ids_ann = set()

    with open(out_file, 'w') as out:
        out.write('id,x1,y1,x2,y2,label\n')
        for ann in s['annotations']:
            image_id = ann['image_id']
            all_ids_ann.add(image_id)
            # COCO bbox is [x, y, width, height]; convert to corner coordinates
            x, y, w, h = ann['bbox']
            x1, y1, x2, y2 = x, y, x + w, y + h
            label = ann['category_id']
            out.write('{},{},{},{},{},{}\n'.format(image_id, x1, y1, x2, y2, label))

        # Output images without any annotations, marked with -1 in every field
        for image_id in all_ids - all_ids_ann:
            out.write('{},{},{},{},{},{}\n'.format(image_id, -1, -1, -1, -1, -1))

    # Sort file by image id
    s1 = pd.read_csv(out_file)
    s1.sort_values('id', inplace=True)
    s1.to_csv(out_file, index=False)
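
The function above writes numeric image ids and category ids, while the question asks for the image path and class name. A minimal sketch of that variant follows. The names `coco_to_rows`, `write_annotation_csv`, and the `image_dir` prefix are hypothetical. Rounding to whole pixels and writing empty fields for images without annotations are assumptions about what the target tool expects.

```python
import csv
import json


def coco_to_rows(coco, image_dir=""):
    """Build [path, x1, y1, x2, y2, class_name] rows from a loaded COCO dict."""
    # Look up paths and class names by id; ids need not match list positions.
    paths = {im["id"]: image_dir + im["file_name"] for im in coco["images"]}
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}

    rows = []
    annotated = set()
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores [x, y, width, height]
        # Assumption: whole-pixel output; remove round() to keep floats.
        box = [round(x), round(y), round(x + w), round(y + h)]
        rows.append([paths[ann["image_id"]], *box, names[ann["category_id"]]])
        annotated.add(ann["image_id"])

    # Assumption: images without annotations get empty box and class fields.
    for image_id, path in paths.items():
        if image_id not in annotated:
            rows.append([path, "", "", "", "", ""])
    return rows


def write_annotation_csv(json_path, csv_path, image_dir=""):
    """Read a COCO JSON file and write the rows to csv_path without a header."""
    with open(json_path) as f:
        coco = json.load(f)
    with open(csv_path, "w", newline="") as out:
        csv.writer(out).writerows(coco_to_rows(coco, image_dir))
```

Keeping the row building separate from the file I/O makes it easy to check the conversion on a small in-memory dict before running it on a full annotation file.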
Answered 2020-06-29T19:45:53.587

Here is the function I use to convert COCO format to the AutoML CSV format for image object detection annotation data:

import json
import os


def convert_coco_json_to_csv(filename, bucket):
    with open(filename, 'r') as f:
        s = json.load(f)
    out_file = os.path.splitext(filename)[0] + '.csv'

    # Look up images and categories by their COCO id; indexing the lists
    # directly only works when ids happen to equal list positions.
    images = {image['id']: image for image in s['images']}
    categories = {cat['id']: cat['name'] for cat in s['categories']}

    with open(out_file, 'w') as out:
        out.write('GCS_FILE_PATH,label,X_MIN,Y_MIN,,,X_MAX,Y_MAX,,\n')
        for ann in s['annotations']:
            # The COCO bounding box format is [top left x position, top left y position, width, height].
            # AutoML uses coordinates normalized by image size. For example, a bounding box for the
            # entire image is expressed as (0.0,0.0,,,1.0,1.0,,), or (0.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0).
            image = images[ann['image_id']]
            width, height = image['width'], image['height']
            x, y, w, h = ann['bbox']
            x_min = x / width
            x_max = (x + w) / width
            y_min = y / height
            y_max = (y + h) / height
            label = categories[ann['category_id']]
            out.write(f"{bucket}/{image['file_name']},{label},{x_min},{y_min},,,{x_max},{y_max},,\n")


To use it, just call the function with the file name and the gs:// storage location of the uploaded images:

convert_coco_json_to_csv("/content/train_annotations.coco.json", "gs://[bucket name]")
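
As a quick check of the normalization step, here is a small sketch that converts one COCO box to the relative coordinates described above. The helper name `normalize_bbox` and the sample numbers are made up for illustration.

```python
def normalize_bbox(bbox, width, height):
    """Convert a COCO [x, y, w, h] pixel box to (x_min, y_min, x_max, y_max) in [0, 1]."""
    x, y, w, h = bbox
    return (x / width, y / height, (x + w) / width, (y + h) / height)


# A 100x50 box with its top left corner at (200, 100) in an 800x400 image.
print(normalize_bbox([200, 100, 100, 50], 800, 400))
# (0.25, 0.25, 0.375, 0.375)
```

A box covering the whole image, `[0, 0, 800, 400]` here, comes out as `(0.0, 0.0, 1.0, 1.0)`, matching the full-image example in the code comment.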
Answered 2022-02-27T11:21:13.163