python - Converting Custom Vision output to TensorFlow Object Detection API visualization?

Question

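I built a model with Azure Custom Vision and exported it as "TensorFlow - SavedModel". The model runs locally using the helper code included in the export. The helper code is slightly modified, though, to read from a live video feed using OpenCV VideoCapture()/read() in a loop.

My application detects objects on the live video feed just fine, since I can see the correct results printed to the console, but I can't get accurate bounding boxes to display properly on the output video stream. The console output shows the array of results from the Azure Custom Vision model's predictions, and I can see the bounding box coordinates, which look like normalized values.

Before using Azure Custom Vision, I was able to use existing models from the "model zoo", and the Object Detection API Python visualization helpers displayed the bounding boxes correctly on the feed.

However, the coordinates returned from Azure Custom Vision seem to be "different" from those returned by the default COCO SSD model?

I need to convert the bounding box coordinates returned from Azure Custom Vision into values that the TensorFlow Object Detection API visualization helpers understand.

Original code using the Object Detection API and a COCO SSD model (works!):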

    output_dict = run_inference_for_single_image(image_resize, graph)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=2)
    cv2.imshow('object_detection', cv2.resize(image_np, (640, 480)))

Azure Custom Vision version, which does not display the boxes correctly:

    image_resize = cv2.resize(image_np, (512, 512))

    predictions = od_model.predict_image(Image.fromarray(image_resize))

    if len(predictions) > 0:
        print(predictions)                    
        output_dict = {}
        output_dict['detection_boxes'] = []
        output_dict['detection_boxes'] = [???]  <-- Populate with compatible shape!!!?????
        output_dict['detection_scores'] = np.asarray([ sub['probability'] for sub in predictions ])                
        output_dict['detection_classes'] = np.asarray([ sub['tagId'] for sub in predictions ])
        output_dict['detection_class_names'] = np.asarray([ sub['tagName'] for sub in predictions ])
        vis_util.visualize_boxes_and_labels_on_image_array(image_np,
              output_dict['detection_boxes'],
              output_dict['detection_classes'],
              output_dict['detection_scores'],
              category_index,
              instance_masks=output_dict.get('detection_masks'),
              use_normalized_coordinates=True,
              line_thickness=2)

-->  Console showing Custom Vision Model Response:
         {'probability': 0.95146583, 'tagId': 1, 'tagName': 'MyObject', 'boundingBox': {'left': 0.11083871, 'top': 0.65143364, 'width': 0.05332406, 'height': 0.04930339}}
         {'probability': 0.92589812, 'tagId': 0, 'tagName': 'OtherObject', 'boundingBox': {'left': 0.24750886, 'top': 0.68784532, 'width': 0.54308632, 'height': 0.17839652}}

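With the Azure Custom Vision model I can't seem to get the bounding boxes to display correctly. I was able to convert the Custom Vision "boundingBox" into the same shape the visualization expects, but the box is never at the right coordinates. I thought it might be because the coordinate system is computed differently in the tensors returned by COCO SSD and in the Custom Vision prediction response? Or maybe the order of the coordinates differs between the two shapes?

Has anyone already solved this translation? Am I doing it wrong? Thanks in advance!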

score 0 · Accepted Answer

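OK, answering my own question... I was able to get this working through brute force. All of my Custom Vision responses are now converted to the COCO SSD TensorFlow Object Detection API output format. It boiled down to two things:

1. Create a compatible .pbtxt file whose entries match the order of the .txt file exported from the Custom Vision portal's "Export" (I am using SavedModel). Note that the .pbtxt file uses 1-based ids, while the tagId returned by the Custom Vision model is zero-based. This is easy to fix: add "+ 1" to the tagId when assigning the detection_classes key in the dictionary, then let the TF OD API do the rest by putting human-readable labels on the bounding boxes in the live feed.
2. Bounding box coordinate kung fu! After I converted the Custom Vision results into an "array of arrays" matching the original TF model output and computed xmax and ymax, I found that the Object Detection API visualization code reads each box in the order ymin, xmin, ymax, xmax (so don't try xmin, ymin, xmax, ymax or xmin, xmax, ymin, ymax in your conversion). The Custom Vision response returns each bounding box as xmin (left), ymin (top), width and height. I have seen this in a few answers here, but usually in other contexts. ImageNet/ResNet models also return a different shape by default. I'm not covering that here since I don't need it at the moment, but I'm sure a similar brute-force approach would work for those models if needed.

Anyway... some code...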
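
The two points above can be sketched as one small helper. This is only a minimal sketch and not part of the exported helper code: the function name custom_vision_to_tf_od is hypothetical, and it returns plain Python lists so that it runs without any dependencies. The sample predictions are the two console lines from the question.

```python
def custom_vision_to_tf_od(predictions):
    """Map Custom Vision predictions to TF OD API style outputs (hypothetical helper)."""
    boxes, classes, scores = [], [], []
    for p in predictions:
        bb = p['boundingBox']
        # Custom Vision gives left/top/width/height, all normalized to [0, 1].
        ymin, xmin = bb['top'], bb['left']
        ymax = ymin + bb['height']
        xmax = xmin + bb['width']
        # The visualizer reads each box as (ymin, xmin, ymax, xmax).
        boxes.append((ymin, xmin, ymax, xmax))
        # tagId is zero-based, label map ids are 1-based.
        classes.append(p['tagId'] + 1)
        scores.append(p['probability'])
    return {
        'detection_boxes': boxes,
        'detection_classes': classes,
        'detection_scores': scores,
    }


sample = [
    {'probability': 0.95146583, 'tagId': 1, 'tagName': 'MyObject',
     'boundingBox': {'left': 0.11083871, 'top': 0.65143364,
                     'width': 0.05332406, 'height': 0.04930339}},
    {'probability': 0.92589812, 'tagId': 0, 'tagName': 'OtherObject',
     'boundingBox': {'left': 0.24750886, 'top': 0.68784532,
                     'width': 0.54308632, 'height': 0.17839652}},
]
converted = custom_vision_to_tf_od(sample)
```

Before passing these to vis_util.visualize_boxes_and_labels_on_image_array, each list would still need to be wrapped with np.asarray, as in the full snippet that follows.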

        ret, image_np = cap.read()
        # Skip this iteration if no frame was captured.
        if not ret or image_np is None:
            continue

        predictions = od_model.predict_image(Image.fromarray(image_np))

        if len(predictions) > 0:
            output_dict = {}
            output_dict['detection_boxes'] = []

            output_dict['detection_scores'] = np.asarray([ sub['probability'] for sub in predictions ])
            # tagId is zero-based, the .pbtxt label map ids are 1-based, hence the + 1.
            output_dict['detection_classes'] = np.asarray([ sub['tagId'] for sub in predictions ]) + 1
            output_dict['detection_class_names'] = np.asarray([ sub['tagName'] for sub in predictions ])

            for p in predictions:
                print(p)        # for debugging purposes...
                box_left = p['boundingBox']['left']
                box_top = p['boundingBox']['top']
                box_height = p['boundingBox']['height']
                box_width = p['boundingBox']['width']
                # Box order for the visualizer: ymin, xmin, ymax, xmax.
                output_dict['detection_boxes'].append(np.asarray(( box_top, box_left, box_top + box_height, box_left + box_width )))

            output_dict['detection_boxes'] = np.asarray(output_dict['detection_boxes'])

            vis_util.visualize_boxes_and_labels_on_image_array(
                    image_np,
                    output_dict['detection_boxes'],
                    output_dict['detection_classes'],
                    output_dict['detection_scores'],
                    category_index,
                    min_score_thresh=0.949,                        
                    instance_masks=output_dict.get('detection_masks'),
                    use_normalized_coordinates=True,
                    line_thickness=2)
        cv2.imshow('object_detection', cv2.resize(image_np, (VID_W, VID_H)))
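
For quick experiments, the category_index passed to the visualizer above can also be built in plain Python instead of from a .pbtxt file. This is a sketch under two assumptions: that the label list is in the same order as the zero-based tagId values (the order of the exported .txt file), and that the visualizer only looks up the 'name' entry keyed by class id. The helper name build_category_index is hypothetical, and the label order shown is inferred from the tagId values in the console output in the question.

```python
def build_category_index(label_names):
    """Build a 1-based category index from labels ordered by zero-based tagId."""
    return {
        i + 1: {'id': i + 1, 'name': name}
        for i, name in enumerate(label_names)
    }


# tagId 0 -> 'OtherObject', tagId 1 -> 'MyObject' in the question's output.
labels = ['OtherObject', 'MyObject']
category_index = build_category_index(labels)
```

With this index, a prediction with tagId 1 becomes class 2 after the + 1 shift and is labeled 'MyObject'.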
