video-intelligence-api - Face detection model returns empty dictionaries (Google Cloud Video Intelligence)

Question

I'm having a problem with the face detection model of the Google Video Intelligence API.

I'm using Python 3.6.5 and google-cloud-videointelligence==1.15.0.

Sometimes I receive an erroneous response from the face detection model. I parse the API response with google.protobuf.json_format.MessageToDict(). I expect one of the following two behaviors:

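A. If a face appears in the video, I expect the results under the key 'FaceDetectionAnnotations', in the form of a dictionary; the keys of the outer dictionary are "segment numbers" (integers), and each inner dictionary looks like this: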

{'coordinates': {'left': 0.3432,
   'top': 0.075,
   'right': 0.6667,
   'bottom': 0.7435},
  'labels': {'confidence': 1.0,
   'attributes': [{'name': 'glasses', 'confidence': 0.041921083},
    {'name': 'headwear', 'confidence': 0.10601594},
    {'name': 'eyes_visible', 'confidence': 0.9976739},
    {'name': 'mouth_open', 'confidence': 0.005100015},
    {'name': 'looking_at_camera', 'confidence': 0.9647807},
    {'name': 'smiling', 'confidence': 0.017670842}]}}

B. If there are no faces in the video, I expect no 'FaceDetectionAnnotations' key anywhere in the results.

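However, occasionally I see a third kind of response, in which the 'FaceDetectionAnnotations' key is present in the results (suggesting that the face detection model did detect a face), but every inner dictionary is empty. There is still one inner dictionary per segment, but they contain none of the usual information, such as the segment's start and end times, or any coordinates or confidence values.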
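To make the three cases concrete, here is a minimal sketch of how they could be told apart once the response has been turned into a plain dictionary. It assumes the parsed result has the shape described above (an outer dictionary keyed by segment number under 'FaceDetectionAnnotations'); the function name `classify_face_result` and the status strings are hypothetical and not part of the Google client library.

```python
# Key name taken from the question; everything else here is illustrative.
FACE_KEY = "FaceDetectionAnnotations"


def classify_face_result(result):
    """Classify an already-parsed response dictionary.

    Returns a (status, empty_segments) tuple where status is one of:
      "no_faces"          - the key is absent (case B)
      "faces"             - every segment has a non-empty inner dict (case A)
      "empty_annotations" - the key is present but at least one inner
                            dict is empty (the unexpected third case)
    """
    annotations = result.get(FACE_KEY)
    if annotations is None:
        return "no_faces", []
    if not annotations:
        # Key present but no segments at all: also treated as unexpected.
        return "empty_annotations", []
    # Collect the segment keys whose inner dictionary carries no data.
    empty_segments = [seg for seg, inner in annotations.items() if not inner]
    if empty_segments:
        return "empty_annotations", empty_segments
    return "faces", []
```

A check like this only flags the empty responses so they can be logged or retried; it does not explain why the API returns them.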

I only see this problem in videos that contain faces.

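I can confirm that this problem is present in the raw response from Google VI (before it is parsed with the MessageToDict() function), and I'm not sure what is causing it. Below is a link to a sample video that exhibits the problem.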

https://drive.google.com/file/d/1gsbe20iWp6lD9dH0PNvxvvQFUeB5F_cz/view?usp=sharing

If anyone has seen something like this before, or knows how to fix it, I would greatly appreciate the help.

score 1 · Accepted Answer

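Currently, there is an open issue here covering the problem you describe. The engineering team is looking into it, and you can track its progress by following the thread linked above.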
