image-processing - Ground truth data collection and evaluation for computer vision

Question

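I am starting to develop a computer vision application that involves tracking humans. I want to build ground truth metadata for the videos that will be recorded in this project. The metadata will probably need to be labeled by hand and will mainly consist of the positions of the humans in the image. I would like to use the metadata to evaluate the performance of my algorithms.

I could of course build a labeling tool myself using, e.g., Qt and/or OpenCV, but I am wondering whether there is some kind of de facto standard. I came across ViPER, but it seems to be dead, and it is not as easy to use as I had hoped. Apart from that, I have not found much.

Does anyone here have recommendations on which software, standard, or method to use for both the labeling and the evaluation? My main preference is for something C++ oriented, but that is not a hard constraint.

Kind regards and thanks in advance! Tom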

score 5 · Accepted Answer

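I took another look at vatic and got it working. It is an online video annotation tool meant for crowdsourcing through a commercial service, and it runs on Linux. However, there is also an offline mode. In that mode the service the software was developed for is not required, and the software runs standalone.

Installation is described in detail in the accompanying README file. It involves setting up an Apache and a MySQL server, a few Python packages, and ffmpeg. It is not difficult if you follow the README. (I did have some trouble with my proxy, but that has nothing to do with this software package.)

You can try the online demo. The default output looks like this: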

0 302 113 319 183 0 1 0 0 "person"
0 300 112 318 182 1 1 0 1 "person"
0 298 111 318 182 2 1 0 1 "person"
0 296 110 318 181 3 1 0 1 "person"
0 294 110 318 181 4 1 0 1 "person"
0 292 109 318 180 5 1 0 1 "person"
0 290 108 318 180 6 1 0 1 "person"
0 288 108 318 179 7 1 0 1 "person"
0 286 107 317 179 8 1 0 1 "person"
0 284 106 317 178 9 1 0 1 "person"

Each line contains 10+ columns, separated by spaces. The columns are defined as follows:

1   Track ID. All rows with the same ID belong to the same path.
2   xmin. The top left x-coordinate of the bounding box.
3   ymin. The top left y-coordinate of the bounding box.
4   xmax. The bottom right x-coordinate of the bounding box.
5   ymax. The bottom right y-coordinate of the bounding box.
6   frame. The frame that this annotation represents.
7   lost. If 1, the annotation is outside of the view screen.
8   occluded. If 1, the annotation is occluded.
9   generated. If 1, the annotation was automatically interpolated.
10  label. The label for this annotation, enclosed in quotation marks.
11+ attributes. Each column after this is an attribute.
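
To use this output in an evaluation script, the text format has to be read back in. Below is a minimal Python sketch of how that could look, based only on the column list above. The `Annotation` class and the `parse_line` / `parse_text` helpers are names made up for this example and are not part of vatic.

```python
import shlex
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """One row of the text output (hypothetical container, not a vatic type)."""
    track_id: int
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    frame: int
    lost: bool
    occluded: bool
    generated: bool
    label: str
    attributes: List[str] = field(default_factory=list)


def parse_line(line: str) -> Annotation:
    # shlex.split keeps quoted fields such as "person" together
    # and strips the surrounding quotation marks.
    cols = shlex.split(line)
    if len(cols) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(cols)}")
    nums = [int(c) for c in cols[:9]]
    return Annotation(
        track_id=nums[0],
        xmin=nums[1], ymin=nums[2], xmax=nums[3], ymax=nums[4],
        frame=nums[5],
        lost=bool(nums[6]), occluded=bool(nums[7]), generated=bool(nums[8]),
        label=cols[9],
        attributes=cols[10:],  # columns 11+ are attributes
    )


def parse_text(text: str) -> List[Annotation]:
    # Blank lines are skipped; every other line must be a valid row.
    return [parse_line(line) for line in text.splitlines() if line.strip()]


# First two rows of the sample output shown above.
sample = '''0 302 113 319 183 0 1 0 0 "person"
0 300 112 318 182 1 1 0 1 "person"'''

annotations = parse_text(sample)
for a in annotations:
    print(a.frame, a.label, (a.xmin, a.ymin, a.xmax, a.ymax), a.generated)
```

Grouping the resulting records by `track_id` and sorting by `frame` would give one path per person, which is the shape most tracking evaluations expect.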

It can also produce output as XML, JSON, pickle, LabelMe, and PASCAL VOC.

So, all in all, this does exactly what I wanted, and it is also easy to use. I am still interested in other options, though!
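
For the evaluation half of the question, one common way to compare a tracker's bounding box against an annotated one is intersection over union (IoU). vatic itself does not provide this; the sketch below is only an illustration. The `iou` function and the `detection` values are made up for the example, and the boxes are treated as continuous `(xmin, ymin, xmax, ymax)` coordinates, which is an assumption about how the pixel values should be interpreted.

```python
from typing import Tuple

# (xmin, ymin, xmax, ymax), same ordering as columns 2 to 5 of the text output
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes yield a score of 0 instead of a division error.
    return inter / union if union > 0 else 0.0


ground_truth = (302, 113, 319, 183)  # box values borrowed from the first sample row
detection = (300, 110, 320, 180)     # hypothetical tracker output for the same frame

score = iou(ground_truth, detection)
print(f"IoU = {score:.3f}")
```

A per-frame score like this can then be thresholded (0.5 is a frequently used cutoff) to count a detection as a hit or a miss.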

score 3 · Accepted Answer

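LabelMe is another open annotation tool. I do not think it is a great fit for my particular case, but it is still worth mentioning. It seems to be geared toward blob labeling.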

score 2 · Accepted Answer

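This is a problem that every computer vision practitioner faces. If you are serious about it, there is a company that will do this for you through crowdsourcing. I am not sure whether I should post a link to it on this site, though.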

score 1 · Accepted Answer

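I ran into the same problem when looking for an image annotation tool to build ground truth datasets for training image analysis models.

LabelMe is a good option if you need to draw polygon outlines for your annotations. I have used it before; it does the job well and also has some cool features for 3D feature extraction. Besides LabelMe, I have also built an open source tool called LabelD. If you are still looking for an annotation tool, please check it out!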
