c++ - Locality Sensitivy Hashing in OpenCV for image processing

Question

This is my first image processing application, so please be kind with this filthy peasant.

THE APPLICATION:

I want to implement a fast application (performance are crucial even over accuracy) where given a photo (taken by mobile phone) containing a movie poster finds the most similar photo in a given dataset and return a similarity score. The dataset is composed by similar pictures (taken by mobile phone, containing a movie poster). The images can be of different size, resolutions and can be taken from different viewpoints (but there is no rotation, since the posters are supposed to always be right-oriented).

Any suggestion on how to implement such an application is well accepted.

FEATURE DESCRIPTIONS IN OPENCV:

I've never used OpenCV and I've read this tutorial about Feature Detection and Description by OpenCV.

From what I've understood, these algorithms are supposed to find keypoints (usually corners) and eventually define descriptors (which describe each keypoint and are used for matching two different images). I used "eventually" since some of them (eg FAST) provides only keypoints.

MOST SIMILAR IMAGE PROBLEM AND LSH:

The problems above doesn't solve the problem "given an image, how to find the most similar one in a dataset in a fast way". In order to do that, we can both use the keypoints and descriptors obtained by any of the previous algorithms. The problem stated above seems like a nearest neighbor problem and Locality Sensitive Hashing is a fast and popular solution for find an approximate solution for this problem in high-dimensionality spaces.

THE QUESTION:

What I don't understand is how to use the result of any of the previous algorithms (i.e. keypoints and descriptors) in LSH.

Is there any implementation for this problem?

score 2 · Accepted Answer

我将提供一个一般性的答案，超出 OpenCV 库的范围。

引用这个答案：

描述符：它们是比较关键点的方式。他们以向量格式（恒定长度）总结了关键点的一些特征。

话虽如此，我们可以（几何上）将描述符想象/视为D 维空间中的点。所以总的来说，所有的描述符都是 D 维空间中的点。例如，对于GIST，D = 960。

所以实际上描述符描述了图像，使用的信息少于整个图像（因为当你有 10 亿张图像时，大小很重要）。它们充当图像的代表，因此我们代表图像处理它们（因为它们更容易/更小处理）。

您提到的问题是最近邻问题。请注意，当 D 很大时，这个问题的近似版本会导致显着的加速（因为维度的诅咒会使传统方法，例如kd-tree非常慢，几乎与 N（点数）呈线性关系） .

解决作为优化问题的 NN 问题的算法通常是通用的。他们可能不在乎数据是图像、分子等，例如，我对两者都使用了我的kd-GeRaF。因此，算法期望 D 维空间中有 N 个点，因此您可能想说 N 个描述符。

在这里查看我对 LSH 的回答（这指向了一个很好的实现）。

编辑：

LSH 期望输入N个D维向量，并给定一个查询向量（在D中）和一个范围R，将从查询向量中找到位于此范围内的向量。

因此，我们可以说每幅图像仅由一个向量表示，例如SIFT格式。

你看，LSH 实际上并没有直接解决 k-NN 问题，但它在一个范围内搜索（并且可以给你 k-NN，如果它们在这个范围内）。在实验部分，高维近似最近邻中阅读更多关于 R 的信息。kd-GeRaF和FLANN直接解决了 k-NN 问题。

c++ - Locality Sensitivy Hashing in OpenCV for image processing

1 回答 1

Related

Reference