
I am trying to cluster keypoints (detected with SIFT) using the kmeans function, but I cannot get them into a form it can use.

The keypoints are saved to an xml/yml file with the following code:

#include <tchar.h>                          // _tmain / _TCHAR (Visual Studio)
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>   // SiftFeatureDetector (OpenCV 2.4.x)

using namespace cv;
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    // load the image as grayscale
    Mat img = imread("c:\\box.png", 0);

    // detect SIFT keypoints
    SiftFeatureDetector detector;
    vector<KeyPoint> keypoints;
    detector.detect(img, keypoints);

    // serialize the keypoints to an xml file
    FileStorage fs("keypoint1.xml", FileStorage::WRITE);
    write(fs, "keypoints1", keypoints);
    fs.release();

    return 0;
}

The xml file stores the keypoints separated by whitespace, while the yml file separates them with commas. The yml output looks like this:

%YAML:1.0
keypoints1: [ 6.1368021965026855e+000, 5.2649226188659668e+000,
    4.0740542411804199e+000, 2.7943280029296875e+002, 0., 9109760, -1,
    6.1368021965026855e+000, 5.2649226188659668e+000,
    4.0740542411804199e+000, 3.4678604125976562e+002, 0., 9109760, -1,
    1.5903041076660156e+002, 2.4698186874389648e+001,
    4.1325736045837402e+000, 9.7977493286132813e+001, 0., 10158336, -1,
    1.6808378601074219e+002, 2.5029441833496094e+001,
    4.2399377822875977e+000, 9.7380126953125000e+001, 0., 11993344, -1,
    1.9952423095703125e+002, 4.4663669586181641e+001,
    5.0049328804016113e+000, 5.7439949035644531e+001, 0., 7275008, -1,
    3.0947158813476563e+002, 4.6865818023681641e+001,......................
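For reference, reading the keypoints back seems to work with the matching read() overload; a minimal sketch using the file and node names from the code above:

#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>

using namespace cv;
using namespace std;

int main()
{
    vector<KeyPoint> keypoints;

    FileStorage fs("keypoint1.xml", FileStorage::READ);
    read(fs["keypoints1"], keypoints);   // reverse of write(fs, "keypoints1", keypoints)
    fs.release();

    // Each KeyPoint carries pt.x, pt.y, size, angle, response, octave and class_id --
    // the seven numbers per keypoint visible in the yml dump above.
    return 0;
}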

The kmeans function expects one sample per input row. Can someone explain whether the file above fits that -- I mean, can it be used as a single row, and does it satisfy the kmeans requirements once it is read back with the FileStorage read methods?

The reason I want to write and then read these files is that, say, I have 100 images whose keypoints need clustering. I would like to append all of those files into one huge file and cluster it in one go.
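To make concrete what I am after, here is a rough sketch (placeholder paths and cluster count; it uses the 128-dimensional SIFT descriptors rather than the raw keypoints, and skips the file round-trip) of stacking everything into one CV_32F matrix with one row per sample, which is what cv::kmeans expects:

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>

using namespace cv;
using namespace std;

int main()
{
    const char* files[] = { "c:\\img1.png", "c:\\img2.png" };   // placeholder paths

    SiftFeatureDetector detector;
    SiftDescriptorExtractor extractor;

    Mat samples;                                 // one descriptor per row (CV_32F)
    for (int i = 0; i < 2; ++i)
    {
        Mat img = imread(files[i], 0);
        vector<KeyPoint> keypoints;
        Mat descriptors;
        detector.detect(img, keypoints);
        extractor.compute(img, keypoints, descriptors);   // N x 128, CV_32F
        samples.push_back(descriptors);          // append this image's rows
    }

    const int K = 10;                            // placeholder cluster count
    Mat labels, centers;
    kmeans(samples, K, labels,
           TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 100, 0.01),
           3, KMEANS_PP_CENTERS, centers);
    return 0;
}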

Thanks


3 Answers


Tom is correct. Usually with SIFT, you use more than one keypoint per image. That is the whole idea behind keypoint detection, that you try to process only the "interesting" parts of the image in the further steps.

So for clustering images, regular SIFT features will not work too well. They are good for panorama construction and such things, where you expect to find the same keypoints across multiple images.

However, you can "abuse" SIFT keypoints, and that is probably what you are trying to reproduce. It would certainly help if you read the relevant articles instead of just trying to figure it out yourself from a coding point of view.

A simple introduction can be found here: http://image-net.org/download-features

Notice how they sample the same number of keypoints from each image, by using a regular grid. Yet, they still do not put them together into one huge array - that doesn't work for similarity search. Instead they perform a kind of dimensionality reduction.

They run k-means on all of the individual keypoints of all images to obtain 1000 "common" keypoints, called visual words. Then they translate each keypoint into the best-matching visual word, and this way obtain a text-like representation of the image. The keypoints don't have human-readable names, but you can imagine representing an image as something like "sky sky sky sky fur fur fur forest sky fur sky fur water forest water water water forest" for a picture of a beaver swimming in a lake.
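OpenCV ships helpers for exactly this pipeline; a rough sketch with BOWKMeansTrainer and BOWImgDescriptorExtractor (the vocabulary size of 1000 mirrors the description above; the image list and other parameters are placeholders):

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>

using namespace cv;
using namespace std;

int main()
{
    const char* files[] = { "c:\\img1.png", "c:\\img2.png" };   // placeholder image list

    SiftFeatureDetector detector;
    Ptr<DescriptorExtractor> extractor = new SiftDescriptorExtractor();
    Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("FlannBased");

    // 1. k-means over all descriptors of all images -> 1000 visual words
    BOWKMeansTrainer trainer(1000);
    for (int i = 0; i < 2; ++i)
    {
        Mat img = imread(files[i], 0);
        vector<KeyPoint> kp;
        Mat desc;
        detector.detect(img, kp);
        extractor->compute(img, kp, desc);
        trainer.add(desc);
    }
    Mat vocabulary = trainer.cluster();

    // 2. translate each image's keypoints into a histogram over the visual words
    BOWImgDescriptorExtractor bow(extractor, matcher);
    bow.setVocabulary(vocabulary);
    for (int i = 0; i < 2; ++i)
    {
        Mat img = imread(files[i], 0);
        vector<KeyPoint> kp;
        detector.detect(img, kp);
        Mat histogram;                      // 1 x 1000 bag-of-words vector per image
        bow.compute(img, kp, histogram);
    }
    return 0;
}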

On these bag-of-words representations you could then run clustering or similarity search again. K-means won't work well here, because the vectors are sparse: Euclidean distance does not behave well on sparse data, and k-means is unfortunately designed around Euclidean distance. Plus, the means themselves are no longer sparse, which makes them unlike any of the actual instances. Most likely, the resulting means will be more similar to each other than to the instances, taking the whole partitioning ad absurdum.

answered 2012-11-16T22:08:39.023

The number of SIFT points varies from image to image, so there is no fixed-length vector to project onto either. Also, the order in which you concatenate them into one big vector implies that order matters. A SIFT feature is a single point (not a vector).

You need a more elaborate metric to define similarity. Euclidean or other vector-based metrics will not work. OpenCV's k-means requires vector input, so it will not work either.

answered 2012-11-16T18:59:45.733

A SIFT feature is one thing; its values build the vector. When you extract a SIFT descriptor from a SIFT keypoint, you are extracting the gradient magnitudes for 8 possible orientations over a 4x4 grid of regions. So for each keypoint you get a vector with 128 features -- or 128 dimensions, if you like. It is the same as representing a point in Euclidean space, only with 128 dimensions instead of three. A variant of the regular SIFT descriptor performs better: dense SIFT. Instead of computing keypoints from a difference of Gaussians, it places a rectangular grid over the image and extracts a descriptor at each grid point. Look for vl_feat on Google.
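If you want to stay within OpenCV 2.4 instead of vl_feat, the grid sampling can be approximated with DenseFeatureDetector; a minimal sketch (default grid parameters, placeholder path):

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>

using namespace cv;
using namespace std;

int main()
{
    Mat img = imread("c:\\box.png", 0);

    DenseFeatureDetector detector;           // keypoints come from a regular grid,
                                             // not from difference-of-Gaussians
    SiftDescriptorExtractor extractor;

    vector<KeyPoint> keypoints;
    Mat descriptors;                         // one 128-dim row per grid point
    detector.detect(img, keypoints);
    extractor.compute(img, keypoints, descriptors);
    return 0;
}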

answered 2013-06-09T23:45:25.090