2

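I have sets of 2D coordinates (on the order of 100K-500K points per set), and I'm looking for the most efficient way to measure how similar one set is to another. I know the usual measures: cosine, Jaccard/Tanimoto, and so on. However, I'd welcome suggestions for any fast/efficient ways to measure similarity, especially ones that allow clustering by similarity.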

Edit 1: The image shows what I need to do. I need to cluster all the reds, blues, and greens by their shape/orientation, etc.

alt text http://img402.imageshack.us/img402/8121/curves.png

4

3 Answers

0

Since your clustering is based on a nearness-to-shape metric, perhaps you need some form of connected component labeling. UNION-FIND can give you a fast basic set primitive.

For union-only, start every point in a different set, and merge them if they meet some criterion of nearness, influenced by local colinearity since that seems important to you. Then keep merging until you pass some over-threshold condition for how difficult your merge is. If you treat it like line-growing (only join things at their ends) then some data structures become simpler. Are all your clusters open lines and curves? No closed curves, like circles?

The crossing lines are trickier to get right. You either have to find some way to merge and then split, or you set your merge criteria to strongly favor colinearity and hope you luck out on the crossing lines.
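To make the union-only idea concrete, here is a minimal sketch of UNION-FIND driving a nearness-based merge. The names `UnionFind`, `cluster_by_nearness`, and `max_dist` are made up for illustration, and the merge test is plain Euclidean distance only; a colinearity term would have to be added to the `if` condition. The pairwise loop is quadratic, so for 100K+ points you would presumably want a spatial grid or k-d tree to find candidate neighbors instead.

```python
import math


class UnionFind:
    """Disjoint-set structure with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))  # every point starts in its own set
        self.size = [1] * n

    def find(self, i):
        # Walk to the root, pointing each visited node at its grandparent.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]
        return True


def cluster_by_nearness(points, max_dist):
    """Label points so that chains of hops no longer than max_dist share a label.

    Illustrative merge criterion only: distance, with no colinearity term.
    """
    uf = UnionFind(len(points))
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                uf.union(i, j)
    return [uf.find(i) for i in range(len(points))]


labels = cluster_by_nearness([(0, 0), (1, 0), (2, 0), (10, 10), (11, 10)], 1.5)
```

In this example the first three points end up with one label and the last two with another, because no hop between the two groups is shorter than `max_dist`.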

Answered 2010-02-05T21:58:25.670
0

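Try the K-means algorithm. It iteratively computes the centroid of each cluster, computes the distance from each centroid to all the points, and assigns every point to the nearest cluster.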
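For reference, a bare-bones sketch of that loop (Lloyd's algorithm) in plain Python. The function name `kmeans` and the parameters `k`, `iterations`, and `seed` are my own choices, the initial centroids are simply sampled from the data, and there is no convergence check, so treat it as an illustration rather than something tuned for 500K points.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on a list of (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


blobs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(blobs, k=2)
```

Note that this groups points by distance to a centroid, so you have to pick `k` up front, and the result depends on the starting centroids.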

Answered 2010-01-20T15:37:38.523
0

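It seems the first step of any solution is to find the centroid, or some other reference point, of each shape, so that the shapes can be compared regardless of absolute position.

One algorithm that comes to mind is to start at the point closest to the centroid and walk to its nearest neighbor. Compare the offsets of those neighbors (from the centroid) between the sets being compared. Keep walking, either to the next-nearest neighbor of the centroid or to the nearest not-yet-compared neighbor of a point you have already compared, and keep track of the aggregate difference between the two shapes (RMS, perhaps?). Also, at each step of this process, compute the rotational offset that would bring the two shapes into closest alignment [and whether mirroring would affect it as well?]. When you're done, you will have three values for each pair of sets: their direct similarity, their relative rotational offset (mostly useful only if they are a close match after rotation), and their similarity after rotation.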
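A rough sketch of how those three values could be computed, with some simplifications of my own: instead of walking neighbor to neighbor, points are just ordered by distance from the centroid and paired by rank, both sets are assumed to have the same number of points, the rotation is computed once at the end with the closed-form 2D least-squares angle rather than at every step, and mirroring is not handled. The names `centered_by_radius`, `rms`, and `compare_shapes` are made up for this example.

```python
import math


def centered_by_radius(points):
    """Offsets from the centroid, ordered from nearest to farthest."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    offsets = [(x - cx, y - cy) for x, y in points]
    return sorted(offsets, key=lambda o: math.hypot(o[0], o[1]))


def rms(pairs):
    """Root-mean-square distance between paired offsets."""
    total = sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in pairs)
    return math.sqrt(total / len(pairs))


def compare_shapes(shape_a, shape_b):
    """Return (direct RMS, rotation in radians, RMS after rotating A onto B)."""
    # Assumption: equal point counts; pairing is by distance rank only.
    pairs = list(zip(centered_by_radius(shape_a), centered_by_radius(shape_b)))
    direct = rms(pairs)
    # Least-squares rotation angle for paired 2D vectors.
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in pairs)
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in pairs)
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [((ax * c - ay * s, ax * s + ay * c), b) for (ax, ay), b in pairs]
    return direct, theta, rms(rotated)


shape = [(0, 0), (5, 0), (6, 2), (1, 6)]
turned = [(-y, x) for x, y in shape]  # same shape rotated by 90 degrees
direct, theta, after = compare_shapes(shape, turned)
```

For the rotated copy above, `theta` comes out at about pi/2 and the post-rotation RMS is essentially zero, while the direct RMS is clearly nonzero. Rank pairing breaks down when many points sit at nearly the same distance from the centroid, which is where the neighbor walk described above would matter.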

Answered 2010-01-29T05:27:37.827