algorithm - Looking for an algorithm: clustering by "similarity"

Question

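I have a set of "vectors" and I need to order them by their "similarity".

Like this: the vectors {1,0,0} {1,1,0} {0,1,0} {1,0,1} are quite similar and should end up close to each other, but the vectors {1,0,0} {8,0,0} {0,5,0} are not.

The metric between A and B is max(abs(A[i]-B[i])), but what kind of algorithm can order things based on relative comparisons?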
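For concreteness, here is a minimal sketch of the metric from the question applied to its two example sets. The helper names `chebyshev` and `diameter` are made up for illustration and are not from the original post.

```python
def chebyshev(a, b):
    """Metric from the question: largest absolute component-wise difference."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def diameter(vectors):
    """Largest pairwise distance within a set of vectors (hypothetical helper)."""
    return max(chebyshev(a, b) for a in vectors for b in vectors)


close = [(1, 0, 0), (1, 1, 0), (0, 1, 0), (1, 0, 1)]
far = [(1, 0, 0), (8, 0, 0), (0, 5, 0)]

print(diameter(close))  # 1: every pair differs by at most 1 in any component
print(diameter(far))    # 8: (8,0,0) vs (0,5,0)
```

The first set has diameter 1 and the second has diameter 8, which matches the intuition that the first set is "similar" and the second is not.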

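upd: Input: an array of N vectors.
Output: an array of N vectors in which vectors at neighbouring indices (e.g. arr[i] and arr[i+1]) are "similar", i.e. the metric between arr[i] and arr[i+1] is as low as possible for any i.
Metric: the maximum difference between vector components.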
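One naive way to read this input/output spec is a greedy nearest-neighbour chain: start somewhere and keep appending the closest vector not yet placed. This is only a heuristic sketch, not something proposed in the thread, and it does not guarantee the best possible ordering (minimising the total of adjacent distances is a shortest-path-through-all-points problem, which is hard in general). The name `greedy_order` and the choice of the first vector as the starting point are assumptions.

```python
def chebyshev(a, b):
    """Largest absolute component-wise difference (the metric from the question)."""
    return max(abs(x - y) for x, y in zip(a, b))


def greedy_order(vectors, dist=chebyshev):
    """Order vectors by repeatedly appending the nearest unplaced one.

    Heuristic only: starts at vectors[0] and never backtracks.
    """
    if not vectors:
        return []
    ordered = [vectors[0]]
    remaining = list(vectors[1:])
    while remaining:
        last = ordered[-1]
        # Index of the remaining vector closest to the last placed one.
        idx = min(range(len(remaining)), key=lambda i: dist(last, remaining[i]))
        ordered.append(remaining.pop(idx))
    return ordered


arr = [(1, 0, 0), (8, 0, 0), (0, 5, 0), (1, 1, 0), (0, 1, 0), (1, 0, 1)]
for v in greedy_order(arr):
    print(v)
```

On this input the four 0/1 vectors come out first, followed by (0,5,0) and (8,0,0). The result depends on the starting vector, which is one reason a clustering step is usually a better fit.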

upd2: It now seems @jogojapan is right: I need to cluster the vectors and then print them group by group in some linear order.

score 3 · Accepted Answer

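This is the distance induced by the maximum norm (a.k.a. sup norm or l-infinity norm). If by sorting you mean putting the vectors in a sequence, a distance alone is not enough to define a linear order.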

score 2 · Accepted Answer

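Sorting is inherently a one-dimensional problem. What you describe here sounds more like a weighted graph, but it is not clear what your goal is. If you are trying to identify the vector "closest" to a known vector, you may also find some concepts from information theory useful, such as the Hamming distance.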
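For reference, a small sketch of the Hamming distance mentioned above: it counts the positions in which two equal-length vectors differ and ignores by how much. The function name `hamming` is just illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming((1, 0, 0), (0, 1, 0)))  # 2: first and second components differ
print(hamming((1, 0, 0), (8, 0, 0)))  # 1: only the first component differs
```

Note that (1,0,0) and (8,0,0) are at Hamming distance 1 even though the question treats them as dissimilar, so this measure fits 0/1 vectors better than vectors with large magnitudes.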

score 0 · Accepted Answer

Well, the obvious approach would be the (IMHO badly named) "hierarchical clustering", which always merges those clusters with the smallest distance. You can plug in your metric there. Most implementations are in O(n^3) and thus not useful for large datasets. Plus, you get a huge dendrogram that is hard to read.
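A deliberately naive sketch of that merge loop, with the max-norm metric plugged in. It assumes single linkage (cluster distance = closest pair across the two clusters) and a hypothetical `threshold` at which merging stops; neither choice comes from the thread. It recomputes all linkages on every pass, so it is at least cubic and only suitable for small inputs.

```python
def chebyshev(a, b):
    """Largest absolute component-wise difference (the metric from the question)."""
    return max(abs(x - y) for x, y in zip(a, b))


def agglomerate(points, threshold, dist=chebyshev):
    """Naive single-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters until the smallest
    inter-cluster distance exceeds `threshold`.
    Returns clusters as lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance of the closest pair across the two clusters.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        gap, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if gap > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


vectors = [(1, 0, 0), (8, 0, 0), (0, 5, 0), (1, 1, 0), (0, 1, 0), (1, 0, 1)]
# Print the vectors group by group, as asked for in upd2.
for group in agglomerate(vectors, threshold=1):
    print([vectors[i] for i in group])
```

With `threshold=1` the four 0/1 vectors end up in one group, while (8,0,0) and (0,5,0) each stay alone. Recording the sequence of merges instead of stopping at a threshold would give the dendrogram.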

You might want to give OPTICS a try. Look it up on Wikipedia. It might satisfy your needs quite well, since it in fact sorts the points. It will walk from one cluster to another, and can in fact produce a hierarchical (as in "nested") clustering. A good implementation should run in O(n^2) without index structures and in O(n log n) with index acceleration.
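To show what "sorts the points" means, here is a rough pure-Python sketch of the OPTICS ordering step. It is simplified by assuming an unbounded neighbourhood radius (eps = infinity), so every point is a core point and one pass visits everything; it is the O(n^2) no-index variant. The names `optics_order` and `min_pts` are chosen here for illustration, and a real library implementation (for example `sklearn.cluster.OPTICS`) would be the better choice in practice.

```python
import math


def chebyshev(a, b):
    """Largest absolute component-wise difference (the metric from the question)."""
    return max(abs(x - y) for x, y in zip(a, b))


def optics_order(points, min_pts=2, dist=chebyshev):
    """Simplified OPTICS with eps = infinity.

    Returns (order, reach): `order` is the visiting order as indices,
    `reach[i]` is the reachability distance of point i (inf for the start).
    """
    n = len(points)
    if n == 0:
        return [], []
    d = [[dist(p, q) for q in points] for p in points]
    # Core distance: distance to the min_pts-th nearest point, counting the
    # point itself. Clamped for tiny inputs (an assumption of this sketch).
    k = min(min_pts, n) - 1
    core = [sorted(row)[k] for row in d]
    reach = [math.inf] * n
    done = [False] * n
    order = []
    current = 0  # arbitrary starting point
    while True:
        done[current] = True
        order.append(current)
        # Relax the reachability of every unvisited point via `current`.
        for o in range(n):
            if not done[o]:
                reach[o] = min(reach[o], max(core[current], d[current][o]))
        remaining = [o for o in range(n) if not done[o]]
        if not remaining:
            break
        # Visit the unvisited point with the smallest reachability next.
        current = min(remaining, key=lambda o: (reach[o], o))
    return order, reach


pts = [(1, 0, 0), (8, 0, 0), (0, 5, 0), (1, 1, 0), (8, 1, 0), (0, 1, 0)]
order, reach = optics_order(pts)
for i in order:
    print(pts[i], reach[i])
```

Points in the same dense group come out consecutively with small reachability values; a jump in reachability (here when the walk reaches (0,5,0) and then (8,0,0)) marks the move to another cluster.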

score -1 · Accepted Answer

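Any sorting algorithm can give you the result you want.

The question is how you are going to compare your vectors. Do you want to compare them by magnitude only? Or by something else?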
