algorithm - 在 1-NN 图中找到连通分量的快速方法？

Question

首先，我得到了一个 N*N 距离矩阵，对于每个点，我计算了它的最近邻，所以我们有一个 N*2 矩阵，看起来像这样：

0 -> 1  
1 -> 2  
2 -> 3  
3 -> 2  
4 -> 2  
5 -> 6  
6 -> 7  
7 -> 6  
8 -> 6  
9 -> 8

第二列是最近邻居的索引。所以这是一种特殊的有向图，每个顶点只有一个出度。

当然，我们可以先将 N*2 矩阵转换为标准的图表示，然后执行 BFS/DFS 得到连通分量。

但是，鉴于这个特殊图表的特点，还有其他快速的方法来完成这项工作吗？

我将不胜感激。

更新：

我在这里为这种情况实现了一个简单的算法。

看，我没有使用联合查找算法，因为数据结构可能会让事情变得不那么容易，而且我怀疑这是否是我案例中最快的方法（我的意思是实际上）。

您可能会争辩说 _merge 过程可能很耗时，但如果我们在分配新标签的同时将边缘交换到连续位置，合并可能会花费很少，但它需要另外 N 个空间来跟踪原始索引。

score 5 · Accepted Answer

给定边列表查找连通分量的最快算法是union-find算法：对于每个节点，如果您找到长度为至少2，向上重新连接底部节点。

这肯定会在线性时间内运行：

- push all edges into a union-find structure: O(n)
- store each node in its set (the union-find root)
    and update the set of non-empty sets: O(n)
- return the set of non-empty sets (graph components).

由于边列表几乎已经形成了一个联合查找树，因此可以跳过第一步：

for each node
- if the node is not marked as collected
-- walk along the edges until you find an order-1 or order-2 loop, 
   collecting nodes en-route
-- reconnect all nodes to the end of the path and consider it a root for the set.
-- store all nodes in the set for the root.
-- update the set of non-empty sets.
-- mark all nodes as collected.
return the set of non-empty sets

第二种算法也是线性的，但只有一个基准才能判断它是否真的更快。union-find 算法的优势在于它的优化。这将优化延迟到第二步，但完全删除了第一步。

如果你加入最近邻计算的联合步骤，你可能会挤出更多的性能，然后在第二遍收集集合。

score 1 · Accepted Answer

由于每个节点只有一个出边，您可以一次只遍历一个边，直到到达您已经访问过的顶点。出度为 1 意味着此时任何进一步的遍历只会将您带到您已经去过的地方。该路径中遍历的顶点都在同一个组件中。

在您的示例中：

0->1->2->3->2, so [0,1,2,3] is a component
4->2, so update the component to [0,1,2,3,4]
5->6->7->6, so [5,6,7] is a component
8->6, so update the compoent to [5,6,7,8]
9->8, so update the compoent to [5,6,7,8,9]

您可以只访问每个节点一次，因此时间为 O(n)。空间是 O(n)，因为您只需要每个节点的组件 ID 和组件 ID 列表。

score 1 · Accepted Answer

如果你想按顺序进行，你可以使用加权快速联合和路径压缩来完成。复杂度 O(N+Mlog(log(N)))。检查这个链接。这是伪代码 .honoring @pycho 的话

`

 public class QuickUnion
    {
    private int[] id;

    public QuickUnion(int N)
    {
         id = new int[N];
         for (int i = 0; i < N; i++) id[i] = i;
    }

    public int root(int i)
    {
         while (i != id[i])
         {
             id[i] = id[id[i]];
             i = id[i];
         }
         return i;
    }

    public boolean find(int p, int q)
    {
        return root(p) == root(q);
    }

    public void unite(int p, int q)
    {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }

    }

` @reference https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf

如果要并行查找连接的组件，可以使用指针跳转和带路径压缩的加权快速联合将渐近复杂度降低到 O(log(log(N)) 时间。检查此链接

https://vishwasshanbhog.wordpress.com/2016/05/04/efficient-parallel-algorithm-to-find-the-connected-components-of-the-graphs/

algorithm - 在 1-NN 图中找到连通分量的快速方法？

3 回答 3

Related

Reference