对于许多问题,我认为推荐的解决方案是使用联合查找数据结构。我试图阅读它并思考它是如何实现的(使用 C++)。我目前的理解是,它只是一个集合列表。所以要找到一个元素属于哪个集合,我们需要n*log n
操作。而当我们必须执行并集时,我们必须找到需要合并的两个集合并对其进行操作set_union
。这对我来说看起来不是很有效。我对这个数据结构的理解是正确的还是我遗漏了什么?
4 回答
这是一个很晚的回复,但这可能在stackoverflow的其他地方没有得到回答,并且由于这是搜索联合查找的人的最顶部页面,这里是详细的解决方案。
Find-Union 是一种非常快速的操作,在几乎恒定的时间内执行。它遵循 Jeremie 对路径压缩和跟踪集大小的见解。对每个查找操作本身执行路径压缩,从而花费摊销的 lg*(n) 时间。lg* 就像逆阿克曼函数,增长非常缓慢,很少超过 5(至少直到 n< 2^65535)。联合/合并集是惰性执行的,只需将 1 个根指向另一个,特别是较小集的根指向较大集的根,这是在恒定时间内完成的。
请参阅https://github.com/kartikkukreja/blog-codes/blob/master/src/Union%20Find%20%28Disjoint%20Set%29%20Data%20Structure.cpp中的以下代码
class UF {
int *id, cnt, *sz;
public:
// Create an empty union find data structure with N isolated sets.
UF(int N) {
cnt = N; id = new int[N]; sz = new int[N];
for (int i = 0; i<N; i++) id[i] = i, sz[i] = 1; }
~UF() { delete[] id; delete[] sz; }
// Return the id of component corresponding to object p.
int find(int p) {
int root = p;
while (root != id[root]) root = id[root];
while (p != root) { int newp = id[p]; id[p] = root; p = newp; }
return root;
}
// Replace sets containing x and y with their union.
void merge(int x, int y) {
int i = find(x); int j = find(y); if (i == j) return;
// make smaller root point to larger one
if (sz[i] < sz[j]) { id[i] = j, sz[j] += sz[i]; }
else { id[j] = i, sz[i] += sz[j]; }
cnt--;
}
// Are objects x and y in the same set?
bool connected(int x, int y) { return find(x) == find(y); }
// Return the number of disjoint sets.
int count() { return cnt; }
};
The data structure can be represented as a tree, with branches reversed (instead of pointing down, the branches point upwards to the parent---and link a child with its parent).
If I remember correctly, it can be shown (easily):
that path compression (whenever you do a lookup for the "parent" of a set A, you "compress" the path so that each future call to these will provide the parent in time O(1)) will lead to O(log n) complexity per call;
that balancing (you keep approximately track of the number of children each set has, and when you have to "unite" two sets, you make the one with the fewer children child of the one with the most) also leads to a O(log n) complexity per call.
A more involved proof can show that when you combine both optimizations, you obtain an average complexity that is the inverse Ackermann function, written α(n), and this was Tarjan's main invention for this structure.
It was later shown, I believe, that for some specific usage patterns, this complexity is actually constant (though for all practical purpose inverse of ackermann is about 4). According to the Wikipedia page on Union-Find, in 1989, the amortized cost per operation of any equivalent data structure was shown to be Ω(α(n)), proving that the current implementation is asymptotically optimal.
正确的联合查找数据结构在每次查找期间都使用路径压缩。这摊销了成本,然后每个操作与阿克曼函数的倒数成正比,这基本上使其恒定(但不完全)。
如果您是从头开始实现它,那么我建议您使用基于树的方法。
一个简单的 union-set 结构保持一个数组(元素 -> 集合),使得查找哪个集合成为常数时间;更新它们是摊销 log n 时间并且连接列表是恒定的。不如上面的一些方法那么快,但编程起来很简单,而且足以改善比如 Kruskal 的最小生成树算法的 Big-O 运行时间。