algorithm - 找到10 ^ 5阶完整图的EMST的最简单，最简单的算法是什么

Question

我只是想明确一点，EMST 代表欧几里得最小生成树。

本质上，我得到了一个包含 100k 4D 顶点（每行一个顶点）的文件。目标是访问文件中的每个顶点，同时最小化总行程。从一个点到另一个点的距离就是欧几里得距离（如果你在两点之间画一条直线的距离”。

我已经知道这几乎是旅行商问题，它是 NP 完全问题，所以我正在寻找近似解决方案。

我想到的第一个近似算法是通过从文件构造的图中找到 MST ......但这将需要 O(N^2) 甚至只是从文件中构造所有边，因为它是完整的图表（我可以从任何一点到另一个点）。鉴于我的输入是 N = 10^5，我的算法将有一个巨大的运行时间，这太慢了......

关于如何计划近似解决方案的任何想法？非常感谢你..

score 2 · Accepted Answer

I know it's quadratic-time, but I think you should consider Prim with an implicit graph. The structure of the algorithm is

for each vertex v
    mindist[v] := infinity
    visited[v] := false
choose a root vertex r
mindist[r] := 0
repeat |V| times
    let w be the minimizer of d[w] such that not visited[w]
    visited[w] := true
    for each vertex v
        if not visited[v] and distance(w, v) < mindist[v]:
            mindist[v] := distance(w, v)
            parent[v] := w

Since the storage used is linear, it will likely stay resident in cache, and there are no fancy data structures, so this algorithm should run pretty fast.

score 1 · Accepted Answer

我将假设您实际上想要一个 EMST，正如您的标题所暗示的那样，TSP 只是达到此目的的一种手段，而不是实际目标本身。两者有非常不同的限制（TSP 的限制要大得多），因此有非常不同的最佳解决方案。

概述

这个想法是我们想要运行一个修改过的 kruskal 算法，它将利用 kd 树来找到最接近的对，而无需评估每个潜在的边缘。我们可以找到连接组件中每个顶点的最短边，取最短的整体，并通过该边连接我们的连接组件。正如您将看到的，每次迭代都会连接至少一半的连接组件，因此最多需要logn迭代才能完成。

合并

然而，当我们构造每条边时，我们希望在检查哪些边要保留以及哪些边连接已经连接的顶点之前将其插入到最小堆优先级队列中。这不会影响第一次迭代的结果，但稍后我们将需要通过增加距离来处理边缘。然后将所有边出列，通过不相交集检查不必要/冗余的边，将有效边插入 MST，并合并各自的不相交集。当然，所有这些都引入了一个nlogn从最小堆中构造和出列元素的因素（如果我们愿意，我们也可以将它们排序到一个普通的数组中）。

在添加边的第一次迭代之后，我们将连接至少一半的 MST，也许更多。这是因为我们为每个顶点添加了一条边，并且每条边最多可以有一个副本，所以我们添加了一些作为vertices / 2边，但多达vertices - 1. 现在，我们的 MST 至少已经建成了 1/2。我们将按照以下段落中的描述继续该过程，直到我们vertices - 1总共添加了边。

推广 NN 搜索

To continue, we'll want to construct lists of the vertices in each connected component, so that we can iterate over them by groups. This can be done in nearly linear time, as searching (also merging) a disjoint-set takes O(α(n)) time (α being the inverse ackermann function) and we repeat exactly n times. Once we have our lists of vertices per connected component, the rest is fairly straightforward. We'll take our existing k-d tree, and remove all the vertices in our current connected component. We'll then query for the nearest neighbor to each vertex to each vertex in our connected component, and add these edges to our min-heap. We'll then add these vertices back into the k-d tree, and repeat on the next connected component. Since we insert/remove a total of n elements, this amounts to an average case O(nlogn) time complexity.

Now that we have a queue of the shortest potential edges connecting our connected components, we'll dequeue these in order, and just as before insert valid edges and merge the disjoint sets. For the same reasons as before, this is guaranteed to connect at least half of our components, maybe even all of them. We'll repeat this process until we have connected all vertices into a single connected component, which will be our MST. Note that because we halve the number of disconnected components each iteration, it'll take at most O(logn) iterations to connect every vertex in our MST (most likely far less).

Remarks

Overall, this will take O(nlog^2(n)) time. There will likely be far less than log(n) iterations however, so expect a speedup there in practice. Also note that R-tree might be a good alternative to the k-d tree- I don't know how they compare in practice however.

algorithm - 找到10 ^ 5阶完整图的EMST的最简单，最简单的算法是什么

2 回答 2

概述

最近邻搜索

合并

推广 NN 搜索

Remarks

algorithm - 找到10 ^ 5阶完整图的EMST的最简单，最简单的算法是什么

2 回答 2

概述

最近邻搜索

合并

推广 NN 搜索

Remarks

Related

Reference