algorithm - Finding the median of N^2 elements (large scale)

Question

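The problem is this: suppose we have N machines, and each machine stores and can operate on its own N elements. How can we find the median of all N^2 elements at the lowest cost?

This is really bothering me, and I hope someone can help. Thanks!

Sorry, my original description was too terse. The elements stored on each machine are random and in no particular order. The cost includes I/O as well as communication between machines, RAM, and time; everything should be taken into account. I just want to find the most efficient way to get the median.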

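These are some solutions I have come up with:

Use an external sort, such as merge sort or some other method, and then pick out the median.
Use a bucket sort: divide all elements by value into X consecutive buckets, so that we can determine which bucket the median falls in. Scanning that bucket then gives the median.
I think the O(N) algorithm for finding the k-th number from "Introduction to Algorithms" should work here?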
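The bucket idea above can be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the elements are integers, the machines are simulated as plain Python lists, and the name `bucket_median` and the parameter `num_buckets` are made up for the example. Each round, every machine only reports per-bucket counts, and the search narrows to the bucket that holds the median.

```python
def bucket_median(machines, num_buckets=16):
    """Lower median of all integers held by the simulated machines."""
    total = sum(len(m) for m in machines)
    k = (total - 1) // 2  # 0-based rank of the lower median
    lo = min(min(m) for m in machines)
    hi = max(max(m) for m in machines)
    while lo < hi:
        # Integer bucket width chosen so that [lo, hi] fits in num_buckets buckets.
        width = (hi - lo) // num_buckets + 1
        counts = [0] * num_buckets
        # In a real system each machine would build its own counts
        # and only those small arrays would cross the network.
        for m in machines:
            for x in m:
                if lo <= x <= hi:
                    counts[(x - lo) // width] += 1
        # Walk the summed counts to find the bucket containing rank k.
        for b, c in enumerate(counts):
            if k < c:
                break
            k -= c
        # Narrow the value range to that bucket and repeat.
        lo, hi = lo + b * width, min(hi, lo + (b + 1) * width - 1)
    return lo
```

With N = 3, for example, `bucket_median([[5, 1, 9], [3, 7, 2], [8, 6, 4]])` narrows down to the value 5. Only bucket counts are exchanged, so the communication per round is proportional to the number of machines times `num_buckets`, not to N^2.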

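However, all of these solutions still need an extra machine to do the work. I would like to know whether there is a way to get the median using only these N machines.

Thanks!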

score 0 · Accepted Answer

Can you estimate it rather than get it exactly?

If so, pick a constant K and fit a K-coefficient polynomial to the data on each machine. Send the coefficients to a central machine, which adds them and then finds the median by:

Integrating the curve over the range to find the area under the curve
Running a root-finding algorithm to find the point that splits the area in half.

The bigger K is, the less error there will be. The smaller K is, the more efficient it will be.
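Here is a rough sketch of one way to read this approach; it is not necessarily what the answer had in mind. The assumptions are mine: all values lie in [0, 1), each machine fits the polynomial to a histogram of its own values (the answer does not say what exactly is fitted), and the names `BINS`, `polyfit`, `machine_summary` and `estimate_median` are hypothetical. The least-squares fit is written out by hand so the example has no dependencies.

```python
BINS = 10  # hypothetical histogram resolution used before fitting

def polyfit(xs, ys, k):
    """Least-squares polynomial with k coefficients (lowest power first)."""
    # Normal equations: A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        s = b[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = s / A[i][i]
    return coeffs

def machine_summary(values, k):
    """Runs on one machine: fit k coefficients to its histogram density."""
    counts = [0] * BINS
    for v in values:  # ASSUMPTION: 0 <= v < 1
        counts[min(int(v * BINS), BINS - 1)] += 1
    centers = [(b + 0.5) / BINS for b in range(BINS)]
    density = [c * BINS for c in counts]  # elements per unit of x
    return polyfit(centers, density, k)

def estimate_median(summaries):
    """Runs on the central machine: add coefficients, integrate, bisect."""
    k = len(summaries[0])
    total = [sum(s[i] for s in summaries) for i in range(k)]

    def area(x):  # integral of the summed polynomial from 0 to x
        return sum(c * x ** (i + 1) / (i + 1) for i, c in enumerate(total))

    half = area(1.0) / 2
    lo, hi = 0.0, 1.0
    for _ in range(50):  # bisection as the root-finding step
        mid = (lo + hi) / 2
        if area(mid) < half:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Each machine sends only K numbers, so the communication cost is N * K regardless of how many elements each machine holds. The result is an estimate, and its quality depends on how well a low-degree polynomial can follow the actual distribution of the data.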

score 0 · Accepted Answer

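You need a process that counts all the values (the total across all stores). Pick the middle index. Convert that index into an offset from the start of the items on the corresponding machine. Ask that machine to sort its items and return the value at that index.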
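A minimal sketch of these steps, under one big assumption that the answer does not state: the machines must already hold disjoint, ascending value ranges (everything on machine i is no larger than anything on machine i + 1). Otherwise a global index cannot be mapped to a single machine. The function name `median_if_partitioned` is made up for the example, and machines are simulated as lists.

```python
def median_if_partitioned(machines):
    """Lower median, ASSUMING machines hold disjoint ascending value ranges."""
    counts = [len(m) for m in machines]  # each machine reports its item count
    k = (sum(counts) - 1) // 2           # global middle index (0-based)
    for m, c in zip(machines, counts):
        if k < c:
            # k is now an offset into this machine's items;
            # only this one machine has to sort.
            return sorted(m)[k]
        k -= c                           # skip past this machine's items
```

For example, `median_if_partitioned([[3, 1, 2], [6, 4, 5], [9, 7, 8]])` lands on the second machine at offset 1 and returns 5.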

score 0 · Accepted Answer

Step 1: Sort the numbers at each machine individually
Step 2: Send the median at each machine to a central place
Step 3: Sort the medians and send them to each machine
Step 4: For each element in the sorted medians, calculate its rank at machine level
Step 5: Calculate the rank of each element over all machines (just sum the ranks)
Step 6: Find the two elements in the sorted medians between which the global median lies
Step 7: For the next iteration, consider only the elements between those two medians
        and repeat the whole thing again
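The steps above can be sketched as a single-process simulation. Assumptions that are mine, not the answer's: machines are plain lists, the function name `distributed_select` is hypothetical, the target rank is tracked as a 0-based index `k`, and duplicate values are handled by counting both "less than" and "less than or equal" ranks. In a real deployment any one of the N machines could play the central role.

```python
from bisect import bisect_left, bisect_right

def distributed_select(machines, k):
    """Return the k-th smallest (0-based) element across all machines."""
    # Step 1: every machine sorts its own data once.
    active = [sorted(m) for m in machines]
    while True:
        # Steps 2-3: collect the local medians of the non-empty machines,
        # sort them (dropping duplicates) and broadcast them as pivots.
        pivots = sorted({a[len(a) // 2] for a in active if a})
        # Steps 4-5: each machine reports local ranks; the sums are global ranks.
        less = [sum(bisect_left(a, p) for a in active) for p in pivots]
        less_eq = [sum(bisect_right(a, p) for a in active) for p in pivots]
        # Step 6: locate the pair of pivots that brackets rank k.
        lo = hi = None
        lo_rank = 0
        for p, lt, le in zip(pivots, less, less_eq):
            if lt <= k < le:
                return p              # the pivot itself has rank k
            if le <= k:
                lo, lo_rank = p, le   # target lies above this pivot
            else:
                hi = p                # target lies below this pivot
                break
        # Step 7: keep only elements strictly between lo and hi and repeat.
        k -= lo_rank
        active = [
            a[(bisect_right(a, lo) if lo is not None else 0):
              (bisect_left(a, hi) if hi is not None else len(a))]
            for a in active
        ]

def distributed_median(machines):
    """Lower median of all elements."""
    total = sum(len(m) for m in machines)
    return distributed_select(machines, (total - 1) // 2)
```

Every round removes at least the local median from each machine that still has candidates, so the loop always terminates. Per round, only one median per machine and two rank counts per pivot are exchanged.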

In the worst case, all the elements remaining in the second iteration will be on a single machine.

Complexity: pretty sure it is O(n log n) (i.e., including parallelization it can be O(n^2 log n)).
