algorithm - Pairwise priority queue

Question

I have a set of A's and a set of B's, each with an associated numerical priority, where each A may match some or all B's and vice versa, and my main loop basically consists of:

Take the best A and B in priority order, and do stuff with A and B.

The most obvious way to do this is with a single priority queue of (A,B) pairs, but if there are 100,000 A's and 100,000 B's then the set of O(N^2) pairs won't fit in memory (and disk is too slow).

Another possibility is for each A, loop through every B. However this means that global priority ordering is by A only, and I really need to take priority of both components into account.

(The application is theorem proving, where the above options are called the pair algorithm and the given clause algorithm respectively; the shortcomings of each are known, but I haven't found any reference to a good solution.)

Some kind of two layer priority queue would seem indicated, but it's not clear how to do this without using either O(N^2) memory or O(N^2) time in the worst case.

Is there a known method of doing this?

Clarification: each A must be processed with all corresponding B's, not just one.

score 1 · Accepted Answer

你可以先处理最好的对，如果没有什么好的结果，为了完整起见，用给定的子句算法擦掉其余的。这可能会导致一些双重工作，但我敢打赌这无关紧要。

您是否考虑过有序的副调制或叠加？

score 1 · Accepted Answer

也许有什么我不明白，但是，

为什么不将 A 和 B 放在单独的堆中，在每个堆上获取_Max，做你的工作，从关联的堆中删除每个最大值并继续？

score 1 · Accepted Answer

似乎 A 中的项目具有单独的优先级，B 中的项目具有单独的优先级，并且 (A,B) 对具有组合优先级。只有组合的优先级很重要，但希望我们可以一路使用各个属性。但是，A 中的项目与 B 中的项目之间也存在独立优先级的匹配关系。

我假设，对于 A 中的所有 a、B 中的 b1 和 b2，使得 Match(a,b1) 和 Match(a,b2)，然后 Priority(b1) >= Priority(b2) 意味着 CombinedPriority(a,b1) >= 组合优先级（a，b2）。

现在，首先按优先级降序对 B 进行排序。令 B(j) 表示此排序顺序中的第 j 个元素。此外，让 A(i) 表示 A 的第 i 个元素（可能按或不按排序顺序）。

让 nextb(i,j) 是一个找到最小 j' >= j 的函数，使得 Match(A(i),B(j'))。如果不存在这样的 j'，则函数返回 null（或其他一些合适的错误值）。搜索 j' 可能只涉及从 j 向上循环，或者如果我们更多地了解 Match 关系的结构，我们可能能够更快地做一些事情。

为 A 中的所有索引 i 创建一个包含 (i,nextb(i,0)) 的优先级队列 Q，使得 nextb(i,0) != null。Q 中的对 (i,j) 由 CombinedPriority(A(i),B(j)) 排序。

现在只需循环直到 Q 为空。拉出最高优先级对 (i,j) 并适当处理 (A(i),B(j))。然后将 (i,nextb(i,j+1)) 重新插入到 Q 中（除非 nextb(i,j+1) 为空）。

总之，在所有配对都匹配的最坏情况下，这需要 O(N^2 log N) 时间。一般来说，它需要 O(N^2 + M log N)，其中 M 是匹配的数量。如果有一种更快的计算 nextb(i,j) 的方法只是向上循环，则可以减少 N^2 分量，但这取决于对匹配关系的了解。

（在上面的分析中，我假设 A 和 B 的大小都是 N。如果它们的大小不同，公式可以很容易地修改。）

在最坏的情况下，您似乎想要比 O(N^2) 时间更好的东西，但是如果您需要处理每场比赛，那么您就有一个 M 的下限，它可以是 N^2 本身。我认为你不会比 O(N^2 log N) 时间做得更好，除非组合优先级有一些特殊的结构可以让你使用比 log-N 更好的优先级队列。

score 0 · Accepted Answer

我认为您最初的想法会奏效，您只需将 As 和 B 保存在单独的集合中，并将对它们的引用粘贴到您的优先级队列中即可。如果每个引用占用 16 个字节（只是为了选择一个数字），那么 10,000,000 个 A/B 引用将只占用 ~300M。假设你的 As 和 Bs 本身不是太大，它应该是可行的。

score 0 · Accepted Answer

所以你有一组 A 和一组 B，你需要从这个组中选择一个 (A, B) 对，使得一些 f(a, b) 是任何其他 (A, B) 对中最高的.

这意味着您可以存储所有可能的 (A, B) 对并对其进行排序，然后每次通过循环选择最高的对（每次迭代 O(1)，但 O(N*M) 内存）。

或者您可以遍历所有可能的对并跟踪当前最大值并使用该最大值（每次迭代 O(N*M)，但只有 O(N+M) 内存）。

如果我对您的理解正确，这就是您要问的。

我认为这在很大程度上取决于 f() 来确定是否有更好的方法来做到这一点。

如果f(a, b) = a + b，那么显然很简单，最高的A，最高的B就是你想要的。

algorithm - Pairwise priority queue

5 回答 5

Related

Reference