data-structures - 批量更新二进制堆中的节点优先级？

Question

我发布了相当混乱的问题，所以我从头开始重写......

这实际上是纯粹的理论问题。

说，我们有二进制堆。让堆为 MaxHeap，因此根节点的值最大，每个节点的值都大于其子节点。我们可以在这个堆上做一些常见的低级操作：“交换两个节点”、“比较两个节点”。

使用这些低级操作，我们可以实现通常的高级递归操作：“sift-up”、“sift-down”。

使用这些 sift-up 和 sift-downs，我们可以实现“插入”、“修复”和“更新”。我对“更新”功能感兴趣。假设我已经有了要更改的节点位置。因此，更新函数非常简单：

function update (node_position, new_value){
    heap[node_position] = new_value;
    sift_up(node_position);
    sift_down(node_position);
}

我的问题是：（数学上）是否有可能制作更高级的“更新”功能，可以一次更新更多节点，在某种程度上，所有节点都将它们的值更改为 new_values，然后，它们的位置被纠正？像这样的东西：

function double_update (node1_pos, node2_pos, node1_newVal, node2_newVal){
    heap[node1_pos] = node1_newVal;
    heap[node2_pos] = node2_newVal;
    sift_up(node1_position);
    sift_down(node1_position);
    sift_up(node2_position);
    sift_down(node2_position);
}

我用这个“double_update”做了一些测试，它工作了，虽然它没有证明什么。

“三重更新”呢，等等……

我用“多重更新”做了一些其他测试，我改变了所有节点的值，然后调用 { sift-up(); 筛选（）；} 以随机顺序对它们中的每一个执行一次。这不起作用，但结果离正确不远。

我知道这听起来没什么用，但我对它背后的理论很感兴趣。如果我让它发挥作用，我实际上确实有一个用途。

score 1 · Accepted Answer

Here's a trick and possibly funky algorithm, for some definition of funky:

(Lots of stuff left out, just to give the idea):

template<typename T> class pseudoHeap {
  private:
    using iterator = typename vector<T>::iterator;
    iterator max_node;
    vector<T> heap;
    bool heapified;

    void find_max() {
      max_node = std::max_element(heap.begin(), heap.end());
    }

  public:
    void update(iterator node, T new_val) {
      if (node == max_node) {
          if (new_val < *max_node) {
               heapified = false;
               *max_node = new_val;
               find_max();
           } else {
               *max_node = new_val;
           }
      } else {
          if (new_val > *max_node) max_node = new_val;
          *node = new_val;
          heapified = false;
     }

    T& front() { return &*max_node; }

    void pop_front() {
        if (!heapified) {
            std::iter_swap(vector.end() - 1, max_node);
            std::make_heap(vector.begin(), vector.end() - 1);
            heapified = true;
        } else {
            std::pop_heap(vector.begin(), vector.end());
        }
    }        
};

Keeping a heap is expensive. If you do n updates before you start popping the heap, you've done the same amount of work as just sorting the vector when you need it to be sorted (O(n log n)). If it's useful to know what the maximum value is all the time, then there is some reason to keep a heap, but if the maximum value is no more likely to be modified than any other value, then you can keep the maximum value always handy at amortized cost O(1) (that is, 1/n times it costs O(n) and the rest of the time it's O(1). That's what the above code does, but it might be even better to be lazy about computing the max as well, making front() amortized O(1) instead of constant O(1). Depends on your requirements.

As yet another alternative, if the modifications normally don't cause the values to move very far, just do a simple "find the new home and rotate the subvector" loop, which although it's O(n) instead of O(log n), is still faster on short moves because the constant is smaller.

In other words, don't use priority heaps unless you're constantly required to find the top k values. When there are lots of modifications between reads, there is usually a better approach.

score 1 · Accepted Answer

这样做绝对是可能的，但是如果您打算更改二进制堆中的大量键，您可能需要查看其他堆结构，例如 Fibonacci 堆或配对堆，它们可以比二进制堆。在具有 n 个节点的二进制堆中更改 k 个键需要 O(k log n) 时间，而在 Fibonacci 堆中需要 O(k) 时间。这是渐近最优的，因为你甚至不能在不做至少 Ω(k) 工作的情况下接触 k 个节点。

要考虑的另一件事是，如果您一次更改多个 Ω(n / log n) 键，您将至少做 Ω(n) 个工作。在这种情况下，通过使用标准 heapify 算法在 Θ(n) 时间内从头开始重建堆来实现更新可能更快。

希望这可以帮助！

data-structures - 批量更新二进制堆中的节点优先级？

2 回答 2

Related

Reference