algorithm - 如何有效地从 Fenwick 树中找到一系列连续的已用/空闲槽

Question

假设我正在跟踪 Fenwick 树中插槽的使用情况。例如，让我们考虑跟踪 32 个插槽，导致如下图所示的 Fenwick 树布局，其中网格中的数字表示底层数组中的索引，其中每个单元格中的值是由 Fenwick 树操作的计数该段中“已使用”项目的总和（即数组单元格 23 存储范围 [16-23] 中已使用插槽的数量）。最低级别的项目（即单元格 0、2、4、...）只能具有“1”（已用槽）或“0”（空闲槽）的值。

芬威克树布局示例

我正在寻找的是一种有效的算法来找到给定数量的连续空闲插槽的第一个范围。

为了说明，假设我有如下图所示的 Fenwick 树，其中总共使用了 9 个插槽（请注意，为了清楚起见，添加了浅灰色数字，实际上并未存储在树的数组单元中）。

示例树

现在我想找到例如 10 个空闲插槽的第一个连续范围，它应该找到这个范围：

示例搜索结果

我似乎找不到一种有效的方法来做到这一点，这让我有点头疼。请注意，由于所需的存储空间量对我的目的至关重要，我不希望将设计扩展为段树。

任何关于 O(log N) 类型解决方案的想法和建议都将受到欢迎。

编辑

赏金期到期后的更新时间。感谢所有评论、问题、建议和答案。他们让我重新思考问题，教会了我很多东西，并向我指出（再一次；有一天我可能会学到这一课），我应该在提问时更多地关注我想要解决的问题。

由于@Erik P是唯一对包含请求代码/伪代码的问题提供合理答案的人，因此他将获得赏金。

他还正确地指出，使用这种结构进行 O(log N) 搜索是不可能的。感谢@DanBjorge提供的证明，让我想到了最坏情况下的性能。

@EvgenyKluev的评论和回答让我意识到我应该以不同的方式提出我的问题。事实上，我已经在很大程度上按照他的建议做了（参见https://gist.github.com/anonymous/7594508 - 它显示了我在发布这个问题之前遇到的问题），并问了这个问题，希望有一个有效的方法来搜索连续范围，从而防止将此设计更改为段树（这将需要额外的 1024 字节）。然而，这样的改变似乎是明智之举。

对于任何感兴趣的人，可以在此处找到与此问题中使用的示例匹配的二进制编码的 Fenwick 树（以 64 位编码的 32 slot fenwick 树）：https ://gist.github.com/anonymous/7594245 。

score 3 · Accepted Answer

As mcdowella suggests in their answer, let K2 = K/2, rounding up, and let M be the smallest power of 2 that is >= K2. A promising approach would be to search for contiguous blocks of K2 zeroes fully contained in one size-M block, and once we've found those, check neighbouring size-M blocks to see if they contain sufficient adjacent zeroes. For the initial scan, if the number of 0s in a block is < K2, clearly we can skip it, and if the number of 0s is >= K2 and the size of the block is >= 2*M, we can look at both sub-blocks.

This suggests the following code. Below, A[0 .. N-1] is the Fenwick tree array; N is assumed to be a power of 2. I'm assuming that you're counting empty slots, rather than nonempty ones; if you prefer to count empty slots, it's easy enough to transform from the one to the other.

initialize q as a stack data structure of triples of integers
push (N-1, N, A[n-1]) onto q
# An entry (i, j, z) represents the block [i-j+1 .. i] of length j, which
# contains z zeroes; we start with one block representing the whole array.
# We maintain the invariant that i always has at least as many trailing ones
# in its binary representation as j has trailing zeroes. (**)
initialize r as an empty list of pairs of integers
while q is not empty:
    pop an entry (i,j,z) off q
    if z < K2:
        next

    if FW(i) >= K:
        first_half := i - j/2
        # change this if you want to count nonempty slots:
        first_half_zeroes := A[first_half]
        # Because of invariant (**) above, first_half always has exactly
        # the right number of trailing 1 bits in its binary representation
        # that A[first_half] counts elements of the interval
        # [i-j+1 .. first_half].

        push (i, j/2, z - first_half_zeroes) onto q
        push (first_half, j/2, first_half_zeroes) onto q
    else:
        process_block(i, j, z)

This lets us process all size-M blocks with at least K/2 zeroes in order. You could even randomize the order in which you push the first and second half onto q in order to get the blocks in a random order, which might be nice to combat the situation where the first half of your array fills up much more quickly than the latter half.

Now we need to discuss how to process a single block. If z = j, then the block is entirely filled with 0s and we can look both left and right to add zeroes. Otherwise, we need to find out if it starts with >= K/2 contiguous zeroes, and if so with how many exactly, and then check if the previous block ends with a suitable number of zeroes. Similarly, we check if the block ends with >= K/2 contiguous zeroes, and if so with how many exactly, and then check if the next block starts with a suitable number of zeroes. So we will need a procedure to find the number of zeroes a block starts or ends with, possibly with a shortcut if it's at least a or at most b. To be precise: let ends_with_zeroes(i, j, min, max) be a procedure that returns the number of zeroes that the block from [i-j+1 .. j] ends with, with a shortcut to return max if the result will be more than max and min if the result will be less than min. Similarly for starts_with_zeroes(i, j, min, max).

def process_block(i, j, z):
    if j == z:
        if i > j:
            a := ends_with_zeroes(i-j, j, 0, K-z)
        else:
            a := 0

        if i < N-1:
            b := starts_with_zeroes(i+j, j, K-z-a-1, K-z-a)
        else:
            b := 0

        if b >= K-z-a:
            print "Found: starting at ", i - j - a + 1
        return

    # If the block doesn't start or end with K2 zeroes but overlaps with a
    # correct solution anyway, we don't need to find it here -- we'll find it
    # starting from the adjacent block.
    a := starts_with_zeroes(i, j, K2-1, j)
    if i > j and a >= K2:
        b := ends_with_zeroes(i-j, j, K-a-1, K-a)
        if b >= K-a:
            print "Found: starting at ", i - j - a + 1
        # Since z < 2*K2, and j != z, we know this block doesn't end with K2
        # zeroes, so we can safely return.
        return

    a := ends_with_zeroes(i, j, K2-1, j)
    if i < N-1 and a >= K2:
        b := starts_with_zeroes(i+j, K-a-1, K-a)
        if b >= K-a:
            print "Found: starting at ", i - a + 1

Note that in the second case where we find a solution, it may be possible to move the starting point left a bit further. You could check for that separately if you need the very first position that it could start.

Now all that's left is to implement starts_with_zeroes and ends_with_zeroes. In order to check that the block starts with at least min zeroes, we can test that it starts with 2^h zeroes (where 2^h <= min) by checking the appropriate Fenwick entry; then similarly check if it starts with 2^H zeroes where 2^H >= max to short cut the other way (except if max = j, it is trickier to find the right count from the Fenwick tree); then find the precise number.

def starts_with_zeroes(i, j, min, max):
    start := i-j

    h2 := 1
    while h2 * 2 <= min:
        h2 := h2 * 2
        if A[start + h2] < h2:
            return min
    # Now h2 = 2^h in the text.
    # If you insist, you can do the above operation faster with bit twiddling
    # to get the 2log of min (in which case, for more info google it).

    while h2 < max and A[start + 2*h2] == 2*h2:
        h2 := 2*h2
    if h2 == j:
        # Walk up the Fenwick tree to determine the exact number of zeroes
        # in interval [start+1 .. i]. (Not implemented, but easy.) Let this
        # number be z.

        if z < j:
            h2 := h2 / 2

    if h2 >= max:
        return max

    # Now we know that [start+1 .. start+h2] is all zeroes, but somewhere in 
    # [start+h2+1 .. start+2*h2] there is a one.
    # Maintain invariant: the interval [start+1 .. start+h2] is all zeroes,
    # and there is a one in [start+h2+1 .. start+h2+step].
    step := h2;
    while step > 1:
        step := step / 2
        if A[start + h2 + step] == step:
             h2 := h2 + step
    return h2

As you see, starts_with_zeroes is pretty bottom-up. For ends_with_zeroes, I think you'd want to do a more top-down approach, since examining the second half of something in a Fenwick tree is a little trickier. You should be able to do a similar type of binary search-style iteration.

This algorithm is definitely not O(log(N)), and I have a hunch that this is unavoidable. The Fenwick tree simply doesn't give information that is that good for your question. However, I think this algorithm will perform fairly well in practice if suitable intervals are fairly common.

score 3 · Accepted Answer

我认为以 O(log N) 时间复杂度实现所有所需功能并同时最小化内存需求的最简单方法是使用位向量来存储所有 0/1（空闲/使用）值。位向量可以替代 Fenwick 树和段树的 6 个最低级别（如果实现为 64 位整数）。因此，这些树的高度可能会减少 6 倍，并且每棵树的空间需求将比平时少 64 倍（或 32 倍）。

段树可以实现为位于数组中的隐式二叉树（就像众所周知的最大堆实现一样）。索引 1 处的根节点，索引处节点的每个左后代i放置在处2*i，每个右后代放置在处2*i+1。这意味着与 Fenwick 树相比，需要两倍的空间，但由于树的高度减少了 6 级，这不是一个大问题。

每个段树节点应该存储一个值——从该节点覆盖的点开始的最长连续“空闲”槽序列的长度（如果没有这样的起点，则为零）。这使得搜索给定数量的连续零的第一个范围非常简单：从根开始，如果它包含大于或等于所需的值，则选择左后代，否则选择右后代。到达某个叶节点后，检查位向量的相应字（对于字中间的一系列零）。

更新操作更复杂。将值更改为“已使用”时，检查位向量的适当字，如果为空，则上升段树以找到某个左后代的非零值，然后将树下降到具有该值的最右边的叶子，然后确定如何新添加的插槽将“空闲”间隔分成两半，然后更新添加的插槽和被分割间隔的起始节点的所有父节点，还在位向量中设置一个位。可以类似地实现将值更改为“免费”。

如果还需要获取某个范围内的非零项的数量，请在相同的位向量上实现 Fenwick 树（但与段树分开）。Fenwick 树的实现没有什么特别之处，除了将 6 个最低节点加在一起被位向量的某些字的“人口计数”操作所取代。有关将 Fenwick 树与位向量一起使用的示例，请参阅CodeChef 上Magic Board的第一个解决方案。

使用各种按位技巧可以非常有效地实现位向量的所有必要操作。对于其中一些（前导/尾随零计数和人口计数），您可以使用编译器内在函数或汇编指令（取决于目标体系结构）。

如果位向量是用 64 位字和树节点实现的——用 32 位字，除了位向量之外，两个树都占用 150% 的空间。如果每个叶节点不对应于单个位向量字，而是对应于小范围（4 或 8 个字），这可能会显着减少。对于 8 个字，树所需的额外空间仅为位向量大小的 20%。这使得实现稍微复杂一些。如果优化得当，每个叶节点一个词的性能应该与变体中的大致相同。对于非常大的数据集，性能可能会更好（因为位向量计算比遍历树对缓存更友好）。

score 2 · Accepted Answer

@Erik 介绍了一个合理的发声算法。但是，请注意，在最坏的情况下，这个问题的复杂度较低，为 Ω(N/K)。

证明：

考虑问题的简化版本，其中：

N 和 K 都是 2 的幂
N > 2K >= 4

假设您的输入数组由 (N/2K) 个大小为 2K 的块组成。一个块的形式是 K 0s 后跟 K 1s，每隔一个块是重复 K 次的字符串“10”。有 (N/2K) 个这样的数组，每个数组都有一个问题的解决方案（一个“特殊”块的开头）。

设 n = log2(N)，k = log2(K)。让我们还将树的根节点定义为级别 0，叶节点定义为树的级别 n。

请注意，由于我们的数组由大小为 2K 的对齐块组成，因此树的第 nk 层将简单地由每个块中 1 的数量组成。但是，我们的每个块中都有相同数量的 1。这意味着 nk 层的每个节点都是相同的，这反过来意味着 <= nk 层的每个节点也将是相同的。

这意味着在您开始分析级别 n-k+1 及更低级别之前，树不包含可以消除“特殊”块歧义的信息。但是由于该级别的 (N/K) 个节点中除了 2 个之外的所有节点都是相同的，这意味着在最坏的情况下，您必须检查 O(N/K) 个节点才能将解决方案与其余节点区分开来节点。

score 2 · Accepted Answer

在搜索 K 个连续槽的范围时，一个快速检查是找到小于或等于 K/2 的 2 的最大幂。任何 K 个连续零槽必须包含至少一个 Fenwick 对齐的槽，其大小 <= K/2，完全用零填充。您可以从顶部搜索 Fenwick 树以查找此类对齐的零块，然后查找第一个可以扩展以生成 K 个连续零范围的树。

在您的示例中，最低级别包含 0 或 1，而较高级别包含后代的总和。如果最低级别包含您当前正在写入 1 的 0 以及您当前正在写入零的左侧的连续零数量的计数，并且较高级别包含任何后代的最大值，则查找 0 的延伸会更容易。更新将意味着更多的工作，特别是如果您创建和销毁长的零字符串，但您可以找到长度至少为 K 的最左边的零字符串，只需一次搜索即可找到最大值至少为 K 的左侧分支. 实际上，这里的很多更新工作都是在最低级别上创建和销毁 1,2,3,4... 的运行。

algorithm - 如何有效地从 Fenwick 树中找到一系列连续的已用/空闲槽

4 回答 4

证明：

Related

Reference