6

给定一个rope,假设我们需要知道它的散列(通过一些散列函数传递所有叶子的连接)。

现在,当一个绳子叶发生变化时,重新计算整个绳子的哈希值的有效方法是什么?即像 O(log n) 而不是 O(n)。

一种方法是使用Merkle 树。但是,这会导致诸如...

  • 空的非叶节点或具有零长度子串的叶节点会影响哈希,即使它们对有效的绳索内容没有影响;
  • 将节点从子树的右侧移动到该子树右侧兄弟的左侧会影响最终哈希,但不会影响有效的绳索内容。

有没有更好的算法呢?散列函数不必是加密安全的,只要足以避免可能的冲突即可。

4

1 回答 1

3

Just as any node of a rope stores the size of the left subtree (or itself if it is a leaf), any node can additionally store the polynomial hash of the string corresponding to the left subtree (or itself if it is a leaf).

When weight is recalculated for a node, hash is also recalculated for that node, with the same asymptotic complexity.

For example, let the nodes and the values in them be:

    left     right    string     weight
1:                     abcd         4
2:    1        4                    4
3:                     ef           2
4:    3        5                    2
5:                     ghi          3

The polynomial hash is, with some fixed constants p and q:

h (s[0] s[1] ... s[n-1]) = (s[0] * p^(n-1) + s[1] * p^(n-2) + ... + s[n-1] * p^0) mod q.

So, we have the following hashes stored, all modulo q:

         hash
1:  a*p^3 + b*p^2 + c*p^1 + d*p^0
2:  a*p^3 + b*p^2 + c*p^1 + d*p^0
3:  e*p^1 + f*p^0
4:  e*p^1 + f*p^0
5:  g*p^2 + h*p^1 + i*p^0

A note about calculation modulo q. Here and below, all additions and multiplications are carried out modulo q. In other words, we operate in the ring of integers modulo q. We use the fact that

(a ? b) mod q = ((a mod q) ? (b mod q)) mod q

for the ? operation being addition, subtraction and multiplication. Thus, every time we do one of these operations, we immediately append a mod q to keep the numbers small. For example, if p and q are less than 230 = 1,073,741,824, addition and subtraction can be done in a 32-bit integer type, and multiplication will be fine with an intermediate 64-bit integer type. After each multiplication, we immediately take the result modulo q, making it fit into 32-bit integer again.


Now, how do we get the hash of the root - for example, to make it the left child of some node, or just to get the hash of the whole string?

We go from the root and to the right, and we have to add weights and merge hashes. Turns out we can just do (remember that everything is modulo q):

({a*p^3 + b*p^2 + c*p^1 + d*p^0} * p^2 + {e*p^1 + f*p^0}) * p^3 + {g*p^2 + h*p^1 + i*p^0}

The values in curly brackets are the values stored in out nodes. We recurse to the right. When getting up, we remember the weight collected so far, multiply the left-side hash by p to the power of that weight (that's where p^3 and p^(3+2=5) come from), and add the accumulated right-side hash.

The resulting value is equal to just the hash of the whole string:

a*p^8 + b*p^7 + c*p^6 + d*p^5 + e*p^4 + f*p^3 + g*p^2 + h*p^1 + i*p^0


A few notes here.

  1. We have to precalculate, maybe lazily, the powers of p modulo q to be able to multiply by them fast.

  2. The whole construction may become more clear if we store the hash of the whole subtree, not only of the left subtree, in a node. However, this way, we are likely to lose that O(1) concatenation possibility that the rope structure has, making it down to the usual O(log n), so that we may just have used a regular treap instead of a rope. Even if not, caching the hash value of the whole subtree in a node is definitely a possibility.

  3. If we reverse the order of powers in the hashing polynomial, making it
    h (s[0] s[1] ... s[n-1]) = (s[0] * p^0 + s[1] * p^1 + ... + s[n-1] * p^(n-1)) mod q,
    the math is similar, but collecting the hash from all the right descendants of a node can be done iteratively instead of recursively.

于 2017-02-08T12:07:00.480 回答