load-balancing - 一致散列与集合 (HRW) 散列 - 权衡是什么？

Question

网上有很多关于一致性哈希的资料，以及可用多种语言的实现。该主题的 Wikipedia 条目引用了另一个具有相同目标的算法：

集合散列

这个算法看起来更简单，并且不需要在环周围添加副本/虚拟来处理不均匀的加载问题。正如文章所提到的，它似乎在 O(n) 中运行，这对于大 n 来说是一个问题，但参考了一篇论文，指出它可以被构造为在 O(log n) 中运行。

对于在这方面有经验的人，我的问题是，为什么要选择一致的哈希而不是 HRW，或者相反？是否存在其中一种解决方案是更好选择的用例？

非常感谢。

score 18 · Accepted Answer

首先，我会说一致散列的优势在于热点。根据实现，可以手动修改令牌范围以处理它们。

使用 HRW，如果您以某种方式结束了热点（即由于哈希算法选择不佳），除了删除热点并添加一个应该平衡请求的新热点之外，您无能为力。

HRW 的一大优势是当您添加或删除节点时，您可以在所有内容中保持均匀分布。有了一致的哈希，他们通过给每个节点 200 个左右的虚拟节点来解决这个问题，这也使得手动管理范围变得困难。

score 9 · Accepted Answer

Speaking as someone who's just had to choose between the two approaches and who ultimately plumped for HRW hashing: My use case was a simple load balancing one with absolutely no reassignment requirement -- if a node died it's perfectly OK to just choose a new one and start again. No re balancing of existing data is required.

1) Consistent Hashing requires a persistent hashmap of the nodes and vnodes (or at least a sensible implementation does, you could build all the objects on every request.... but you really don't want to!). HWR does not (it's state-less). Nothing needs to be modified when machines join or leave the cluster - there is no concurrency to worry about (except that your clients have a good view of the state of the cluster which is the same in both cases)

2) HRW is easier to explain and understand (and the code is shorter). For example this is a complete HRW algorythm implemented in Riverbed Stingray TrafficScript. (Note there are better hash algorithms to choose than MD5 - it's overkill for this job)

$nodes = pool.listActiveNodes("stingray_test");

# Get the key
$key = http.getFormParam("param");

$biggest_hash = "";
$node_selected = "";

foreach ($node in $nodes) {
   $hash_comparator = string.hashMD5($node . '-' . $key);

   # If the combined hash is the biggest we've seen, we have a candidate
   if ( $hash_comparator > $biggest_hash ) {
      $biggest_hash = $hash_comparator;
      $node_selected = $node;
   }
}

connection.setPersistenceNode( $node_selected );

3) HRW provides an even distribution when you lose or gain nodes (assuming you chose a sensible hash function). Consistent Hashing doesn't guarantee that but with enough vnodes it's probably not going to be an issue

4) Consistent Routing may be faster - in normal operation it should be an order Log(N) where N is the number of nodes * the replica factor for vnodes. However, if you don't have a lot of nodes (I didn't) then HRW is going to be probably fast enough for you.

4.1) As you mentioned wikipedia mentions that there is a way to do HWR in log(N) time. I don't know how to do that! I'm happy with my O(N) time on 5 nodes.....

In the end, the simplicity and the stateless nature of HRW made the choice for me....

load-balancing - 一致散列与集合 (HRW) 散列 - 权衡是什么？

2 回答 2

Related

Reference