hash - How to reliably shard data across multiple servers

Question

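I am currently reading up on some distributed systems design patterns. One of the design patterns for when you have to deal with a lot of data (billions of entities or multiple petabytes) is to spread it out across multiple servers or storage units.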

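One of the solutions for this is to use consistent hashing. This should result in an even distribution of the data across all servers in the hash.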

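The concept is rather simple: we can just add new servers and only the servers in the affected range are impacted, and if you lose servers, the remaining servers in the consistent hash take over. That holds when all servers in the hash have the same data (in memory, on disk, or in a database).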
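To make the concept concrete, here is a minimal sketch of a consistent hash ring in Python. The `HashRing` class, the `ring_position` helper, the use of MD5, and the 64 virtual nodes per server are all assumptions made for illustration, not something prescribed by any particular system.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a 64-bit position on the ring (MD5 is only an example)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, vnodes=64):
        self.vnodes = vnodes
        self._tokens = []  # sorted ring positions
        self._owner = {}   # ring position -> server name

    def add_server(self, name):
        # Each server is placed on the ring at several pseudo-random positions.
        for i in range(self.vnodes):
            token = ring_position(f"{name}#{i}")
            self._owner[token] = name
            bisect.insort(self._tokens, token)

    def remove_server(self, name):
        self._tokens = [t for t in self._tokens if self._owner[t] != name]
        self._owner = {t: n for t, n in self._owner.items() if n != name}

    def lookup(self, key):
        """Return the server owning the first token at or after the key's position."""
        if not self._tokens:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._tokens, ring_position(key))
        return self._owner[self._tokens[idx % len(self._tokens)]]


ring = HashRing()
for name in ("0", "1", "2"):
    ring.add_server(name)

keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.remove_server("1")  # simulate losing a server
after = {k: ring.lookup(k) for k in keys}
```

In this sketch, keys that were not on server "1" keep their owner after the removal; only the keys of the lost server are reassigned to the remaining ones.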

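My question is how we handle adding and removing servers from a consistent hash when there is so much data that it cannot be stored on a single host. How do the servers figure out which data to store and which not to?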

Example:

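Let's say we have 2 machines running, "0" and "1". They are starting to reach 60% of their maximum capacity, so we decide to add an additional machine, "2". Now a large part of the data on machine 0 has to be migrated to machine 2. How would we automate this so that it happens reliably and without downtime?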
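The example can be simulated to see how much data actually has to move. The following is a rough sketch, assuming a ring with 64 virtual nodes per machine and MD5-based positions; `build_ring`, `owner`, and `moved_keys` are hypothetical helpers, and the key names are made up.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(value: str) -> int:
    """64-bit ring position derived from MD5 (an arbitrary choice for the sketch)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(machines, vnodes=64):
    """Return a sorted list of (token, machine) pairs."""
    return sorted((ring_position(f"{m}#{i}"), m) for m in machines for i in range(vnodes))


def owner(ring, key):
    """The owner is the machine holding the first token at or after the key's position."""
    idx = bisect.bisect_left(ring, (ring_position(key), ""))
    return ring[idx % len(ring)][1]


def moved_keys(keys, machines_before, machines_after):
    """Count keys per (old owner, new owner) pair for keys that change owner."""
    old_ring = build_ring(machines_before)
    new_ring = build_ring(machines_after)
    moves = Counter()
    for key in keys:
        src, dst = owner(old_ring, key), owner(new_ring, key)
        if src != dst:
            moves[(src, dst)] += 1
    return moves


keys = [f"key-{i}" for i in range(10_000)]
moves = moved_keys(keys, ["0", "1"], ["0", "1", "2"])
for (src, dst), count in sorted(moves.items()):
    print(f"{count} keys move from machine {src} to machine {dst}")
print(f"fraction moved: {sum(moves.values()) / len(keys):.2%}")
```

Every key that changes owner ends up on machine 2, and nothing is shuffled between machines 0 and 1. With virtual nodes the new machine takes a share from both existing machines rather than from machine 0 alone, and the moved fraction should come out at roughly one third.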

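My own suggested approach would be that the service hosting the consistent hash and the machines know how to transfer data between each other. When a new machine is added, the consistent hash service calculates the affected hash ranges. It then informs the affected machines of those ranges and tells them to transfer the affected data to machine 2. Once the affected machines have finished transferring data, they ACK back to the consistent hash service. Once all affected machines have finished transferring data, the consistent hash service starts routing data to machine 2 and informs the affected machines that they can now delete the transferred data. If we have petabytes on each server, this process could take a long time. We would therefore need to keep track of what has changed during the transfer, so we can make sure those changes are synced afterwards.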
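The handoff described above can be sketched as a small in-memory simulation. This is only an illustration under simplifying assumptions: a single hash range, one source and one target machine, no deletes, and no concurrency. `Machine`, `RangeMigration`, and all method names are hypothetical.

```python
import hashlib


def ring_position(key: str) -> int:
    """64-bit ring position derived from MD5 (an arbitrary choice for the sketch)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Machine:
    """In-memory stand-in for a storage server."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class RangeMigration:
    """Moves keys whose ring position falls in [lo, hi) from source to target."""

    def __init__(self, source, target, lo, hi):
        self.source, self.target = source, target
        self.lo, self.hi = lo, hi
        self.snapshot = {}
        self.dirty = set()          # in-range keys written during the transfer
        self.cut_over_done = False

    def in_range(self, key):
        return self.lo <= ring_position(key) < self.hi

    def write(self, key, value):
        """Client write path: the source stays authoritative until cut-over."""
        if self.in_range(key) and self.cut_over_done:
            self.target.data[key] = value
            return
        self.source.data[key] = value
        if self.in_range(key):
            self.dirty.add(key)

    def start(self):
        """Freeze a snapshot of the affected keys and begin tracking changes."""
        self.dirty.clear()
        self.snapshot = {k: v for k, v in self.source.data.items() if self.in_range(k)}

    def copy_snapshot(self):
        """Bulk transfer; this is the slow part with petabytes of data."""
        self.target.data.update(self.snapshot)

    def catch_up(self):
        """Replay keys that changed while the bulk transfer was running."""
        for key in self.dirty:
            self.target.data[key] = self.source.data[key]
        self.dirty.clear()

    def cut_over(self):
        """Final sync, switch ownership (the ACK step), then delete from the source."""
        self.catch_up()
        self.cut_over_done = True
        for key in [k for k in self.source.data if self.in_range(k)]:
            del self.source.data[key]


m0, m2 = Machine("0"), Machine("2")
migration = RangeMigration(m0, m2, lo=0, hi=2**63)
for i in range(100):
    migration.write(f"key-{i}", "v1")

migration.start()
moving = sorted(migration.snapshot)
migration.write(moving[0], "v2")  # an update that lands during the transfer
migration.copy_snapshot()         # copies the stale "v1" for that key
migration.cut_over()              # replays the dirty key, then switches owner
```

In a real system the last catch-up and the ownership switch would have to be made atomic with respect to incoming writes, for example by briefly fencing writes to the affected range, which this single-threaded sketch does not need to do.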

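My approach would work, but I feel all the back and forth makes it a little risky, so I would like to hear whether there is a better way.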

score 1 · Accepted Answer

How would we automate this so that it happens reliably and without downtime?

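It depends on the technology used to store the data, but in Cassandra, for example, there is no "central" entity governing the process. It is done the way almost everything else is done there: by having the nodes gossip with each other. There is no downtime when a new node joins the cluster (although performance might be slightly impacted).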

The process is as follows:

The new node joining the cluster is defined as an empty node without system tables or data.

When a new node joins the cluster using the auto bootstrap feature, it will perform the following operations

- Contact the seed nodes to learn about gossip state.
- Transition to Up and Joining state (to indicate it is joining the cluster; represented by UJ in the nodetool status).
- Contact the seed nodes to ensure schema agreement.
- Calculate the tokens that it will become responsible for.
- Stream replica data associated with the tokens it is responsible for from the former owners.
- Transition to Up and Normal state once streaming is complete (to indicate it is now part of the cluster; represented by UN in the nodetool status).
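The steps above can be sketched as a toy model. It is not Cassandra's implementation: gossip, seed nodes, schema agreement, and replication are left out, tokens are simply derived from the node name, and `Node`, `build_ring`, `owner_of`, and `bootstrap` are hypothetical names. It only shows the order of events: enter UJ, work out which keys the new tokens cover, stream them from the former owners, then switch to UN.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """64-bit ring position derived from MD5 (an arbitrary choice for the sketch)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Node:
    def __init__(self, name, num_tokens=8):
        self.name = name
        # Assumption: tokens are derived from the node name to keep the sketch deterministic.
        self.tokens = [ring_position(f"{name}#{i}") for i in range(num_tokens)]
        self.state = None
        self.history = []
        self.data = {}

    def set_state(self, state):
        self.state = state
        self.history.append(state)


def build_ring(nodes):
    """Sorted list of (token, node) pairs for the given nodes."""
    return sorted(((t, n) for n in nodes for t in n.tokens), key=lambda pair: pair[0])


def owner_of(ring, key):
    """The owner is the node holding the first token at or after the key's position."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, ring_position(key))
    return ring[idx % len(ring)][1]


def bootstrap(cluster, new_node):
    """Join new_node to the cluster, streaming the data it becomes responsible for."""
    new_node.set_state("UJ")  # Up and Joining: receiving data, not serving reads
    new_ring = build_ring(cluster + [new_node])
    for former_owner in cluster:
        for key, value in former_owner.data.items():
            if owner_of(new_ring, key) is new_node:
                # Copy only; former owners keep their data in this sketch.
                new_node.data[key] = value
    new_node.set_state("UN")  # Up and Normal: streaming is complete
    cluster.append(new_node)
    return new_ring


a, b = Node("a"), Node("b")
for node in (a, b):
    node.set_state("UN")
cluster = [a, b]

old_ring = build_ring(cluster)
keys = [f"row-{i}" for i in range(500)]
for i, key in enumerate(keys):
    owner_of(old_ring, key).data[key] = i

c = Node("c")
ring = bootstrap(cluster, c)
print(c.history, len(c.data))
```

Because the former owners only copy data out, reads can keep going to them until the new node reaches UN, which is why no downtime is needed in this model.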

Taken from https://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html

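So while the joining node is in the Joining state, it is receiving data from the other nodes but is not ready to serve reads until the process is complete (the Up and Normal state).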

DataStax also has some material on this: https://academy.datastax.com/units/2017-ring-dse-foundations-apache-cassandra?path=developer&resource=ds201-datastax-enterprise-6-foundations-of-apache-cassandra
