We know the initial data size (120GB) and we know the default maximum chunk size in MongoDB is 64MB. If we divide 120GB by 64MB we get 1920 - so that is the minimum number of chunks we should start with. As it happens, 2048 is a power of 16 divided by 2, and given that the GUID (our shard key) is hex-based, that is a much easier number to deal with than 1920 (see below).
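To sanity-check that arithmetic (an illustrative aside, runnable in any mongo shell, not part of the procedure itself):

// 120GB expressed in MB, divided by the 64MB default chunk size
var totalMB = 120 * 1024;       // 122880
print(totalMB / 64);            // 1920 - the minimum chunk count
print(Math.pow(16, 3) / 2);     // 2048 - maps neatly onto 3 hex digits stepped by 2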
NOTE: This pre-splitting must be done before any data is added to the collection. If you use the enableSharding() command on a collection that already contains data, MongoDB will split the data itself, and you will then be running these commands while chunks already exist - that can lead to quite odd chunk distribution, so beware.
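Hence it is worth verifying that the target collection is empty before you begin; a quick check (illustrative only, using the database and collection names assumed below):

// should return 0 - the pre-splitting below assumes an empty collection
db.getSiblingDB("users").userInfo.count();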
For the purposes of this answer, let's assume that the database will be called users and the collection userInfo. Let's also assume that the GUID will be written into the _id field. With those parameters we will connect to a mongos and run the following commands:
// first switch to the users DB
use users;
// now enable sharding for the users DB
sh.enableSharding("users");
// enable sharding on the relevant collection
sh.shardCollection("users.userInfo", {"_id" : 1});
// finally, disable the balancer (see below for options on a per-collection basis)
// this prevents migrations from kicking off and interfering with the splits by competing for metadata locks
sh.stopBalancer();
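At this point the balancer should report as off; a quick confirmation (an optional check, not part of the original sequence):

// reflects the balancer on/off flag, independent of any in-flight migration round
sh.getBalancerState();   // false after sh.stopBalancer()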
Now, per the calculations above, we need to split the GUID range into 2048 chunks. To do that we need at least 3 hex digits (16 ^ 3 = 4096), and we will be putting them in the most significant digits of the range (i.e. the 3 leftmost). Again, this should be run from a mongos shell:
// Simply use a for loop for each digit
for ( var x=0; x < 16; x++ ){
    for( var y=0; y<16; y++ ) {
        // for the innermost loop we will increment by 2 to get 2048 total iterations
        // make this z++ for 4096 - that would give ~30MB chunks based on the original figures
        for ( var z=0; z<16; z+=2 ) {
            // now construct the GUID with zeroes for padding - handily the toString method takes an argument to specify the base
            var prefix = "" + x.toString(16) + y.toString(16) + z.toString(16) + "00000000000000000000000000000";
            // finally, use the split command to create the appropriate chunk
            db.adminCommand( { split : "users.userInfo" , middle : { _id : prefix } } );
        }
    }
}
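As an illustrative aside (not part of the loop above), each constructed split point is a full 32-character hex string - the same length as a hex GUID - with the 3 significant digits up front:

var prefix = "" + (0).toString(16) + (0).toString(16) + (2).toString(16) + "00000000000000000000000000000";
print(prefix);          // 00200000000000000000000000000000
print(prefix.length);   // 32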
Once that is done, let's check on the state of play using the sh.status() helper:
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
      "_id" : 1,
      "version" : 3,
      "minCompatibleVersion" : 3,
      "currentVersion" : 4,
      "clusterId" : ObjectId("527056b8f6985e1bcce4c4cb")
  }
  shards:
      { "_id" : "shard0000", "host" : "localhost:30000" }
      { "_id" : "shard0001", "host" : "localhost:30001" }
      { "_id" : "shard0002", "host" : "localhost:30002" }
      { "_id" : "shard0003", "host" : "localhost:30003" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "users", "partitioned" : true, "primary" : "shard0001" }
          users.userInfo
              shard key: { "_id" : 1 }
              chunks:
                  shard0001    2049
              too many chunks to print, use verbose if you want to force print
We have our 2048 chunks (plus one extra thanks to the min/max chunks), but they are all still on the original shard because the balancer is off. So, let's re-enable the balancer:
sh.startBalancer();
This will begin to balance out immediately, and it will be relatively quick because all the chunks are empty, but it will still take a little while (and much longer if it is competing with migrations from other collections).
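If you want to watch progress without re-running sh.status() repeatedly, the standard balancer helpers can be polled (an optional check):

sh.getBalancerState();     // true once re-enabled
sh.isBalancerRunning();    // false again once the empty chunks have all been moved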
Once some time has elapsed, run sh.status() again and there you (should) have it - 2048 chunks all nicely split out across the 4 shards and ready for an initial data load:
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
      "_id" : 1,
      "version" : 3,
      "minCompatibleVersion" : 3,
      "currentVersion" : 4,
      "clusterId" : ObjectId("527056b8f6985e1bcce4c4cb")
  }
  shards:
      { "_id" : "shard0000", "host" : "localhost:30000" }
      { "_id" : "shard0001", "host" : "localhost:30001" }
      { "_id" : "shard0002", "host" : "localhost:30002" }
      { "_id" : "shard0003", "host" : "localhost:30003" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "users", "partitioned" : true, "primary" : "shard0001" }
          users.userInfo
              shard key: { "_id" : 1 }
              chunks:
                  shard0000    512
                  shard0002    512
                  shard0003    512
                  shard0001    513
              too many chunks to print, use verbose if you want to force print
      { "_id" : "test", "partitioned" : false, "primary" : "shard0002" }
You are now ready to start loading data, but to absolutely guarantee that no splits or migrations happen until your data load is complete, you need to do one more thing - turn off the balancer and auto-splitting for the duration of the import:
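Concretely (inferring the steps from the reversal described just below): stop the balancer, or disable balancing for just this collection, and restart each mongos you will load data through with the --noAutoSplit option, since auto-splitting is controlled at mongos startup in these versions. The shell side looks like this:

// stop all balancing again now that the empty chunks are distributed
sh.stopBalancer();
// or, to leave balancing running for other collections, disable it for this one only
sh.disableBalancing("users.userInfo");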
Once the import is complete, reverse the steps as needed (sh.startBalancer(), sh.enableBalancing("users.userInfo"), and restarting the mongos without --noAutoSplit) to return everything to the default settings.
**Update: Optimizing for Speed**
The approach above is fine if you are not in a hurry, but as things stand - and as you will discover if you test this out - the balancer is not very fast, even with empty chunks. Hence, the more chunks you create, the longer balancing will take. I have seen it take more than 30 minutes to finish balancing 2048 chunks, though this will vary by deployment.
That might be OK for testing, or for a relatively quiet cluster, but on a busy cluster it will be much harder to ensure the balancer stays off and no other updates interfere. So, how do we speed things up?
The answer is to make some manual moves early on, then split the chunks once they are on their respective shards. Note that this is only desirable with certain shard keys (like a randomly distributed UUID) or certain data access patterns, so be careful that you don't end up with poor data distribution as a result.
Using the example above, we have 4 shards, so rather than doing all the splits and then balancing, we split into 4 chunks instead. Then we put one chunk on each shard by moving them manually, and finally we split those chunks into the required number.
The ranges in the example above would look like this:
$min --> "40000000000000000000000000000000"
"40000000000000000000000000000000" --> "80000000000000000000000000000000"
"80000000000000000000000000000000" --> "c0000000000000000000000000000000"
"c0000000000000000000000000000000" --> $max
It's only 4 commands to create these, but since we have it, why not re-use the loop above in a simplified/modified form:
for ( var x=4; x < 16; x+=4 ){
    var prefix = "" + x.toString(16) + "0000000000000000000000000000000";
    db.adminCommand( { split : "users.userInfo" , middle : { _id : prefix } } );
}
Here is what things look like now - we have our 4 chunks, all on shard0001:
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
      "_id" : 1,
      "version" : 4,
      "minCompatibleVersion" : 4,
      "currentVersion" : 5,
      "clusterId" : ObjectId("53467e59aea36af7b82a75c1")
  }
  shards:
      { "_id" : "shard0000", "host" : "localhost:30000" }
      { "_id" : "shard0001", "host" : "localhost:30001" }
      { "_id" : "shard0002", "host" : "localhost:30002" }
      { "_id" : "shard0003", "host" : "localhost:30003" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "test", "partitioned" : false, "primary" : "shard0001" }
      { "_id" : "users", "partitioned" : true, "primary" : "shard0001" }
          users.userInfo
              shard key: { "_id" : 1 }
              chunks:
                  shard0001    4
              { "_id" : { "$minKey" : 1 } } -->> { "_id" : "40000000000000000000000000000000" } on : shard0001 Timestamp(1, 1)
              { "_id" : "40000000000000000000000000000000" } -->> { "_id" : "80000000000000000000000000000000" } on : shard0001 Timestamp(1, 3)
              { "_id" : "80000000000000000000000000000000" } -->> { "_id" : "c0000000000000000000000000000000" } on : shard0001 Timestamp(1, 5)
              { "_id" : "c0000000000000000000000000000000" } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 Timestamp(1, 6)
We will leave the $min chunk where it is and move the other three. You can do this programmatically, but it does depend on where the chunks initially reside, how you have named your shards, etc., so I will leave it as a manual step for now - it is not too onerous, just 3 moveChunk commands (a programmatic sketch of the same moves follows below):
mongos> sh.moveChunk("users.userInfo", {"_id" : "40000000000000000000000000000000"}, "shard0000")
{ "millis" : 1091, "ok" : 1 }
mongos> sh.moveChunk("users.userInfo", {"_id" : "80000000000000000000000000000000"}, "shard0002")
{ "millis" : 1078, "ok" : 1 }
mongos> sh.moveChunk("users.userInfo", {"_id" : "c0000000000000000000000000000000"}, "shard0003")
{ "millis" : 1083, "ok" : 1 }
Let's double check and make sure the chunks are where we expect them to be:
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
      "_id" : 1,
      "version" : 4,
      "minCompatibleVersion" : 4,
      "currentVersion" : 5,
      "clusterId" : ObjectId("53467e59aea36af7b82a75c1")
  }
  shards:
      { "_id" : "shard0000", "host" : "localhost:30000" }
      { "_id" : "shard0001", "host" : "localhost:30001" }
      { "_id" : "shard0002", "host" : "localhost:30002" }
      { "_id" : "shard0003", "host" : "localhost:30003" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "test", "partitioned" : false, "primary" : "shard0001" }
      { "_id" : "users", "partitioned" : true, "primary" : "shard0001" }
          users.userInfo
              shard key: { "_id" : 1 }
              chunks:
                  shard0001    1
                  shard0000    1
                  shard0002    1
                  shard0003    1
              { "_id" : { "$minKey" : 1 } } -->> { "_id" : "40000000000000000000000000000000" } on : shard0001 Timestamp(4, 1)
              { "_id" : "40000000000000000000000000000000" } -->> { "_id" : "80000000000000000000000000000000" } on : shard0000 Timestamp(2, 0)
              { "_id" : "80000000000000000000000000000000" } -->> { "_id" : "c0000000000000000000000000000000" } on : shard0002 Timestamp(3, 0)
              { "_id" : "c0000000000000000000000000000000" } -->> { "_id" : { "$maxKey" : 1 } } on : shard0003 Timestamp(4, 0)
That matches the ranges we proposed above, so all looks good. Now run the original loop above to split the chunks "in place" on each shard, and once the loop finishes we should have a balanced distribution. One more sh.status() should confirm things:
mongos> for ( var x=0; x < 16; x++ ){
... for( var y=0; y<16; y++ ) {
... // for the innermost loop we will increment by 2 to get 2048 total iterations
... // make this z++ for 4096 - that would give ~30MB chunks based on the original figures
... for ( var z=0; z<16; z+=2 ) {
... // now construct the GUID with zeroes for padding - handily the toString method takes an argument to specify the base
... var prefix = "" + x.toString(16) + y.toString(16) + z.toString(16) + "00000000000000000000000000000";
... // finally, use the split command to create the appropriate chunk
... db.adminCommand( { split : "users.userInfo" , middle : { _id : prefix } } );
... }
... }
... }
{ "ok" : 1 }
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
      "_id" : 1,
      "version" : 4,
      "minCompatibleVersion" : 4,
      "currentVersion" : 5,
      "clusterId" : ObjectId("53467e59aea36af7b82a75c1")
  }
  shards:
      { "_id" : "shard0000", "host" : "localhost:30000" }
      { "_id" : "shard0001", "host" : "localhost:30001" }
      { "_id" : "shard0002", "host" : "localhost:30002" }
      { "_id" : "shard0003", "host" : "localhost:30003" }
  databases:
      { "_id" : "admin", "partitioned" : false, "primary" : "config" }
      { "_id" : "test", "partitioned" : false, "primary" : "shard0001" }
      { "_id" : "users", "partitioned" : true, "primary" : "shard0001" }
          users.userInfo
              shard key: { "_id" : 1 }
              chunks:
                  shard0001    513
                  shard0000    512
                  shard0002    512
                  shard0003    512
              too many chunks to print, use verbose if you want to force print
And there you have it - no waiting for the balancer, the distribution is already even.