We have a relatively simple MongoDB sharded setup: 4 shards, each shard being a replica set with at least 3 members. Each collection contains data loaded from a large number of files; each file has a monotonically increasing ID, and sharding is done on a hash of that ID.
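For reference, each collection was sharded with roughly the following shell commands (a sketch, not a verbatim transcript; the shard key matches the sh.status() output below):

mongos> sh.enableSharding("prod")
mongos> db.mycollection.ensureIndex({ "job_id" : "hashed" })     // hashed index to shard on
mongos> sh.shardCollection("prod.mycollection", { "job_id" : "hashed" })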
Most of our collections work as expected. However, I have one collection that does not seem to be distributing its chunks across the shards correctly. The collection had about 30GB of data loaded into it before the index was created and it was sharded, but as far as I know that shouldn't matter. Here are the stats for the collection:
mongos> db.mycollection.stats()
{
    "sharded" : true,
    "ns" : "prod.mycollection",
    "count" : 53304954,
    "numExtents" : 37,
    "size" : 35871987376,
    "storageSize" : 38563958544,
    "totalIndexSize" : 8955712416,
    "indexSizes" : {
        "_id_" : 1581720784,
        "customer_code_1" : 1293148864,
        "job_id_1_customer_code_1" : 1800853936,
        "job_id_hashed" : 3365576816,
        "network_code_1" : 914412016
    },
    "avgObjSize" : 672.9578525853339,
    "nindexes" : 5,
    "nchunks" : 105,
    "shards" : {
        "rs0" : {
            "ns" : "prod.mycollection",
            "count" : 53304954,
            "size" : 35871987376,
            "avgObjSize" : 672.9578525853339,
            "storageSize" : 38563958544,
            "numExtents" : 37,
            "nindexes" : 5,
            "lastExtentSize" : 2146426864,
            "paddingFactor" : 1.0000000000050822,
            "systemFlags" : 0,
            "userFlags" : 0,
            "totalIndexSize" : 8955712416,
            "indexSizes" : {
                "_id_" : 1581720784,
                "job_id_1_customer_code_1" : 1800853936,
                "customer_code_1" : 1293148864,
                "network_code_1" : 914412016,
                "job_id_hashed" : 3365576816
            },
            "ok" : 1
        }
    },
    "ok" : 1
}
And sh.status() for this collection:
        prod.mycollection
                shard key: { "job_id" : "hashed" }
                chunks:
                        rs0    105
                too many chunks to print, use verbose if you want to force print
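(In case it helps anyone reproduce this: the same chunk counts can be read straight out of the config database, which holds one document per chunk with ns and shard fields. A quick sketch:)

mongos> use config
mongos> db.chunks.aggregate([
            { "$match" : { "ns" : "prod.mycollection" } },
            { "$group" : { "_id" : "$shard", "nchunks" : { "$sum" : 1 } } }
        ])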
Is there something I'm missing about why this collection only distributes to rs0? Is there a way to force a rebalance? (The balancer checks I have in mind are sketched at the end of this post.) I performed the same steps to shard other collections, and those distributed themselves correctly. Here are the stats for a collection that sharded successfully:
mongos> db.myshardedcollection.stats()
{
    "sharded" : true,
    "ns" : "prod.myshardedcollection",
    "count" : 5112395,
    "numExtents" : 71,
    "size" : 4004895600,
    "storageSize" : 8009994240,
    "totalIndexSize" : 881577200,
    "indexSizes" : {
        "_id_" : 250700688,
        "customer_code_1" : 126278320,
        "job_id_1_customer_code_1" : 257445888,
        "job_id_hashed" : 247152304
    },
    "avgObjSize" : 783.3697513591966,
    "nindexes" : 4,
    "nchunks" : 102,
    "shards" : {
        "rs0" : {
            "ns" : "prod.myshardedcollection",
            "count" : 1284540,
            "size" : 969459424,
            "avgObjSize" : 754.7133012595949,
            "storageSize" : 4707762176,
            "numExtents" : 21,
            "nindexes" : 4,
            "lastExtentSize" : 1229475840,
            "paddingFactor" : 1.0000000000000746,
            "systemFlags" : 0,
            "userFlags" : 0,
            "totalIndexSize" : 190549856,
            "indexSizes" : {
                "_id_" : 37928464,
                "job_id_1_customer_code_1" : 39825296,
                "customer_code_1" : 33734176,
                "job_id_hashed" : 79061920
            },
            "ok" : 1
        },
        "rs1" : {
            "ns" : "prod.myshardedcollection",
            "count" : 1287243,
            "size" : 1035438960,
            "avgObjSize" : 804.384999568846,
            "storageSize" : 1178923008,
            "numExtents" : 17,
            "nindexes" : 4,
            "lastExtentSize" : 313208832,
            "paddingFactor" : 1,
            "systemFlags" : 0,
            "userFlags" : 0,
            "totalIndexSize" : 222681536,
            "indexSizes" : {
                "_id_" : 67787216,
                "job_id_1_customer_code_1" : 67345712,
                "customer_code_1" : 30169440,
                "job_id_hashed" : 57379168
            },
            "ok" : 1
        },
        "rs2" : {
            "ns" : "prod.myshardedcollection",
            "count" : 1131411,
            "size" : 912549232,
            "avgObjSize" : 806.5585644827565,
            "storageSize" : 944386048,
            "numExtents" : 16,
            "nindexes" : 4,
            "lastExtentSize" : 253087744,
            "paddingFactor" : 1,
            "systemFlags" : 0,
            "userFlags" : 0,
            "totalIndexSize" : 213009328,
            "indexSizes" : {
                "_id_" : 64999200,
                "job_id_1_customer_code_1" : 67836272,
                "customer_code_1" : 26522944,
                "job_id_hashed" : 53650912
            },
            "ok" : 1
        },
        "rs3" : {
            "ns" : "prod.myshardedcollection",
            "count" : 1409201,
            "size" : 1087447984,
            "avgObjSize" : 771.6769885914075,
            "storageSize" : 1178923008,
            "numExtents" : 17,
            "nindexes" : 4,
            "lastExtentSize" : 313208832,
            "paddingFactor" : 1,
            "systemFlags" : 0,
            "userFlags" : 0,
            "totalIndexSize" : 255336480,
            "indexSizes" : {
                "_id_" : 79985808,
                "job_id_1_customer_code_1" : 82438608,
                "customer_code_1" : 35851760,
                "job_id_hashed" : 57060304
            },
            "ok" : 1
        }
    },
    "ok" : 1
}
And sh.status() for the correctly sharded collection:
        prod.myshardedcollection
                shard key: { "job_id" : "hashed" }
                chunks:
                        rs2    25
                        rs1    26
                        rs3    25
                        rs0    26
                too many chunks to print, use verbose if you want to force print
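As for forcing a rebalance, these are the checks I have in mind, using the standard shell helpers (a sketch; the job_id value in the moveChunk call is a placeholder, not a real key from my data):

mongos> sh.getBalancerState()       // is the balancer enabled at all?
mongos> sh.isBalancerRunning()      // is a balancing round currently in progress?
mongos> sh.setBalancerState(true)   // re-enable it if it was switched off
mongos> // as a last resort, move one chunk by hand; mongos hashes the given
mongos> // shard-key value and moves the chunk containing that hash
mongos> sh.moveChunk("prod.mycollection", { "job_id" : 12345 }, "rs1")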