Thanks in advance for your time.

For a given sharded setup, mongos is started with a list of the config servers it should talk to. Suppose we start with the following mongos option:

--configdb=cf1,cf2,cf3

All is fine and dandy. If you then restart mongos (or start a different mongos) with:

--configdb=cf3,cf2,cf1

it results in the following error:

Tue Jul  9 23:32:41 uncaught exception: error: { "$err" : "could not initialize sharding on connection rs1/db1.me.net:27017,db2.me.net:27017,db3.me.net:27017, :: caused by :: mongos specified a different config database string : stored :cfg1:27017,cfg2:27017,cfg3:27017 vs given :cfg3:27017,cfg2:27017,cfg1:27017","code" : 15907}

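My question is: why is mongo sensitive to the order of the config server string? I would imagine that at some point it parses out the individual server hostnames/ports, so why not just compare the sets? I know you can see from the source code that it is just a string comparison, but my question is about the underlying reason.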
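To make the distinction concrete, here is a minimal sketch in Python of the two comparisons, using the stored and given strings from the error above. The function names `same_string` and `same_set` are made up for illustration; this is not MongoDB's code, only a model of an order-sensitive check versus the order-insensitive one I had in mind.

```python
# Values copied from the error message above.
stored = "cfg1:27017,cfg2:27017,cfg3:27017"
given = "cfg3:27017,cfg2:27017,cfg1:27017"


def same_string(a, b):
    """Order-sensitive check: the two strings must match exactly."""
    return a == b


def same_set(a, b):
    """Order-insensitive check (hypothetical): compare host:port entries as sets."""
    return set(a.split(",")) == set(b.split(","))


print(same_string(stored, given))  # the kind of check that rejects the reordered list
print(same_set(stored, given))     # the kind of check that would accept it
```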

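Some background for this question: I am using Chef for my mongo deployment. We recently went through an exercise of migrating the config servers while keeping the same hostnames. It was still a disruptive process, though, because the order in which Chef picked up the config servers had changed, and so the order mongos was started with changed too. I know the problem is directly due to how Chef works, but I am curious why Mongo is not more flexible here.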

Thank you for your time.

1 Answer

When a mongos process changes metadata for a sharded cluster, it has to change it on all three config servers "simultaneously" (i.e. all three must agree for the metadata change to be valid).

If the system were to go down in the middle of such a metadata change and the config database order were not fixed, there would be many more possible permutations of incorrect states to unwind. Requiring a fixed sequence of config dbs allows (a) simpler checking of whether all members of the cluster are viewing the same metadata and (b) a significant reduction in the number of possible states when a system crashes or otherwise stops unexpectedly.

In addition, it reduces the chances of "race condition" sorts of bugs that could arise if different mongos processes initiated the same operations on different config servers. Take even as simple a change as a mongos process taking a "virtual" distributed lock to see if a change is necessary: how would you handle different mongos processes visiting the config servers in different orders to check on (and take out) the lock?
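The lock-ordering problem can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions, not the actual mongos locking protocol: two hypothetical clients, "A" and "B", take turns claiming one server at a time in their own configured order, and a client gives up as soon as it finds a server already claimed by the other. The function name `contend` and the step-by-step interleaving are invented for the example.

```python
def contend(order_a, order_b):
    """Simulate clients A and B claiming servers one step at a time.

    Returns (winners, owners): the clients that claimed every server,
    and a map of server -> client that claimed it.
    """
    owners = {}                      # server -> client that claimed it first
    alive = {"A": True, "B": True}   # clients still trying to acquire the lock
    orders = {"A": order_a, "B": order_b}
    for step in range(len(order_a)):
        for client in ("A", "B"):
            if not alive[client]:
                continue
            server = orders[client][step]
            # setdefault claims the server only if nobody holds it yet.
            if owners.setdefault(server, client) != client:
                alive[client] = False  # the other client got there first
    winners = [c for c in ("A", "B") if alive[c]]
    return winners, owners


servers = ["cf1", "cf2", "cf3"]

# Same order: both clients meet at cf1, so exactly one of them wins.
print(contend(servers, servers))

# Opposite order: each client claims a different "first" server,
# neither gets all three, and partial claims are left behind.
print(contend(servers, servers[::-1]))
```

In this model, a shared order means every contender collides on the same server at the first step, so the conflict is resolved before any partial state is written elsewhere.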

In summary, the three config servers are not a replica set, but one of them still has to be the one that always accepts the changes "first". Think of the order of configdbs given to mongos as the designation of that "first" status.

answered 2013-09-01T20:57:08.097