mongodb - 尝试降级 mongodb 副本集时出错

Question

设置：
1 个主要、3 个辅助和 1 个仲裁器（所有 5 个都在运行。）

当我在初级中执行 stepDown() 时，在 shell 中出现以下错误 -

m101:PRIMARY> rs.stepDown() 
2017-06-07T15:01:21.357 E QUERY    [thread1] Error: error doing query: failed: network error while attempting to run command 'replSetStepDown' on host '127.0.0.1:27018'  :  
DB.prototype.runCommand@src/mongo/shell/db.js:132:1 
DB.prototype.adminCommand@src/mongo/shell/db.js:150:16  
rs.stepDown@src/mongo/shell/utils.js:1261:12
@(shell):1:1 
2017-06-07T15:01:21.360 I NETWORK  [thread1] trying reconnect to 127.0.0.1:27018 (127.0.0.1) failed
2017-06-07T15:01:21.361 I NETWORK  [thread1] reconnect 127.0.0.1:27018 (127.0.0.1) ok
m101:SECONDARY>

请注意，最终主要更改为次要。但是为什么我在这里收到这个错误，“失败：网络错误”？

以下是主要日志文件中被降级的部分 -

2017-06-07T15:01:22.170 I REPL     [replication-5] Restarting oplog query due to error: InterruptedDueToReplStateChange: operation was interrupted. Last fetched optime (with hash): { ts: Timestamp 1496827872000|1, t: 2 }[-3490433912114125886]. Restarts remaining: 3
2017-06-07T15:01:22.170 I REPL     [replication-5] Scheduled new oplog query Fetcher source: localhost:27018 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp 1496827872000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 2 } query metadata: { $replData: 1, $oplogQueryData: 1, $ssm: { $secondaryOk: true } } active: 1 timeout: 65000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 159434 -- target:localhost:27018 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp 1496827872000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 2 } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
2017-06-07T15:01:22.173 W REPL     [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InvalidSyncSource: Sync source's last applied OpTime { ts: Timestamp 1496827872000|1, t: 2 } is not greater than our last fetched OpTime { ts: Timestamp 1496827872000|1, t: 2 }. Choosing new sync source.
2017-06-07T15:01:22.176 I REPL     [rsBackgroundSync] could not find member to sync from
2017-06-07T15:01:22.180 I REPL     [ReplicationExecutor] Member localhost:27018 is now in state SECONDARY
2017-06-07T15:01:27.335 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update to localhost:27018: InvalidSyncSource: Sync source was cleared. Was localhost:27018
2017-06-07T15:01:29.879 I -        [conn2] end connection 127.0.0.1:59400 (5 connections now open)
2017-06-07T15:01:30.899 I NETWORK  [thread1] connection accepted from 127.0.0.1:64421 #28 (5 connections now open)
2017-06-07T15:01:30.899 I NETWORK  [conn28] received client metadata from 127.0.0.1:64421 conn28: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.4" }, os: { type: "Windows", name: "Microsoft Windows 8", architecture: "x86_64", version: "6.2 (build 9200)" } }
2017-06-07T15:01:30.917 I NETWORK  [thread1] connection accepted from 127.0.0.1:64425 #29 (6 connections now open)
2017-06-07T15:01:30.920 I NETWORK  [conn29] received client metadata from 127.0.0.1:64425 conn29: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.4" }, os: { type: "Windows", name: "Microsoft Windows 8", architecture: "x86_64", version: "6.2 (build 9200)" } }
2017-06-07T15:01:30.927 I NETWORK  [thread1] connection accepted from 127.0.0.1:64428 #30 (7 connections now open)
2017-06-07T15:01:30.928 I NETWORK  [conn30] received client metadata from 127.0.0.1:64428 conn30: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.4" }, os: { type: "Windows", name: "Microsoft Windows 8", architecture: "x86_64", version: "6.2 (build 9200)" } }
2017-06-07T15:01:31.561 I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
2017-06-07T15:01:31.562 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
2017-06-07T15:01:31.563 I REPL     [ReplicationExecutor] VoteRequester(term 2 dry run) received a no vote from localhost:27018 with reason "candidate's term is lower than mine"; response message: { term: 3, voteGranted: false, reason: "candidate's term is lower than mine", ok: 1.0 }
2017-06-07T15:01:31.563 I REPL     [ReplicationExecutor] not running for primary, we have been superceded already
2017-06-07T15:01:32.183 I REPL     [ReplicationExecutor] Member localhost:27019 is now in state PRIMARY
2017-06-07T15:01:32.185 I REPL     [rsBackgroundSync] sync source candidate: localhost:27019
2017-06-07T15:01:32.186 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to localhost:27019
2017-06-07T15:01:33.198 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to localhost:27019, took 1012ms (1 connections now open to localhost:27019)

试图了解这里发生了什么。任何线索将不胜感激。

提前致谢！

score 8 · Accepted Answer

这是预期的行为。当主节点关闭时，它会丢弃所有连接，包括您在mongoshell 中的活动连接。然后mongo外壳程序重新建立连接（根据“尝试重新连接”消息）。

通过rs.stepDown()shell 输出的示例，文档可能会更清楚地说明这一点，但相关描述是：

成功降级后，rs.stepDown() 会强制当前连接到数据库的所有客户端断开连接。这有助于确保客户端维护副本集的准确视图。

由于断开连接包括用于运行命令的连接，因此如果命令成功完成，您将无法检索命令的返回状态；即，如果命令出错，您只能检索命令的返回状态。在脚本中运行命令时，脚本应考虑此行为。

mongodb - 尝试降级 mongodb 副本集时出错

1 回答 1

Related

Reference