我有一个在 Windows 上运行的 MongoDB 3 成员副本集。当主服务器(S1)宕机时,辅助服务器被正确选举。当主服务器恢复时,副本成员保持无效状态:
{
"state" : 10,
"stateStr" : "REMOVED",
"uptime" : 111,
"optime" : Timestamp(1448462710, 6),
"optimeDate" : ISODate("2015-11-25T14:45:10Z"),
"ok" : 0,
"errmsg" : "Our replica set config is invalid or we are not a member of it",
"code" : 93
}
之后,辅助服务器每隔几秒钟就会在主服务器和辅助服务器之间切换,使我的应用程序不稳定。
恢复主服务器的唯一方法是执行 rs.reconfig(c)。
我找不到配置文件有什么问题。
任何帮助将不胜感激。
更新: 这是当前配置:
{
"_id" : "companyName",
"version" : 32593,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 1,
"host" : "arb.companyName.com:40000",
"arbiterOnly" : true,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "m3.companyName.com:40000",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 11,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 4,
"host" : "m2.companyName.com:40000",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 3,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
},
"replicaSetId" : ObjectId("573dfcd0e8ae6154ff80c50d")
}
}
我应该使用 IP 地址而不是主机名吗?
更新 2:
这是主服务器 (m3.companyName.com - IP 1.1.1.1) 的日志,从它重新启动到我进入另一台服务器 (m2.companyName.com - IP 2.2.2.2) 并进行了手动 rs .reconfig()。
2016-09-06T07:42:05.953Z I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2016-09-06T07:42:05.953Z I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory 'c:/mongossl/data3/diagnostic.data'
2016-09-06T07:42:05.954Z I NETWORK [initandlisten] waiting for connections on port 40000 ssl
2016-09-06T07:42:05.955Z W NETWORK [ReplicationExecutor] getaddrinfo("arb.companyName.com") failed: errno:11001 No such host is known.
2016-09-06T07:42:05.955Z I NETWORK [ReplicationExecutor] getaddrinfo("arb.companyName.com") failed: errno:11001 No such host is known.
2016-09-06T07:42:05.957Z W NETWORK [ReplicationExecutor] getaddrinfo("m3.companyName.com") failed: errno:11001 No such host is known.
2016-09-06T07:42:05.957Z I NETWORK [ReplicationExecutor] getaddrinfo("m3.companyName.com") failed: errno:11001 No such host is known.
2016-09-06T07:42:05.958Z W NETWORK [ReplicationExecutor] getaddrinfo("m2.companyName.com") failed: errno:11001 No such host is known.
2016-09-06T07:42:05.959Z I NETWORK [ReplicationExecutor] getaddrinfo("m2.companyName.com") failed: errno:11001 No such host is known.
2016-09-06T07:42:05.959Z W REPL [ReplicationExecutor] Locally stored replica set configuration does not have a valid entry for the current node; waiting for reconfig or remote heartbeat; Got "NodeNotFound: No host described in new configuration 32592 for replica set companyName2 maps to this node" while validating { _id: "companyName2", version: 32592, protocolVersion: 1, members: [ { _id: 1, host: "arb.companyName.com:40000", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "m3.companyName.com:40000", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 11.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 4, host: "m2.companyName.com:40000", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 3.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('573dfcd0e8ae6154ff80c50d') } }
2016-09-06T07:42:05.959Z I REPL [ReplicationExecutor] New replica set config in use: { _id: "companyName2", version: 32592, protocolVersion: 1, members: [ { _id: 1, host: "arb.companyName.com:40000", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "m3.companyName.com:40000", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 11.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 4, host: "m2.companyName.com:40000", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 3.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('573dfcd0e8ae6154ff80c50d') } }
2016-09-06T07:42:05.959Z I REPL [ReplicationExecutor] This node is not a member of the config
2016-09-06T07:42:05.959Z I REPL [ReplicationExecutor] transition to REMOVED
2016-09-06T07:42:05.959Z I REPL [ReplicationExecutor] Starting replication applier threads
2016-09-06T07:42:06.651Z I NETWORK [initandlisten] connection accepted from 2.2.2.2:53746 #1 (1 connection now open)
2016-09-06T07:42:06.760Z I NETWORK [initandlisten] connection accepted from 2.2.2.2:53747 #2 (2 connections now open)
2016-09-06T07:42:06.864Z I NETWORK [initandlisten] connection accepted from 2.2.2.2:53748 #3 (3 connections now open)
2016-09-06T07:42:06.993Z I ACCESS [conn1] authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=m2.companyName.com,O=companyName,ST=ON,C=CA" }
2016-09-06T07:42:07.067Z I ACCESS [conn2] authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=m2.companyName.com,O=companyName,ST=ON,C=CA" }
2016-09-06T07:42:07.159Z I ACCESS [conn3] authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=m2.companyName.com,O=companyName,ST=ON,C=CA" }
2016-09-06T07:42:07.552Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:42:07.627Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:42:08.975Z I NETWORK [conn1] end connection 2.2.2.2:53746 (2 connections now open)
2016-09-06T07:42:08.975Z I NETWORK [conn2] end connection 2.2.2.2:53747 (2 connections now open)
2016-09-06T07:42:08.975Z I NETWORK [conn3] end connection 2.2.2.2:53748 (2 connections now open)
2016-09-06T07:42:09.371Z I NETWORK [initandlisten] connection accepted from 2.2.2.2:53763 #4 (1 connection now open)
2016-09-06T07:42:09.639Z I ACCESS [conn4] authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=m2.companyName.com,O=companyName,ST=ON,C=CA" }
2016-09-06T07:42:13.059Z I NETWORK [initandlisten] connection accepted from 3.3.3.3:58220 #5 (2 connections now open)
2016-09-06T07:42:13.127Z I ACCESS [conn5] authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=arb.companyName.com,O=companyName,ST=ON,C=CA" }
2016-09-06T07:42:13.292Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to arb.companyName.com:40000
2016-09-06T07:42:13.301Z I REPL [ReplicationExecutor] Member arb.companyName.com:40000 is now in state ARBITER
2016-09-06T07:42:13.974Z I NETWORK [initandlisten] connection accepted from 2.2.2.2:53765 #6 (3 connections now open)
2016-09-06T07:42:14.433Z I ACCESS [conn6] Successfully authenticated as principal appUser on companyName
2016-09-06T07:42:16.629Z I NETWORK [initandlisten] connection accepted from 1.1.1.13:49162 #7 (4 connections now open)
2016-09-06T07:42:16.853Z I ACCESS [conn7] Successfully authenticated as principal appUser on companyName
2016-09-06T07:42:17.703Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:42:17.703Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:42:18.131Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:42:18.206Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:42:23.369Z I NETWORK [initandlisten] connection accepted from 2.2.2.2:53767 #8 (5 connections now open)
2016-09-06T07:42:23.832Z I ACCESS [conn8] Successfully authenticated as principal sa on admin
2016-09-06T07:42:28.356Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:42:38.431Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:42:38.431Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:42:38.861Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:42:38.936Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:42:49.086Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:42:59.161Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:42:59.161Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:42:59.590Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:42:59.665Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:43:09.814Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:43:19.889Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:43:19.889Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:43:20.317Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:43:20.392Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:43:30.542Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:43:34.054Z I NETWORK [initandlisten] connection accepted from 1.1.1.13:49188 #9 (6 connections now open)
2016-09-06T07:43:34.106Z I ACCESS [conn9] Successfully authenticated as principal sa on admin
2016-09-06T07:43:40.617Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:43:40.617Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:43:41.045Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:43:41.120Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:43:51.270Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:43:51.277Z I NETWORK [initandlisten] connection accepted from 1.1.1.13:49193 #10 (7 connections now open)
2016-09-06T07:43:51.339Z I ACCESS [conn10] Successfully authenticated as principal sa on admin
2016-09-06T07:44:01.346Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:44:01.346Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:44:01.775Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:44:01.850Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:44:12.001Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:44:22.077Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:44:22.077Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:44:22.506Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:44:22.582Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:44:32.732Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:44:42.807Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:44:42.807Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:44:43.237Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:44:43.312Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:44:53.462Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:45:03.537Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:45:03.537Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:45:03.966Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:45:04.041Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:45:14.191Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:45:24.266Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:45:24.266Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:45:24.700Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:45:24.775Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:45:34.925Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:45:45.000Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:45:45.000Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:45:45.428Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:45:45.504Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:45:55.654Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:46:05.729Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:46:05.729Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:46:06.157Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:46:06.232Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:46:16.382Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:46:26.458Z I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to m2.companyName.com:40000
2016-09-06T07:46:26.458Z I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
2016-09-06T07:46:26.889Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:46:26.964Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state SECONDARY
2016-09-06T07:46:37.115Z I REPL [ReplicationExecutor] Member m2.companyName.com:40000 is now in state PRIMARY
2016-09-06T07:46:43.185Z I NETWORK [initandlisten] connection accepted from 2.2.2.2:53847 #11 (8 connections now open)
2016-09-06T07:46:43.392Z I ACCESS [conn11] authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=m2.companyName.com,O=companyName,ST=ON,C=CA" }
2016-09-06T07:46:43.541Z I NETWORK [conn11] end connection 2.2.2.2:53847 (7 connections now open)
2016-09-06T07:46:44.370Z I NETWORK [initandlisten] connection accepted from 3.3.3.3:58224 #12 (8 connections now open)
2016-09-06T07:46:44.434Z I ACCESS [conn12] authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "CN=arb.companyName.com,O=companyName,ST=ON,C=CA" }
2016-09-06T07:46:44.451Z I NETWORK [conn12] end connection 3.3.3.3:58224 (7 connections now open)
2016-09-06T07:46:47.832Z I REPL [ReplicationExecutor] New replica set config in use: { _id: "companyName2", version: 32593, protocolVersion: 1, members: [ { _id: 1, host: "arb.companyName.com:40000", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "m3.companyName.com:40000", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 11.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 4, host: "m2.companyName.com:40000", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 3.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('573dfcd0e8ae6154ff80c50d') } }
2016-09-06T07:46:47.832Z I REPL [ReplicationExecutor] This node is m3.companyName.com:40000 in the config
2016-09-06T07:46:47.832Z I REPL [ReplicationExecutor] transition to STARTUP2
2016-09-06T07:46:47.907Z I REPL [ReplicationExecutor] Scheduling priority takeover at 2016-09-06T03:46:57.907-0400
2016-09-06T07:46:48.040Z I REPL [ReplicationExecutor] syncing from: m2.companyName.com:40000
2016-09-06T07:46:48.545Z I REPL [SyncSourceFeedback] setting syncSourceFeedback to m2.companyName.com:40000
2016-09-06T07:46:48.977Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:46:50.983Z I REPL [ReplicationExecutor] transition to RECOVERING
2016-09-06T07:46:50.985Z I REPL [ReplicationExecutor] transition to SECONDARY
2016-09-06T07:46:51.438Z I REPL [ReplicationExecutor] could not find member to sync from
2016-09-06T07:46:57.907Z I REPL [ReplicationExecutor] Canceling priority takeover callback
2016-09-06T07:46:57.907Z I REPL [ReplicationExecutor] Starting an election for a priority takeover
2016-09-06T07:46:57.907Z I REPL [ReplicationExecutor] conducting a dry run election to see if we could be elected
2016-09-06T07:46:57.916Z I REPL [ReplicationExecutor] dry election run succeeded, running for election
2016-09-06T07:46:57.925Z I REPL [ReplicationExecutor] election succeeded, assuming primary role in term 244
2016-09-06T07:46:57.925Z I REPL [ReplicationExecutor] transition to PRIMARY
2016-09-06T07:46:58.345Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:46:58.362Z I ASIO [NetworkInterfaceASIO-0] Successfully connected to m2.companyName.com:40000
2016-09-06T07:46:58.440Z I REPL [rsSync] transition to primary complete; database writes are now permitted
我注意到的最明显的事情是“没有这样的主机是已知的”错误。也许 Mongo 试图在 Windows 解析名称之前启动?