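After upgrading from Rancher 1.1 to Rancher 1.3, we ran into a problem running a MongoDB cluster. Rancher suddenly started restarting at least one node of the MongoDB cluster over and over for no apparent reason, claiming the service is incomplete. Below is a snippet of the Rancher log (it is in reverse order, so read the last entry first):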
01:19:17 PM INFO service.trigger.info Requested: 3, Created: 3, Unhealthy: 0, Bad: 0, Incomplete: 0
01:19:17 PM INFO service.trigger.info Service already reconciled
01:19:16 PM INFO service.trigger Re-evaluating state
01:19:16 PM INFO service.trigger (1 sec) Re-evaluating state
01:19:16 PM INFO service.trigger.info Service reconciled: Requested: 3, Created: 3, Unhealthy: 0, Bad: 0, Incomplete: 0
01:19:16 PM INFO service.update.info Service already reconciled
01:19:16 PM INFO service.update Updating service
01:19:16 PM INFO service.update.info Requested: 3, Created: 3, Unhealthy: 0, Bad: 0, Incomplete: 0
01:19:16 PM INFO service.trigger.exception Busy processing [SERVICE.280] will try later
01:19:03 PM INFO service.update Updating service
01:19:03 PM INFO service.update.exception Busy processing [SERVICE.280] will try later
01:19:02 PM INFO service.trigger.wait (14 sec) Waiting for instances to start
01:19:02 PM INFO service.instance.create Creating extra service instance
01:19:02 PM INFO service.instance.create Creating extra service instance
01:19:01 PM INFO service.trigger (15 sec) Re-evaluating state
01:19:01 PM INFO service.trigger.info Requested: 3, Created: 3, Unhealthy: 0, Bad: 0, Incomplete: 1
The problem always starts with Requested: 3, Created: 3, Unhealthy: 0, Bad: 0, Incomplete: 1
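To see how often this happens, the exported log can be scanned for entries where the Incomplete counter is non-zero. The following is only a rough sketch: it assumes the log has been saved as plain text in exactly the format pasted above, and the names `STATE_RE` and `find_incomplete` are made up for illustration, not part of any Rancher tooling.

```python
import re

# Matches the counter summary printed in the service.*.info lines above.
# The line format is assumed to be exactly what the pasted log shows.
STATE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2} [AP]M) .*?"
    r"Requested: (?P<requested>\d+), Created: (?P<created>\d+), "
    r"Unhealthy: (?P<unhealthy>\d+), Bad: (?P<bad>\d+), "
    r"Incomplete: ?(?P<incomplete>\d+)"
)

def find_incomplete(lines):
    """Return (time, incomplete_count) for every line reporting Incomplete > 0."""
    hits = []
    for line in lines:
        m = STATE_RE.search(line)
        if m and int(m.group("incomplete")) > 0:
            hits.append((m.group("time"), int(m.group("incomplete"))))
    return hits

# Two lines copied from the log above.
sample = [
    "01:19:17 PM INFO service.trigger.info Requested: 3, Created: 3, Unhealthy: 0, Bad: 0, Incomplete: 0",
    "01:19:01 PM INFO service.trigger.info Requested: 3, Created: 3, Unhealthy: 0, Bad: 0, Incomplete: 1",
]
print(find_incomplete(sample))
```

Comparing the reported times with the "got signal 15" entries in the mongod log should show whether every restart is preceded by an Incomplete report.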
At the same time, however, nothing interesting is happening in mongo. It is suddenly restarted by something external, i.e. Rancher (log in natural order):
2017-01-22T13:06:11.957+0000 I NETWORK [conn2362] end connection 10.42.191.72:55615 (24 connections now open)
2017-01-22T13:06:14.848+0000 I NETWORK [initandlisten] connection accepted from 10.42.191.72:55635 #2363 (25 connections now open)
2017-01-22T13:06:14.849+0000 I NETWORK [conn2363] end connection 10.42.191.72:55635 (24 connections now open)
(nothing unusual until here, look here)->
2017-01-22T13:06:15.243+0000 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2017-01-22T13:06:15.244+0000 I FTDC [signalProcessingThread] Shutting down full-time diagnostic data capture
2017-01-22T13:06:15.253+0000 I REPL [signalProcessingThread] Stopping replication applier threads
2017-01-22T13:06:15.556+0000 I STORAGE [conn105] got request after shutdown()
2017-01-22T13:06:15.871+0000 I STORAGE [conn91] got request after shutdown()
2017-01-22T13:06:15.874+0000 I STORAGE [conn86] got request after shutdown()
2017-01-22T13:06:15.887+0000 I STORAGE [conn82] got request after shutdown()
2017-01-22T13:06:15.941+0000 I STORAGE [conn83] got request after shutdown()
2017-01-22T13:06:16.009+0000 I STORAGE [conn85] got request after shutdown()
2017-01-22T13:06:16.020+0000 I STORAGE [conn84] got request after shutdown()
2017-01-22T13:06:16.108+0000 I STORAGE [conn75] got request after shutdown()
2017-01-22T13:06:16.133+0000 I STORAGE [conn87] got request after shutdown()
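Any idea what Rancher might be up to? I even tried creating a clean MongoDB with no clients. Same story: it gets restarted by Rancher about twice an hour, sometimes even more often. Is there any workaround?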