I ran into a problem: I have an Ops Manager that is supposed to run a MongoDB cluster as an automated cluster.

Suddenly the servers started going down unexpectedly, and there are no errors in any of the log files indicating what the problem is.

Ops Manager gets stuck on the blue label:

We are deploying your changes. This might take a few minutes

And it just never goes away.

Because this environment is based on the automation feature, MMS manages the user on the servers and runs all of the processes under "mongod", which I can't access even as root (administrator).

As far as Ops Manager is concerned, a shard in a replica set is down although it is live, and a mongos that is dead is shown as alive.

Has anyone run into this situation before and might be able to help?

Thanks, Eliran.


1 Answer


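Found the problem: there was some kind of NTP mismatch between the servers in the cluster. The servers were not in sync, so every time Ops Manager performed an operation it received a response carrying the wrong time and could not complete the operation within its time limit.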

After reconfiguring NTP on all of the servers back to the same source, everything went back to the way it should be :)
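
For anyone who wants to check for this kind of mismatch, here is a minimal sketch in Python. It assumes you have already collected one epoch timestamp per host at roughly the same moment (for example by running `date +%s` on each server). The host names, the sample readings, the function name `find_clock_skew`, and the 1-second threshold are all hypothetical and for illustration only; this is not something Ops Manager provides.

```python
from statistics import median


def find_clock_skew(host_times, max_skew_seconds=1.0):
    """Return {host: offset_seconds} for hosts whose clock differs from
    the median of all readings by more than max_skew_seconds.

    host_times maps a host name to an epoch timestamp in seconds.
    The threshold is an assumed value, not a documented Ops Manager limit.
    """
    if not host_times:
        return {}
    # Use the median as the reference so one bad clock does not shift it.
    reference = median(host_times.values())
    return {
        host: ts - reference
        for host, ts in host_times.items()
        if abs(ts - reference) > max_skew_seconds
    }


if __name__ == "__main__":
    # Hypothetical readings collected from three servers at the same moment.
    readings = {"shard-a": 1000, "shard-b": 1000, "mongos-1": 1045}
    for host, offset in find_clock_skew(readings).items():
        print(f"{host} is off by {offset} seconds")
```

Any host this flags is worth comparing against the NTP source the rest of the cluster uses.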

Answered 2017-03-06T08:59:36.663