I have a replica set of 3 MongoDB instances. Each instance has 8 GB of RAM and a dual-core 2.27 GHz CPU. All of them run version 2.2.2 (I saw the same behavior on 2.0.1).

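Here is my problem: our primary (the primary of the replica set) has recently developed a habit of climbing to 100% CPU every 2 days. To track down the cause, I ran the MongoDB profiler and found hundreds of very slow queries. Here is an example: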

> db.system.profile.find()
{ 
    "ts" : ISODate("2012-12-16T20:31:39.078Z"), 
    "op" : "command", 
    "ns" : "stylesaint.$cmd", 
    "command" : { 
        "count" : "tears", 
        "query" : { 
            "_id" : { "$gt" : ObjectId("50cdeadeaf58d3de96000294") }, 
            "active" : true, 
            "is_image_processed" : true, 
            "hidden_from_feed" : false, 
            "hidden_from_public_feeds" : false
        }, 
        "fields" : null 
    }, 
    "ntoreturn" : 1, 
    "responseLength" : 48, 
    "millis" : 13930, 
    "client" : "#########"
}

From what I know of MongoDB, the natural next step in this situation is to run explain() on these queries. However, explain() does not account for the slowness:

> db.tears.find({ "_id" : { "$gt" : ObjectId("50cdeadeaf58d3de96000294") }, "active" : true, "is_image_processed" : true, "hidden_from_feed" : false, "hidden_from_public_feeds" : false }).explain()
{
    "cursor" : "BtreeCursor _id_",
    "isMultiKey" : false,
    "n" : 4,
    "nscannedObjects" : 5,
    "nscanned" : 5,
    "nscannedObjectsAllPlans" : 23,
    "nscannedAllPlans" : 25,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : { 
        "_id" : [ 
            [ 
                ObjectId("50cdeadeaf58d3de96000294"), 
                ObjectId("ffffffffffffffffffffffff")
            ]
        ]
    },
    "server" : "#########"
}

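Scanning 5 documents should not take 13 seconds. Something else must be slowing the query down. Perhaps some other queries are exhausting the server's resources? I don't know where to look, though. Thanks for any suggestions.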
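One thing worth noting is that the profiler recorded a count command, while explain() was run on a find() with the same predicate, so the two are not guaranteed to behave identically. A minimal sketch of re-running the recorded command and timing it from the mongo shell follows. The helper name `timeCount` is hypothetical, and it assumes `db` is the shell's handle to the database that holds the collection.

```javascript
// Hypothetical helper (not part of MongoDB): re-run the count command that
// the profiler recorded and measure the round-trip time as seen by the client.
// `db` is assumed to be the mongo shell database handle.
function timeCount(db, collectionName, query) {
    var started = new Date().getTime();
    // Same command shape as the "command" field of the profiler entry.
    var result = db.runCommand({ count: collectionName, query: query });
    var elapsedMs = new Date().getTime() - started;
    return { n: result.n, ok: result.ok, elapsedMs: elapsedMs };
}

// Usage in the shell, with the predicate from the profiler entry:
// timeCount(db, "tears", {
//     _id: { $gt: ObjectId("50cdeadeaf58d3de96000294") },
//     active: true,
//     is_image_processed: true,
//     hidden_from_feed: false,
//     hidden_from_public_feeds: false
// })
```

If the count is fast when the server is idle but slow during a CPU spike, that would point toward contention rather than the query plan itself.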

MongoDB logs

I could not find any warnings during startup:

***** SERVER RESTARTED *****


Sun Dec 16 21:02:56 [initandlisten] MongoDB starting : pid=...
Sun Dec 16 21:02:56 [initandlisten] db version v2.2.2, pdfile version 4.5
Sun Dec 16 21:02:56 [initandlisten] git version: ...   
Sun Dec 16 21:02:56 [initandlisten] build info: Linux 2.6.21.7-2 ...
Sun Dec 16 21:02:56 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/data/mongodb", logappend: "true", logpath: "/var/log/mongodb/mongodb.log", replSet: "...", rest: "true" }
Sun Dec 16 21:02:56 [initandlisten] journal dir=/data/mongodb/journal
Sun Dec 16 21:02:56 [initandlisten] recover : no journal files present, no recovery needed
Sun Dec 16 21:02:56 [initandlisten] waiting for connections on port ...
Sun Dec 16 21:02:56 [websvr] admin web console waiting for connections on port ...
Sun Dec 16 21:02:56 [initandlisten] connection accepted from ...
Sun Dec 16 21:02:56 [conn1] end connection ... (0 connections now open)
Sun Dec 16 21:02:56 [initandlisten] connection accepted from ... #2 (1 connection now open)
Sun Dec 16 21:02:56 [rsStart] replSet I am ...
Sun Dec 16 21:02:56 [rsStart] replSet STARTUP2
Sun Dec 16 21:02:56 [rsHealthPoll] replSet member ... is up
Sun Dec 16 21:02:56 [rsHealthPoll] replSet member ... is now in state SECONDARY
Sun Dec 16 21:02:57 [initandlisten] connection accepted from ... #3 (2 connections now open)
Sun Dec 16 21:02:57 [rsSync] replSet SECONDARY
Sun Dec 16 21:02:58 [initandlisten] connection accepted from ... #4 (3 connections now open)
Sun Dec 16 21:02:58 [initandlisten] connection accepted from ... #5 (4 connections now open)
Sun Dec 16 21:02:58 [conn5] end connection ... (3 connections now open)
Sun Dec 16 21:02:58 [rsHealthPoll] replSet member ... is up
Sun Dec 16 21:02:58 [rsHealthPoll] replSet member ... is now in state PRIMARY
Sun Dec 16 21:02:59 [initandlisten] connection accepted from ... #6 (4 connections now open)
Sun Dec 16 21:03:00 [initandlisten] connection accepted from ... #7 (5 connections now open)
Sun Dec 16 21:03:02 [conn7] end connection ... (4 connections now open)
Sun Dec 16 21:03:03 [rsBackgroundSync] replSet syncing to: ...
Sun Dec 16 21:03:04 [rsSyncNotifier] replset setting oplog notifier to ...
Sun Dec 16 21:03:06 [conn2] end connection ... (3 connections now open)
Sun Dec 16 21:03:06 [initandlisten] connection accepted from ... #8 (4 connections now open)
Sun Dec 16 21:03:08 [initandlisten] connection accepted from ... #9 (5 connections now open)
Sun Dec 16 21:03:13 [initandlisten] connection accepted from ... #10 (6 connections now open)
Sun Dec 16 21:03:13 [conn10] end connection ... (5 connections now open)
Sun Dec 16 21:03:13 [initandlisten] connection accepted from ... #11 (6 connections now open)
Sun Dec 16 21:03:15 [conn3] end connection ... (5 connections now open)
Sun Dec 16 21:03:16 [rsHealthPoll] replSet member .... is now in state SECONDARY
Sun Dec 16 21:03:16 [rsMgr] replSet info electSelf 1
Sun Dec 16 21:03:16 [rsMgr] replSet PRIMARY

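Re: requests for more information

At the moment MongoDB is running fine; no query takes longer than 100 ms. As soon as the 100% CPU happens again, I will post more information about system resources.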
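When the spike does recur, a snapshot of the operations in progress may help show whether other queries are competing for the server. The sketch below filters the output of `db.currentOp()` for active operations that have been running for at least a given number of seconds. The helper name `longRunningOps` and the threshold are illustrative assumptions, and it relies on the `inprog`, `active`, and `secs_running` fields that `currentOp` reports.

```javascript
// Hypothetical helper (not part of MongoDB): list active operations that have
// been running for at least minSecs seconds, longest first.
// `db` is assumed to be the mongo shell database handle.
function longRunningOps(db, minSecs) {
    var inprog = db.currentOp().inprog;
    var slow = [];
    for (var i = 0; i < inprog.length; i++) {
        var op = inprog[i];
        if (op.active && op.secs_running >= minSecs) {
            slow.push({
                opid: op.opid,
                op: op.op,
                ns: op.ns,
                secs_running: op.secs_running,
                query: op.query
            });
        }
    }
    // Longest-running operations first.
    slow.sort(function (a, b) { return b.secs_running - a.secs_running; });
    return slow;
}

// Usage in the shell during a spike:
// longRunningOps(db, 5)
```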

1 Answer

First, I think these queries may be a red herring. Are you running these servers on a NUMA architecture? The MongoDB documentation covers usage on NUMA systems.

If you are on a NUMA system, starting the daemon under numactl with an interleave policy may solve your problem.

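You can check for startup warnings. They appear in your log when the daemon starts, and you can also retrieve them while the daemon is running, although I don't remember how offhand.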
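One way to retrieve them from a running daemon, assuming the server supports the `getLog` command with the `startupWarnings` filter, is sketched below. The helper name `startupWarnings` is hypothetical, and `db` is assumed to be the mongo shell database handle.

```javascript
// Hypothetical helper (not part of MongoDB): return the startup warnings the
// server kept in memory, as an array of log lines.
// Assumes the server supports { getLog: "startupWarnings" }.
function startupWarnings(db) {
    var res = db.adminCommand({ getLog: "startupWarnings" });
    // On failure (for example, an unsupported filter) return an empty list.
    return res.ok ? res.log : [];
}

// Usage in the shell:
// startupWarnings(db).forEach(function (line) { print(line); })
```

An empty result only means the server recorded no warnings at startup; it does not by itself rule out NUMA hardware.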

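Failing that, check your I/O activity while these queries run. If I had to guess, you are hitting disk rather than operating on an in-memory working set. What do your memory usage statistics look like (free -h and the memory metrics inside the mongo console)?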
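For the mongo console side, a minimal sketch of collecting those metrics from `db.serverStatus()` is shown here. The helper name `memorySnapshot` is hypothetical. It assumes the `mem` section reports `resident`, `virtual`, and `mapped` in megabytes, and that `extra_info.page_faults` is available, which is only the case on some platforms such as Linux.

```javascript
// Hypothetical helper (not part of MongoDB): pull the memory-related fields
// out of serverStatus(). `db` is assumed to be the mongo shell database handle.
function memorySnapshot(db) {
    var status = db.serverStatus();
    var mem = status.mem || {};
    var extra = status.extra_info || {};
    return {
        residentMB: mem.resident,      // memory resident in RAM
        virtualMB: mem.virtual,        // virtual memory used by mongod
        mappedMB: mem.mapped,          // size of memory-mapped data files
        pageFaults: extra.page_faults  // cumulative; undefined if not reported
    };
}

// Usage in the shell: take two snapshots a few seconds apart during a slow
// period and compare pageFaults to see how quickly it grows.
// var before = memorySnapshot(db);
// sleep(10000);
// var after = memorySnapshot(db);
// print(after.pageFaults - before.pageFaults);
```

A mapped size well above the 8 GB of RAM, together with a fast-growing page fault count, would be consistent with the guess that the working set does not fit in memory.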

answered 2012-12-18T00:33:24.313