
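What is the most efficient way to look up data in Mongo when the input is a single value and the documents in the collection store min/max ranges? For example: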

record = { min: number, max: number, payload }

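I need to find the record whose min/max range contains a given number. The ranges never overlap. The size of a range is unpredictable.

The collection holds about 6 million records. If I expanded the ranges (one record for every value in a range), I would be looking at roughly 4 billion records.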

I created a compound index on {min:1,max:1}, but searching with:

db.block.find({min:{$lte:value},max:{$gte:value}})

... takes anywhere from a few seconds to tens of seconds. Below is the output of getIndexes() and explain(). Is there any trick to make the search run faster?

NJmongo:PRIMARY> db.block.getIndexes()
[
    {
            "v" : 1,
            "key" : {
                    "_id" : 1
            },
            "ns" : "mispot.block",
            "name" : "_id_"
    },
    {
            "v" : 1,
            "key" : {
                    "min" : 1,
                    "max" : 1
            },
            "ns" : "mispot.block",
            "name" : "min_1_max_1"
    }
] 


NJmongo:PRIMARY> db.block.find({max:{$gte:1135194602},min:{$lte:1135194602}}).explain()
{
    "cursor" : "BtreeCursor min_1_max_1",
    "isMultiKey" : false,
    "n" : 1,
    "nscannedObjects" : 1,
    "nscanned" : 1199049,
    "nscannedObjectsAllPlans" : 1199050,
    "nscannedAllPlans" : 2398098,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 7534,
    "nChunkSkips" : 0,
    "millis" : 5060,
    "indexBounds" : {
            "min" : [
                    [
                            -1.7976931348623157e+308,
                            1135194602
                    ]
            ],
            "max" : [
                    [
                            1135194602,
                            1.7976931348623157e+308
                    ]
            ]
    },
    "server" : "ccc:27017"
}

1 Answer


If the ranges of your block records never overlap, then you can accomplish this much faster with:

db.block.find({min:{$lte:value}}).sort({min:-1}).limit(1)

This query will return almost instantly since it can find the record with a simple lookup in the index.
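One caveat worth sketching: the sort/limit query only guarantees min <= value, so if value can fall in a gap between ranges, the returned record's max should still be checked. The plain JavaScript below is a minimal illustration of that lookup, not MongoDB code: the sorted array `blocks` and the function `findBlock` are hypothetical stand-ins for the collection and an index on min.

```javascript
// Hypothetical in-memory stand-in for the `block` collection, sorted by `min`
// (the order an index on {min: 1} would provide). Ranges never overlap.
const blocks = [
  { min: 10, max: 19, payload: "a" },
  { min: 20, max: 29, payload: "b" },
  { min: 50, max: 99, payload: "c" },
];

// Return the record whose [min, max] range contains `value`, or null.
function findBlock(sortedBlocks, value) {
  let lo = 0;
  let hi = sortedBlocks.length - 1;
  let candidate = null;
  // Binary search for the last record with min <= value. This mirrors
  // find({min: {$lte: value}}).sort({min: -1}).limit(1).
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sortedBlocks[mid].min <= value) {
      candidate = sortedBlocks[mid];
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  // Confirm the upper bound; a value in a gap between ranges matches nothing.
  return candidate !== null && candidate.max >= value ? candidate : null;
}

console.log(findBlock(blocks, 25)); // the record with payload "b"
console.log(findBlock(blocks, 35)); // null, because 35 lies between two ranges
```

In the shell, the equivalent check would be comparing the max field of the single returned document against value before using it.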

The query you are running is slow because the two clauses each match on millions of records that must be merged. In fact, I think your query would run faster (maybe much faster) with separate indexes on min and max since the max part of your compound index can only be used for a given min -- not to search for documents with a specific max.
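As a rough illustration of why the original query touches so many index entries, here is a simplified model in plain JavaScript. It assumes the scan walks every entry with min <= value in ascending order, which is consistent with the indexBounds and the large nscanned in the explain() output but is not a description of MongoDB internals. The data and the name `scanAscending` are hypothetical.

```javascript
// Hypothetical data: 1000 non-overlapping ranges [i*10, i*10 + 9], sorted by min.
const ranges = Array.from({ length: 1000 }, (_, i) => ({
  min: i * 10,
  max: i * 10 + 9,
}));

// Simplified model: walk every entry with min <= value and test max on each.
function scanAscending(sorted, value) {
  let scanned = 0;
  let match = null;
  for (const r of sorted) {
    if (r.min > value) break; // past the upper bound on min
    scanned++;
    // With non-overlapping ranges, only the last candidate can satisfy this.
    if (r.max >= value) match = r;
  }
  return { scanned, match };
}

const result = scanAscending(ranges, 9005);
console.log(result.scanned); // 901 entries examined to find one match
console.log(result.match);   // { min: 9000, max: 9009 }
```

The number of entries examined grows with the position of value in the key space, whereas the descending sort with limit(1) starts at the one entry that can match.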

Answered 2013-04-20T05:33:19.770