regex - Simple MongoDB prefix query with regex and sort is slow

Question

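I'm stuck on this simple prefix query. Although the Mongo docs state that you can get quite good performance by using the prefix regex format (/^a/), the query is very slow when I try to sort the results:

940 ms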

db.posts.find({hashtags: /^noticias/ }).limit(15).sort({rank : -1}).hint('hashtags_1_rank_-1').explain()

{
"cursor" : "BtreeCursor hashtags_1_rank_-1 multi",
"isMultiKey" : true,
"n" : 15,
"nscannedObjects" : 142691,
"nscanned" : 142692,
"nscannedObjectsAllPlans" : 142691,
"nscannedAllPlans" : 142692,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 934,
"indexBounds" : {
    "hashtags" : [
        [
            "noticias",
            "noticiat"
        ],
        [
            /^noticias/,
            /^noticias/
        ]
    ],
    "rank" : [
        [
            {
                "$maxElement" : 1
            },
            {
                "$minElement" : 1
            }
        ]
    ]
},
"server" : "XRTZ048.local:27017"
}

However, the unsorted version of the same query is very fast:

0 ms

db.posts.find({hashtags: /^noticias/ }).limit(15).hint('hashtags_1_rank_-1').explain()

{
"cursor" : "BtreeCursor hashtags_1_rank_-1 multi",
"isMultiKey" : true,
"n" : 15,
"nscannedObjects" : 15,
"nscanned" : 15,
"nscannedObjectsAllPlans" : 15,
"nscannedAllPlans" : 15,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
    "hashtags" : [
        [
            "noticias",
            "noticiat"
        ],
        [
            /^noticias/,
            /^noticias/
        ]
    ],
    "rank" : [
        [
            {
                "$maxElement" : 1
            },
            {
                "$minElement" : 1
            }
        ]
    ]
},
"server" : "XRTZ048.local:27017"

}

If I drop the regex and keep the sort, the query is also fast:

0 ms

db.posts.find({hashtags: 'noticias' }).limit(15).sort({rank : -1}).hint('hashtags_1_rank_-1').explain()

{
"cursor" : "BtreeCursor hashtags_1_rank_-1",
"isMultiKey" : true,
"n" : 15,
"nscannedObjects" : 15,
"nscanned" : 15,
"nscannedObjectsAllPlans" : 15,
"nscannedAllPlans" : 15,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
    "hashtags" : [
        [
            "noticias",
            "noticias"
        ]
    ],
    "rank" : [
        [
            {
                "$maxElement" : 1
            },
            {
                "$minElement" : 1
            }
        ]
    ]
},
"server" : "XRTZ048.local:27017"

}

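It seems that using the regex and the sort together makes Mongo scan a huge number of records, whereas without the regex the sorted query scans only 15. What is wrong here?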

score 7 · Accepted Answer

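scanAndOrder: true in the explain output means that the query has to retrieve the documents and then sort them in memory before returning the output. This is an expensive operation and will hurt the performance of the query.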
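To illustrate why the in-memory sort becomes necessary, here is a minimal plain-JavaScript sketch (not MongoDB code) that models the hashtags_1_rank_-1 index as an array ordered by hashtag ascending, then rank descending. The sample entries and the helper names scan and isRankDescending are made up for illustration; the real index internals are more involved.

```javascript
// Toy model of the compound index { hashtags: 1, rank: -1 }.
// Entries and helper names are hypothetical; this is not MongoDB's implementation.
const indexEntries = [
  { hashtag: "noticias", rank: 9 },
  { hashtag: "noticias", rank: 4 },
  { hashtag: "noticiasdehoy", rank: 7 },
  { hashtag: "noticiasdehoy", rank: 2 },
  { hashtag: "noticiasya", rank: 8 },
  { hashtag: "deportes", rank: 5 },
].sort((a, b) =>
  a.hashtag < b.hashtag ? -1 : a.hashtag > b.hashtag ? 1 : b.rank - a.rank
);

// Walk the index in order and keep the entries whose key passes the filter.
function scan(matches) {
  return indexEntries.filter((e) => matches(e.hashtag));
}

// True when the ranks come out already in descending order.
function isRankDescending(entries) {
  return entries.every((e, i) => i === 0 || entries[i - 1].rank >= e.rank);
}

const equalityScan = scan((h) => h === "noticias");
const prefixScan = scan((h) => h.startsWith("noticias"));

console.log(isRankDescending(equalityScan)); // true
console.log(isRankDescending(prefixScan));   // false
```

With an equality match only one hashtag value is visited, so the ranks are already in descending order and the scan can stop after 15 entries. A prefix match spans several hashtag values, and the rank order restarts at each one, so every matching entry has to be read and sorted before the top 15 by rank are known.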

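The presence of scanAndOrder: true, together with the difference between nscanned and n in the explain output, indicates that the query is not using an optimal index. In this case it appears to read every index entry that matches the prefix (nscanned is 142692) rather than stopping after 15. You might be able to alleviate this by including the index keys in your sort criteria. From my testing: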

db.posts.find({hashtags: /^noticias/ }).limit(15).sort({hashtags:1, rank : -1}).explain()

This does not require a scan-and-order step, and n and nscanned both equal the number of records you asked for. It does mean the results are sorted on the hashtags key first, which may or may not be useful to you, but it should improve the performance of the query.
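The trade-off can be sketched in plain JavaScript (again a toy model, not MongoDB code). The entries array stands in for the index, already in hashtag-ascending, rank-descending order; the data and the function names indexOrderQuery and rankOrderQuery are assumptions made for this illustration.

```javascript
// Toy model of the index { hashtags: 1, rank: -1 }; data and names are hypothetical.
const entries = [
  ["deportes", 5],
  ["noticias", 9],
  ["noticias", 4],
  ["noticiasdehoy", 7],
  ["noticiasdehoy", 2],
  ["noticiasya", 8],
]; // already in index order: hashtag ascending, then rank descending

// Sort matches the index order: stop as soon as `limit` matches are found.
function indexOrderQuery(prefix, limit) {
  const out = [];
  let scanned = 0;
  for (const [hashtag, rank] of entries) {
    if (!hashtag.startsWith(prefix)) continue; // outside the index bounds
    scanned++;
    out.push({ hashtag, rank });
    if (out.length === limit) break;
  }
  return { out, scanned };
}

// Sort by rank only: every match must be read, then sorted in memory.
function rankOrderQuery(prefix, limit) {
  const matches = entries
    .filter(([hashtag]) => hashtag.startsWith(prefix))
    .map(([hashtag, rank]) => ({ hashtag, rank }));
  const out = [...matches].sort((a, b) => b.rank - a.rank).slice(0, limit);
  return { out, scanned: matches.length };
}

const byIndex = indexOrderQuery("noticias", 2);
const byRank = rankOrderQuery("noticias", 2);
console.log(byIndex.scanned, byIndex.out.map((e) => e.rank)); // 2 [ 9, 4 ]
console.log(byRank.scanned, byRank.out.map((e) => e.rank));   // 5 [ 9, 8 ]
```

The index-ordered query touches only as many entries as the limit, but it returns the first matches by hashtag and then rank, not the highest-ranked matches across all hashtags sharing the prefix. Whether that is acceptable depends on what the result list is meant to show.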
