mongodb - MongoDB does not use the index for a simple query

Question

I'm seeing strange behavior with MongoDB distinct queries. I'm currently on version 2.6.10. OK, let's create a simple collection for testing and run explain.

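from pymongo import MongoClient
import random

client = MongoClient('127.0.0.1', 27017)
coll = client.DBTEST.random
coll.delete_many({})  # start from an empty collection

value = 0
BATCH_LEN = 16384
batch = []

for i in range(500000):
    batch.append({
        "product": "value_uniq_1",
        "number": value,
    })

    # With ~2% probability, switch to a new "number" value, so each
    # number appears in a run of consecutive documents.
    if random.randint(0, 100) <= 1:
        value = i

    # Flush a full batch to the server.
    if len(batch) >= BATCH_LEN:
        coll.insert_many(batch)
        batch = []

# Insert the leftover documents (insert_many rejects an empty list).
if batch:
    coll.insert_many(batch)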
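A quick way to sanity-check the stats further down without a server is to replay the generator's number sequence in memory. This is a sketch that assumes it mirrors the script's switching rule exactly; generate_numbers is a hypothetical helper. Because randint(0, 100) is inclusive, a new value starts with probability 2/101 (about 2%), so 500,000 documents should yield roughly 10,000 distinct numbers, in line with the n of 10005 reported once the number index exists.

```python
import random


def generate_numbers(n, seed=None):
    """Hypothetical helper: replay the insert script's "number" values in memory."""
    rng = random.Random(seed)
    value = 0
    numbers = []
    for i in range(n):
        numbers.append(value)
        # randint(0, 100) is inclusive, so this fires for 2 of 101 outcomes.
        if rng.randint(0, 100) <= 1:
            value = i
    return numbers


numbers = generate_numbers(500000, seed=42)
# Documents generated vs. distinct "number" values (roughly 2% of n).
print(len(numbers), len(set(numbers)))
```

Since value only ever jumps forward to the current loop index, the sequence is non-decreasing, so each number occupies one contiguous run of documents, as in the tables.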

This creates a collection of documents like these:

╔══════════════╦════════╗
║   product    ║ number ║
╠══════════════╬════════╣
║ value_uniq_1 ║ 1      ║
║ value_uniq_1 ║ 1      ║
║ value_uniq_1 ║ 1      ║
║ value_uniq_1 ║ 56     ║
║ value_uniq_1 ║ 56     ║
║ value_uniq_1 ║ 56     ║
║ ...          ║ ...    ║
║ value_uniq_1 ║ 150054 ║
║ value_uniq_1 ║ 150054 ║
║ value_uniq_1 ║ 150054 ║
╚══════════════╩════════╝

Right now there is only one unique product value, but in the near future (within a week) it will grow to nearly 30 different string values, like this:

╔══════════════╦════════╗
║   product    ║ number ║
╠══════════════╬════════╣
║ value_uniq_1 ║ 1      ║
║ value_uniq_1 ║ 1      ║
║ value_uniq_1 ║ 1      ║
║ value_uniq_1 ║ 56     ║
║ value_uniq_1 ║ 56     ║
║ value_uniq_1 ║ 56     ║
║ ...          ║ ...    ║
║ value_uniq_1 ║ 150054 ║
║ value_uniq_1 ║ 150054 ║
║ value_uniq_1 ║ 150054 ║
║ value_uniq_2 ║ 987    ║
║ value_uniq_2 ║ 987    ║
║ value_uniq_2 ║ 987    ║
╚══════════════╩════════╝

OK, that's my data structure. Now let's look at some MongoDB queries.

My main goal is to get all unique values of number for a given product.

I do it like this:

db.random.distinct("number", {product: "value_uniq_1"})
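Semantically, that call returns the set of number values among documents matching the filter. Without a usable index the server has to examine every document to find them, which is what the BasicCursor stats in this question reflect. Below is a plain-Python model of that collection scan, assuming a flat equality filter (it ignores array unwinding and other distinct subtleties); distinct_collscan is hypothetical and sorts its result only to make the output deterministic.

```python
def distinct_collscan(docs, key, query):
    """Toy model of distinct without a usable index (a BasicCursor).

    Every document is examined; the filter is a simple equality match.
    Returns (sorted distinct values, number of documents examined).
    """
    seen = set()
    examined = 0
    for doc in docs:
        examined += 1
        if all(doc.get(f) == v for f, v in query.items()) and key in doc:
            seen.add(doc[key])
    return sorted(seen), examined


docs = [
    {"product": "value_uniq_1", "number": 1},
    {"product": "value_uniq_1", "number": 1},
    {"product": "value_uniq_1", "number": 56},
    {"product": "value_uniq_2", "number": 987},
]
print(distinct_collscan(docs, "number", {"product": "value_uniq_1"}))
# ([1, 56], 4)
```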

That isn't very verbose for debugging, so from here on I'll use db.runCommand. But first, let's run distinct without a query and look at the stats section:

db.runCommand({distinct: 'random', key:'number'})

"stats" : {
    "n" : 500000,
    "nscanned" : 500000,
    "nscannedObjects" : 500000,
    "timems" : 479,
    "cursor" : "BasicCursor"
},

That's expected, since we haven't created an index yet. Let's add one on the number field:

db.random.createIndex({number: 1})

Re-run the previous query:

db.runCommand({distinct: 'random', key:'number'})

"stats" : {
    "n" : 10005,
    "nscanned" : 10005,
    "nscannedObjects" : 0,
    "timems" : 83,
    "cursor" : "DistinctCursor"
},

Great, it uses the index and everything works! 0 scanned objects!!!

OK, now let's add a query to distinct:
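The DistinctCursor numbers (nscanned equal to n, zero objects fetched) are what you would expect if the server reads one index key per distinct value and answers entirely from the index. Here is a toy model of that idea, assuming only that index keys are stored in sorted order; distinct_skip_scan is a hypothetical helper, not MongoDB's actual implementation.

```python
from bisect import bisect_right


def distinct_skip_scan(index_keys):
    """Toy distinct scan over a sorted single-field index.

    After reading one key, jump past all of its duplicates with a
    binary search instead of reading them one by one.
    Returns (distinct values, keys examined).
    """
    values = []
    examined = 0
    pos = 0
    while pos < len(index_keys):
        key = index_keys[pos]
        examined += 1
        values.append(key)
        # Skip every remaining copy of this key.
        pos = bisect_right(index_keys, key, pos)
    return values, examined


keys = sorted([1, 1, 1, 56, 56, 56, 150054, 150054, 150054])
print(distinct_skip_scan(keys))
# ([1, 56, 150054], 3)
```

No document is fetched in this model, which mirrors the nscannedObjects of 0 above.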

db.runCommand({distinct: 'random', key:'number', query: {product: "value_uniq_1"}})

"stats" : {
    "n" : 500000,
    "nscanned" : 500000,
    "nscannedObjects" : 500000,
    "timems" : 694,
    "cursor" : "BasicCursor"
},

This is not what we expected ("nscannedObjects" : 500000), but there is no index on product yet, so let's create one:

db.random.createIndex({product: 1, number: -1})

The direction makes no difference: every combination (product: 1, number: -1; product: -1, number: 1; product: 1, number: 1) gives the same behavior. I checked all of them.
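That result is consistent with how a compound index lays out keys: for an equality match on product, flipping the direction of number only reverses the order of the matching run of keys, not how many keys it holds, so a scan that reads every key in the run costs the same either way. A toy model assuming sorted tuple keys; build_index and equality_range are hypothetical helpers, and only the number direction is varied here.

```python
from bisect import bisect_left, bisect_right


def build_index(docs, number_direction):
    """Toy {product: 1, number: <dir>} index of (product, number) tuples.

    A descending number field is modelled by sorting on -number.
    """
    sign = 1 if number_direction == 1 else -1
    return sorted(docs, key=lambda d: (d[0], sign * d[1]))


def equality_range(index, product):
    """Entries with the given product form one contiguous run."""
    firsts = [p for p, _ in index]
    lo = bisect_left(firsts, product)
    hi = bisect_right(firsts, product)
    return index[lo:hi]


docs = [("value_uniq_1", 1), ("value_uniq_1", 56), ("value_uniq_2", 987),
        ("value_uniq_1", 1), ("value_uniq_1", 150054)]
for direction in (1, -1):
    run = equality_range(build_index(docs, direction), "value_uniq_1")
    print(direction, [n for _, n in run])
# 1 [1, 1, 56, 150054]
# -1 [150054, 56, 1, 1]
```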

db.runCommand({distinct: 'random', key:'number', query: {product: "value_uniq_1"}})

"stats" : {
    "n" : 500000,
    "nscanned" : 500000,
    "nscannedObjects" : 500000,
    "timems" : 968,
    "cursor" : "BtreeCursor product_1_number_-1"
},

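WTF is going on? Why does it scan the whole collection even though it uses the index? At the moment the whole collection contains only one product value, and I can't predict what the other products will be. Why is such a common distinct query so slow? 1 second is far too slow...

I don't want to use a separate collection for each product; that would be crazy and inefficient, because I need to run queries across all products. My real database contains more than 5 million numbers per product, and this query takes more than 3 seconds.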
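The BtreeCursor stats above show every key for product: "value_uniq_1" being read (nscanned 500000), even though the keys inside that run are already sorted by number, so in principle one key per distinct number would suffice, as with the DistinctCursor earlier. Below is a toy comparison of the two strategies over a model {product: 1, number: 1} index. The helpers are hypothetical and pure Python; this does not describe what any particular MongoDB version does internally.

```python
from bisect import bisect_left, bisect_right


def product_run(index, product):
    """Bounds of the contiguous run of keys with this product.

    index is a sorted list of (product, number) tuples.
    """
    firsts = [p for p, _ in index]
    return bisect_left(firsts, product), bisect_right(firsts, product)


def range_scan_distinct(index, product):
    """Read every key in the run (like the BtreeCursor stats)."""
    lo, hi = product_run(index, product)
    values = sorted({n for _, n in index[lo:hi]})
    return values, hi - lo


def skip_scan_distinct(index, product):
    """Read one key per distinct number, jumping over duplicates."""
    lo, hi = product_run(index, product)
    values, examined, pos = [], 0, lo
    while pos < hi:
        key = index[pos]
        examined += 1
        values.append(key[1])
        # Keys in the run are sorted by number, so duplicates are adjacent.
        pos = bisect_right(index, key, pos, hi)
    return values, examined


index = sorted(
    [("value_uniq_1", 1)] * 3
    + [("value_uniq_1", 56)] * 3
    + [("value_uniq_1", 150054)] * 3
    + [("value_uniq_2", 987)] * 3
)
print(range_scan_distinct(index, "value_uniq_1"))  # ([1, 56, 150054], 9)
print(skip_scan_distinct(index, "value_uniq_1"))   # ([1, 56, 150054], 3)
```

Both strategies only touch keys for the requested product, so with ~30 products the range scan's cost tracks that product's documents rather than the whole collection; what the skip scan avoids is walking every duplicate within the run.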

score 1 · Accepted Answer

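I'm using 3.0.2, and it does seem to use the index, but I still don't know why it scans all the records. I created the same collection in my MongoDB, along with the indexes. Querying the distinct values of the "number" field shows that it scans 20K records (the total number of records I inserted).

See this image, which shows the index scan in the plan summary: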

https://www.dropbox.com/s/dh3tglyg4lsaqmm/distinct_explain_plan.png?dl=0

> db.random.getIndexes()
[
    {
            "v" : 1,
            "key" : {
                    "_id" : 1
            },
            "name" : "_id_",
            "ns" : "test.random"
    },
    {
            "v" : 1,
            "key" : {
                    "product" : 1,
                    "number" : 1
            },
            "name" : "product_1_number_1",
            "ns" : "test.random"
    },
    {
            "v" : 1,
            "key" : {
                    "number" : 1
            },
            "name" : "number_1",
            "ns" : "test.random"
    }
]
