performance - How efficient is retrieving a subset of a document in MongoDB?

Question

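Suppose we have a collection photos in which each entry is a large document containing all the information about a photo, including view details and detailed upvotes/downvotes.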

{
_id:ObjectId('...'),
title:'...',
location:'...',
views:[
    {...},
    {...},
    ...
    ],
upvotes:[
    {...},
    {...},
    ...
    ],
downvotes:[
    {...},
    {...},
    ...
    ],
}

Which query will run faster and more efficiently (memory, CPU usage):

db.photos.find().limit(100)

or

db.photos.find({}, {views:0,upvotes:0,downvotes:0}).limit(100)

?

score 5 · Accepted Answer

There are actually two sides to this story: the application and the server.

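In the application, the second query will be faster. The application does not have to deserialize the full BSON document (CPU intensive) and then hold a hash of data it does not need (memory intensive).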
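To get a rough feel for how much less the application has to deserialize, here is a minimal sketch in plain JavaScript. It needs no MongoDB server. The `makePhoto` helper, the field values, and the 1000 entries per array are all made up for illustration, and JSON byte length is only a stand-in for BSON size, so the real numbers will differ.

```javascript
// Hypothetical photo document; every value here is invented for illustration.
function makePhoto(entriesPerArray) {
  const entry = (i) => ({ user: "user" + i, at: 1355997376000 + i });
  const list = () => Array.from({ length: entriesPerArray }, (_, i) => entry(i));
  return {
    _id: "50d2e4c08861fdb7e1c601ea",
    title: "Sunset",
    location: "Lisbon",
    views: list(),
    upvotes: list(),
    downvotes: list(),
  };
}

const full = makePhoto(1000);

// What the second query would hand to the application:
// the same document minus the three large arrays.
const { views, upvotes, downvotes, ...projected } = full;

// JSON byte length stands in for BSON size; treat it as an approximation.
const fullBytes = Buffer.byteLength(JSON.stringify(full));
const projectedBytes = Buffer.byteLength(JSON.stringify(projected));

console.log({ fullBytes, projectedBytes });
```

The gap grows with the size of the arrays, which is exactly the data the driver would otherwise have to decode and keep in memory.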

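On the server, MongoDB can fit more documents into each batch it sends over the wire, allowing more iterations per cursor before you have to issue a getMore operation, which improves performance on that front. On top of that, you are of course sending less data over the wire. The getMore operation itself is resource intensive in both memory and CPU, so that is a saving too.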
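The getMore effect can be sketched with some back-of-the-envelope arithmetic. The function below is hypothetical, and `maxBatchBytes` is a parameter of the sketch rather than a real MongoDB setting; the 1 MiB cap and the document sizes are illustrative assumptions only.

```javascript
// Estimate how many batches (the first reply plus each getMore) a cursor
// needs when every batch is capped at maxBatchBytes.
// maxBatchBytes is a parameter of this sketch, not a MongoDB option.
function estimateBatches(docCount, bytesPerDoc, maxBatchBytes) {
  // At least one document always goes into a batch.
  const docsPerBatch = Math.max(1, Math.floor(maxBatchBytes / bytesPerDoc));
  return Math.ceil(docCount / docsPerBatch);
}

const cap = 1024 * 1024; // illustrative 1 MiB cap per batch

// 100 full documents of roughly 200 KiB each.
const fullBatches = estimateBatches(100, 200 * 1024, cap);
// 100 projected documents of roughly 512 bytes each.
const projectedBatches = estimateBatches(100, 512, cap);

console.log({ fullBatches, projectedBatches });
```

Under these assumed sizes the full documents need 20 batches while the projected ones fit in a single batch, so 19 round trips disappear.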

As for the work inside the server itself, projection has a small cost, but it is less than the cost of sending everything back.

Edit

As others have said, MongoDB applies the projection to the result set after it has been read, so the working set is the same for both queries.
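As a mental model only, projection can be pictured as a function that runs over a document the server has already loaded. The `project` function below is a hypothetical sketch, not MongoDB's implementation, and it only handles flat field names (no dotted paths, no operators such as $slice).

```javascript
// Apply a top-level projection to a document that is already in memory.
// Sketch only: flat field names, no dotted paths, no projection operators.
function project(doc, spec) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const keys = Object.keys(spec).filter((k) => k !== "_id");
  // {a: 1} style is an inclusion projection, {a: 0} style an exclusion one.
  const inclusion = keys.some((k) => Boolean(spec[k]));
  const out = {};
  for (const [field, value] of Object.entries(doc)) {
    if (field === "_id") {
      // _id is returned unless it is explicitly excluded.
      if (spec._id !== 0 && spec._id !== false) out._id = value;
      continue;
    }
    const listed = has(spec, field);
    const keep = inclusion ? listed && Boolean(spec[field]) : !listed;
    if (keep) out[field] = value;
  }
  return out;
}

const photo = { _id: 1, title: "t", location: "l", views: [1, 2], upvotes: [3], downvotes: [] };
const trimmed = project(photo, { views: 0, upvotes: 0, downvotes: 0 });
console.log(trimmed);
```

Note that `project` needs the whole `photo` object as input before it can drop anything, which is why the working set does not shrink.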

Edit

Here is the result of using projection on indexed fields:

> db.g.insert({a:1,b:1,c:1,d:1})
> db.g.ensureIndex({ a:1,b:1,c:1 })
> db.g.find({}, {a:0,b:0,c:0}).explain()
{
        "cursor" : "BasicCursor",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 3,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {

        }
}
> db.g.find({}, {a:1,b:1,c:1}).explain()
{
        "cursor" : "BasicCursor",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 3,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {

        }
}

And here is the result without projection:

> db.g.find({}).explain()
{
        "cursor" : "BasicCursor",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 3,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {

        }
}

As you can see, the millis spent on the documents is the same in both cases: 0. So explain is not a good way to measure this.

Another edit

Excluding _id does not make it use a covered index either:

> db.g.find({}, {a:1,b:1,c:1,_id:0}).explain()
{
        "cursor" : "BasicCursor",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 3,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {

        }
}

Yet another edit

And with 300K rows:

> db.g.find({}, {a:1,b:1,c:1}).explain()
{
        "cursor" : "BasicCursor",
        "nscanned" : 300003,
        "nscannedObjects" : 300003,
        "n" : 300003,
        "millis" : 95,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {

        }
}

> db.g.find({}).explain()
{
        "cursor" : "BasicCursor",
        "nscanned" : 300003,
        "nscannedObjects" : 300003,
        "n" : 300003,
        "millis" : 85,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {

        }
}

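So projection over a huge result set does cost more, but bear in mind this is a projection over 300K rows... I mean WTF? Who, in their right mind, would do that? So that part of the argument does not really hold. Either way, the difference is about 10 ms on my hardware (95 ms versus 85 ms), roughly a tenth of the query time, so projection is not your problem here.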

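I should also note that the --cpu flag will not give you what you are after: for starters it reports time spent in the write lock, and secondly you are doing a read.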

score 3 · Accepted Answer

You can check this yourself. Just append explain() to the end of the query.

For example:

db.photos.find().limit(100).explain()


{
  "cursor" : "<Cursor Type and Index>",
  "isMultiKey" : <boolean>,
  "n" : <num>,
  "nscannedObjects" : <num>,
  "nscanned" : <num>,
  "nscannedObjectsAllPlans" : <num>,
  "nscannedAllPlans" : <num>,
  "scanAndOrder" : <boolean>,
  "indexOnly" : <boolean>,
  "nYields" : <num>,
  "nChunkSkips" : <num>,
  "millis" : <num>,
  "indexBounds" : { <index bounds> },
  "allPlans" : [
                 { "cursor" : "<Cursor Type and Index>",
                   "n" : <num>,
                   "nscannedObjects" : <num>,
                   "nscanned" : <num>,
                   "indexBounds" : { <index bounds> }
                 },
                  ...
               ],
  "oldPlan" : {
                "cursor" : "<Cursor Type and Index>",
                "indexBounds" : { <index bounds> }
              },
  "server" : "<host:port>"
}

The millis field is the one you want.

If you want to see CPU usage, just add the --cpu option to your mongod startup script.

--cpu
Forces mongod to report the percentage of CPU time in write lock. mongod generates output every four seconds. MongoDB writes this data to standard output or the logfile if using the logpath option.

http://docs.mongodb.org/manual/reference/explain/

You can use hint() to tell mongo which index to use for the projection, like this:

We have a simple collection:

> db.performance.findOne()
{
        "_id" : ObjectId("50d2e4c08861fdb7e1c601ea"),
        "a" : 1,
        "b" : 1,
        "c" : 1,
        "d" : 1
}

which contains 23 documents:

> db.performance.count()
23

Now we can create a compound index:

> db.performance.ensureIndex({'c':1, 'd':1})

and give mongo a hint to use that index for the projection.

> db.performance.find({'a':1}, {'c':1, 'd':1}).hint({'c':1, 'd':1}).explain()
{
        "cursor" : "BtreeCursor c_1_d_1",
        "isMultiKey" : false,
        "n" : 1,
        "nscannedObjects" : 23,
        "nscanned" : 23,
        "nscannedObjectsAllPlans" : 23,
        "nscannedAllPlans" : 23,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "c" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ],
                "d" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        },
        "server" : ""
}
>
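Note that "indexOnly" is false and "nscannedObjects" is 23 in the output above, so the hinted query still reads the documents. A query can only be answered from the index alone when every filtered and every returned field is in that index. The hypothetical helper below checks those necessary (not sufficient) conditions in plain JavaScript; it ignores multikey indexes, dotted paths, and whether the planner would actually pick the index.

```javascript
// List the reasons a query cannot be covered by an index.
// Sketch only: these are necessary conditions, not a full planner.
function coverageProblems(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  const problems = [];

  // Every field in the filter has to be part of the index.
  for (const field of Object.keys(filter)) {
    if (!indexed.has(field)) problems.push("filter field not in index: " + field);
  }

  // Every returned field has to be part of the index as well.
  const projected = Object.keys(projection).filter(
    (k) => k !== "_id" && Boolean(projection[k])
  );
  if (projected.length === 0) problems.push("projection must list the fields to return");
  for (const field of projected) {
    if (!indexed.has(field)) problems.push("projected field not in index: " + field);
  }

  // _id comes back by default, so it must be excluded or indexed.
  if (!indexed.has("_id") && projection._id !== 0 && projection._id !== false) {
    problems.push("_id is returned by default and is not in the index");
  }
  return problems;
}

const index = { c: 1, d: 1 };
// The query from the transcript above: filters on a, returns _id.
const hinted = coverageProblems(index, { a: 1 }, { c: 1, d: 1 });
// A variant that satisfies the conditions checked here.
const candidate = coverageProblems(index, { c: 1 }, { c: 1, d: 1, _id: 0 });
console.log(hinted, candidate);
```

For the hinted query the helper reports two problems: the filter uses a, which is not in the index, and _id is returned by default. The variant that filters on c and excludes _id reports none.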
