mongodb - Building an index on a counter field

Question

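For a field used as a counter, i.e. one whose value changes over time and which is used to return entities in order (the filtered entities are sorted on this field), should we build an index on that field?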

score 4 · Accepted Answer

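It's not entirely clear, but I think the question is whether the cost of indexing a frequently updated field outweighs the benefit of fast querying and sorting on it. You also imply that your query filters on a different field and then sorts on this one. Feel free to elaborate on your exact use case.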

I think what you want is something like this (the counts are stored as numbers rather than strings, so they sort numerically and can be incremented with $inc):

db.test.insertOne({filter: "stuff", count: 1});
db.test.insertOne({filter: "stuff", count: 3});
db.test.insertOne({filter: "stuff", count: 2});
db.test.insertOne({filter: "notstuff", count: 2});
db.test.insertOne({filter: "notstuff", count: 2});

Then an index like this:

db.test.createIndex({filter: 1, count: 1});

Then a query like this:

db.test.find({filter:"stuff"}).sort({count:1});
{ "_id" : ObjectId("4f24353eef88b8b53a20fdf5"), "filter" : "stuff", "count" : 1 }
{ "_id" : ObjectId("4f24353eef88b8b53a20fdf7"), "filter" : "stuff", "count" : 2 }
{ "_id" : ObjectId("4f24353eef88b8b53a20fdf6"), "filter" : "stuff", "count" : 3 }
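Whether a counter is stored as a number or a string matters for this sort. Strings compare character by character, so "10" orders before "2", and MongoDB's default (no collation) string comparison is also binary, so a sort on a string-typed count would show the same effect; `$inc` also refuses to operate on non-numeric values. A minimal plain-JavaScript illustration (no MongoDB involved) of the two orderings:

```javascript
// Plain JavaScript, no MongoDB: how counter values order when stored
// as strings versus as numbers.
const asStrings = ["1", "3", "2", "10"];
const asNumbers = [1, 3, 2, 10];

// The default sort compares strings code unit by code unit,
// so "10" lands before "2".
const sortedStrings = [...asStrings].sort();

// A numeric comparator gives the order a counter actually needs.
const sortedNumbers = [...asNumbers].sort((a, b) => a - b);

console.log(sortedStrings); // [ '1', '10', '2', '3' ]
console.log(sortedNumbers); // [ 1, 2, 3, 10 ]
```

With single-digit counts like the example above the two orders happen to agree, which is why the problem tends to surface only once a counter passes 9.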

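This uses the B-tree index (explain output from the older MongoDB release used for this answer; current versions report an IXSCAN stage and totalKeysExamined instead of BtreeCursor and nscanned):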

db.test.find({filter:"stuff"}).sort({count:1}).explain();
{
"cursor" : "BtreeCursor filter_1_count_1",
"nscanned" : 3,
"nscannedObjects" : 3,
...
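Why does the compound index remove the need for a separate sort? Keys in a {filter: 1, count: 1} index are ordered by filter first and count second, so all entries for one filter value sit next to each other, already in count order. The sketch below is a hypothetical in-memory model (a sorted array standing in for the B-tree; `buildIndex` and `findSortedByCount` are illustrative names, not MongoDB APIs). It also shows why a limited query only needs to visit as many keys as it returns, which matches the nscanned: 100 lines for limit(100) in the benchmark output further down.

```javascript
// Hypothetical model of a compound index {filter: 1, count: 1}:
// each entry holds the indexed key (filter, count) and the document id.

// Compare keys field by field, in the order the index was declared.
function compareKeys(a, b) {
  if (a.filter !== b.filter) return a.filter < b.filter ? -1 : 1;
  return a.count - b.count;
}

// "Build" the index: keys kept in sorted order, like a B-tree's leaves.
function buildIndex(docs) {
  return docs
    .map((d) => ({ filter: d.filter, count: d.count, id: d._id }))
    .sort(compareKeys);
}

// Binary search for the first entry whose filter is >= f.
function lowerBound(index, f) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].filter < f) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// find({filter: f}).sort({count: 1}).limit(n): walk forward from the first
// match while the filter still equals f. Entries already come out in count
// order, so no in-memory sort step is needed, and the walk stops at `limit`.
function findSortedByCount(index, f, limit = Infinity) {
  const ids = [];
  for (
    let i = lowerBound(index, f);
    i < index.length && index[i].filter === f && ids.length < limit;
    i++
  ) {
    ids.push(index[i].id);
  }
  // Only matching keys are visited, one per returned id.
  return { ids, keysVisited: ids.length };
}

const docs = [
  { _id: 1, filter: "stuff", count: 1 },
  { _id: 2, filter: "stuff", count: 3 },
  { _id: 3, filter: "stuff", count: 2 },
  { _id: 4, filter: "notstuff", count: 2 },
  { _id: 5, filter: "notstuff", count: 2 },
];

const index = buildIndex(docs);
console.log(JSON.stringify(findSortedByCount(index, "stuff")));
// {"ids":[1,3,2],"keysVisited":3}
```

A real B-tree also has to move an entry whenever its count changes, which is the extra update cost the benchmark measures.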

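Now, this may really depend on how many results you need to return. If there are only a few, you can probably sort on the field without an index, which keeps updates faster. I think I'll actually run some tests, because I'm curious. I'll update shortly.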

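Update: I wrote this benchmark to show the difference between sorting with and without an index on the count field, and between updating the count field when it is indexed and when it is not. The full code is here: https://gist.github.com/1696041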

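It inserts 700K and then 7M documents (to get some variety), split across 7 "filter" values. It then picks a random document and increments its count, 1M times. With 7M documents, each filter value matches 1M documents, which is too many to sort without a limit, so the only way to show how that case behaves is to set a limit.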
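In the output below, the unindexed limit(100) query still scans every match (nscanned: 100000) yet takes about half as long as the unlimited one (614 ms vs 1235 ms). A plausible reason, offered here as an assumption rather than a description of MongoDB's internals, is that a sort with a limit only has to keep the best N candidates while scanning instead of ordering every match. A minimal sketch of such a bounded top-N selection (`topByCountDesc` is a hypothetical name):

```javascript
// Hypothetical bounded "top N by count, descending" selection: every
// document is examined once, but at most `limit` candidates are kept.
function topByCountDesc(docs, limit) {
  if (limit <= 0) return [];
  const best = []; // candidates, highest count first
  for (const doc of docs) {
    // Buffer full and this doc cannot beat the weakest candidate: skip it.
    if (best.length === limit && doc.count <= best[best.length - 1].count) continue;
    // Walk left to find the slot that keeps `best` in descending order.
    let i = best.length;
    while (i > 0 && best[i - 1].count < doc.count) i--;
    best.splice(i, 0, doc);
    // Drop the weakest candidate if the buffer now holds one too many.
    if (best.length > limit) best.pop();
  }
  return best;
}

const counts = [5, 1, 9, 3, 9, 7];
const top3 = topByCountDesc(counts.map((count, id) => ({ id, count })), 3);
console.log(top3.map((d) => d.count)); // [ 9, 9, 7 ]
```

The scan itself is unchanged (every match is still read), which fits the identical nscanned values; only the sorting work shrinks.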

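The results are as expected. Updating the count field takes longer when it is indexed: almost twice as long in this test (about 0.10 ms vs 0.20 ms per update with 700K documents, and 0.13 ms vs 0.28 ms with 7M), though twice as long is still very fast. Queries, however, are much faster: with 700K documents the unlimited sort went from 1235 ms to 139 ms, and the limit(100) query went from 614 ms (6396 ms with 7M documents) to 0 ms. You have to decide which matters more to you.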

Here is the output (run on my MacBook Pro with an SSD):

> bench();
benchmarking with index on {filter,data}, 700K docs  
initialInsert of 700000 done in: 58304ms, 0.08329142857142857ms per insert
updateCounts 1000000 times done in: 103915ms, 0.103915ms per update
explain find({filter:"abcd"}).sort({count:-1}): 
   cursor: BtreeCursor filter_1_data_1
   nscanned: 100000
   scanAndOrder: true
   millis: 1235
explain find({filter:"abcd"}).limit(100).sort({count:-1}): 
   cursor: BtreeCursor filter_1_data_1
   nscanned: 100000
   scanAndOrder: true
   millis: 614
benchmarking with index on {filter,data} and {filter, count}, 700k docs
initialInsert of 700000 done in: 72108ms, 0.10301142857142857ms per insert
updateCounts 1000000 times done in: 202778ms, 0.202778ms per update
explain find({filter:"abcd"}).sort({count:-1}): 
   cursor: BtreeCursor filter_1_count_-1
   nscanned: 100000
   scanAndOrder: undefined
   millis: 139
explain find({filter:"abcd"}).limit(100).sort({count:-1}): 
   cursor: BtreeCursor filter_1_count_-1
   nscanned: 100
   scanAndOrder: undefined
   millis: 0
benchmarking with index on {filter,data}, 7M docs
initialInsert of 7000000 done in: 616701ms, 0.08810014285714286ms per insert
updateCounts 1000000 times done in: 134655ms, 0.134655ms per update
explain find({filter:"abcd"}).sort({count:-1}): 
***too big to sort without limit!***
explain find({filter:"abcd"}).limit(100).sort({count:-1}): 
   cursor: BtreeCursor filter_1_data_1
   nscanned: 1000000
   scanAndOrder: true
   millis: 6396
benchmarking with index on {filter,data} and {filter, count}, 7M docs
initialInsert of 7000000 done in: 891556ms, 0.12736514285714284ms per insert
updateCounts 1000000 times done in: 280885ms, 0.280885ms per update
explain find({filter:"abcd"}).sort({count:-1}): 
   cursor: BtreeCursor filter_1_count_-1
   nscanned: 1000000
   scanAndOrder: undefined
   millis: 1337
explain find({filter:"abcd"}).limit(100).sort({count:-1}): 
   cursor: BtreeCursor filter_1_count_-1
   nscanned: 100
   scanAndOrder: undefined
   millis: 0
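The "almost twice as long" and "much faster" claims can be checked directly against the totals above. A small arithmetic sketch in plain JavaScript; the numbers are copied from the output, nothing is re-measured, and the object layout is just an illustrative choice:

```javascript
// Update totals copied from the benchmark output above, in milliseconds.
// Each run performed 1,000,000 count increments.
const updates = {
  "700K docs": { withoutCountIndex: 103915, withCountIndex: 202778 },
  "7M docs": { withoutCountIndex: 134655, withCountIndex: 280885 },
};

// How many times slower updates became once {filter, count} was indexed.
function slowdown({ withoutCountIndex, withCountIndex }) {
  return withCountIndex / withoutCountIndex;
}

for (const [label, run] of Object.entries(updates)) {
  console.log(`${label}: updates ${slowdown(run).toFixed(2)}x slower with the count index`);
}
// 700K docs: updates 1.95x slower with the count index
// 7M docs: updates 2.09x slower with the count index

// Unlimited sort at 700K docs: 1235 ms without the count index, 139 ms with it.
const sortSpeedup = 1235 / 139;
console.log(`700K docs: unlimited sort ${sortSpeedup.toFixed(1)}x faster with the count index`);
// 700K docs: unlimited sort 8.9x faster with the count index
```

The limited queries with the count index are reported as 0 ms, so no meaningful ratio can be computed for them.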

score 0 · Accepted Answer

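Strange question. Indexes exist for efficient querying: if you query on a field, you probably want an index on it. explain() shows you the execution plan. The MongoDB documentation covers this in depth... so why ask such a basic question?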
