mongodb - MongoDB index optimization when using text search in the aggregation framework

Question

We are building a simplified version of a search engine on top of MongoDB.

Sample data set

{ "_id" : 1, "dept" : "tech", "updDate":  ISODate("2014-08-27T09:45:35Z"), "description" : "lime green computer" }
{ "_id" : 2, "dept" : "tech", "updDate":  ISODate("2014-07-27T09:45:35Z"), "description" : "wireless red mouse" }
{ "_id" : 3, "dept" : "kitchen", "updDate":  ISODate("2014-04-27T09:45:35Z"), "description" : "green placemat" }
{ "_id" : 4, "dept" : "kitchen", "updDate":  ISODate("2014-05-27T09:45:35Z"), "description" : "red peeler" }
{ "_id" : 5, "dept" : "food", "updDate":  ISODate("2014-04-27T09:45:35Z"), "description" : "green apple" }
{ "_id" : 6, "dept" : "food", "updDate":  ISODate("2014-01-27T09:45:35Z"), "description" : "red potato" }
{ "_id" : 7, "dept" : "food", "updDate":  ISODate("2014-08-28T09:45:35Z"), "description" : "lime green computer" }
{ "_id" : 8, "dept" : "food", "updDate":  ISODate("2014-08-27T09:45:35Z"), "description" : "lime green computer" }
{ "_id" : 9, "dept" : "food", "updDate":  ISODate("2014-08-27T09:45:35Z"), "description" : "lime green computer" }

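We want to avoid "offset-limit" pagination. Instead we use the "seek method": we modify the query's "where/match" clause so that the database can use an index to jump to the next page instead of walking the collection to reach the desired results. For more information on the seek method, I strongly recommend reading http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way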
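To make the difference concrete, here is a minimal in-memory JavaScript sketch, not MongoDB code. The helpers `offsetPage`, `seekPage` and `afterById` are hypothetical names introduced for illustration. Offset pagination slices by position; seek pagination remembers the last row returned and keeps only rows that sort strictly after it, which is what lets a database use an index to start at the right place.

```javascript
// Offset pagination: equivalent to skip(offset).limit(limit).
// A database still has to walk past the first `offset` rows.
function offsetPage(sortedRows, offset, limit) {
  return sortedRows.slice(offset, offset + limit);
}

// Seek (keyset) pagination: keep only rows that sort strictly after the
// last row of the previous page. `comesAfter` must encode the same order
// the rows are sorted by. This in-memory filter scans every row; with a
// suitable index a database can instead start right after `lastRow`.
function seekPage(sortedRows, lastRow, limit, comesAfter) {
  const remaining =
    lastRow === null
      ? sortedRows
      : sortedRows.filter((row) => comesAfter(row, lastRow));
  return remaining.slice(0, limit);
}

// Rows already sorted by _id descending, so "after" means a smaller _id.
const rows = [9, 7, 5, 3, 1].map((id) => ({ _id: id }));
const afterById = (row, last) => row._id < last._id;

const page1 = seekPage(rows, null, 2, afterById);
const page2 = seekPage(rows, page1[page1.length - 1], 2, afterById);

console.log(page2.map((r) => r._id)); // [ 5, 3 ]
console.log(offsetPage(rows, 2, 2).map((r) => r._id)); // [ 5, 3 ]
```

Both approaches return the same page here; the seek variant only needs the last row of the previous page, not a running offset.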

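Search engines usually sort results by score and then by update date in descending order. To do this, we use the text search feature in the aggregation pipeline, as shown below.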
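The sort order used throughout the question, `{ score: -1, updDate: -1, _id: -1 }`, can be written as a plain JavaScript comparator. This is a sketch for illustration only: `compareResults` is a hypothetical name, and the `score` values are copied from the outputs shown in the question rather than computed by a text index.

```javascript
// Comparator for { score: -1, updDate: -1, _id: -1 }.
// Returns a negative number if `a` should come before `b`.
function compareResults(a, b) {
  if (a.score !== b.score) return b.score - a.score; // higher score first
  const timeA = a.updDate.getTime();
  const timeB = b.updDate.getTime();
  if (timeA !== timeB) return timeB - timeA; // newer update first
  return b._id - a._id; // larger _id first breaks the remaining ties
}

// Documents matching "green" in dept food/kitchen, with textScore values
// taken from the question's outputs.
const matches = [
  { _id: 3, updDate: new Date("2014-04-27T09:45:35Z"), score: 0.75 },
  { _id: 5, updDate: new Date("2014-04-27T09:45:35Z"), score: 0.75 },
  { _id: 7, updDate: new Date("2014-08-28T09:45:35Z"), score: 0.6666666666666666 },
  { _id: 8, updDate: new Date("2014-08-27T09:45:35Z"), score: 0.6666666666666666 },
  { _id: 9, updDate: new Date("2014-08-27T09:45:35Z"), score: 0.6666666666666666 },
];

const ordered = [...matches].sort(compareResults);
console.log(ordered.map((d) => d._id)); // [ 5, 3, 7, 9, 8 ]
```

The `_id` tie-breaker matters: without it, documents 8 and 9 (same score, same `updDate`) would have no defined order, and a seek condition could skip or repeat one of them.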

db.inventory.createIndex({description:"text", dept: -1, updDate: -1, _id:-1})

First page

db.inventory.aggregate([
  { $match: { dept: { $in: ["food", "kitchen"] }, "$text": { "$language": "en", "$search": "green" } } },
  { $project: { score: { $meta: "textScore" }, description: 1, updDate: 1, _id: 1 } },
  { $sort: { "score": -1, "updDate": -1, _id: -1 } },
  { $limit: 2 }
])


{ "_id" : 5, "updDate" : ISODate("2014-04-27T09:45:35Z"), "description" : "green apple", "score" : 0.75 }
{ "_id" : 3, "updDate" : ISODate("2014-04-27T09:45:35Z"), "description" : "green placemat", "score" : 0.75 }

Second page

db.inventory.aggregate([
  { $match: { dept: { $in: ["food", "kitchen"] }, "$text": { "$language": "en", "$search": "green" } } },
  { $project: { score: { $meta: "textScore" }, description: 1, updDate: 1, _id: 1 } },
  { $sort: { "score": -1, "updDate": -1, _id: -1 } },
  { "$match": { "$or": [
      { "score": { "$lt": 0.75 } },
      { "$and": [
          { "score": { "$eq": 0.75 } },
          { "$or": [
              { "updDate": { "$lt": ISODate("2014-04-27T09:45:35Z") } },
              { "$and": [
                  { "updDate": { "$eq": ISODate("2014-04-27T09:45:35Z") } },
                  { "_id": { "$lt": 3 } }
              ] }
          ] }
      ] }
  ] } },
  { $limit: 2 }
])

{ "_id" : 7, "updDate" : ISODate("2014-08-28T09:45:35Z"), "description" : "lime green computer", "score" : 0.6666666666666666 }
{ "_id" : 9, "updDate" : ISODate("2014-08-27T09:45:35Z"), "description" : "lime green computer", "score" : 0.6666666666666666 }

Last page

db.inventory.aggregate([
  { $match: { dept: { $in: ["food", "kitchen"] }, "$text": { "$language": "en", "$search": "green" } } },
  { $project: { score: { $meta: "textScore" }, description: 1, updDate: 1, _id: 1 } },
  { $sort: { "score": -1, "updDate": -1, _id: -1 } },
  { "$match": { "$or": [
      { "score": { "$lt": 0.6666666666666666 } },
      { "$and": [
          { "score": { "$eq": 0.6666666666666666 } },
          { "$or": [
              { "updDate": { "$lt": ISODate("2014-08-27T09:45:35Z") } },
              { "$and": [
                  { "updDate": { "$eq": ISODate("2014-08-27T09:45:35Z") } },
                  { "_id": { "$lt": 9 } }
              ] }
          ] }
      ] }
  ] } },
  { $limit: 2 }
])


{ "_id" : 8, "updDate" : ISODate("2014-08-27T09:45:35Z"), "description" : "lime green computer", "score" : 0.6666666666666666 }
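As a consistency check on the three pages above, the following in-memory JavaScript sketch (not MongoDB code) pages through the five matching documents using the same seek logic. `isAfter` and `paginate` are hypothetical helpers, and the scores are copied from the question's outputs. The same predicate is used both to sort and to seek, which is the property the seek method relies on.

```javascript
// True if `doc` sorts strictly after `last` under
// { score: -1, updDate: -1, _id: -1 }. Mirrors the nested $or/$and stage:
// lower score, or equal score and older updDate, or equal score and
// updDate and a smaller _id.
function isAfter(doc, last) {
  if (doc.score !== last.score) return doc.score < last.score;
  const t = doc.updDate.getTime();
  const lastT = last.updDate.getTime();
  if (t !== lastT) return t < lastT;
  return doc._id < last._id;
}

// Sorts with the same predicate, then builds each page by seeking past
// the last document of the previous page instead of skipping rows.
function paginate(docs, pageSize) {
  const sorted = [...docs].sort((a, b) =>
    isAfter(a, b) ? 1 : isAfter(b, a) ? -1 : 0
  );
  const pages = [];
  let last = null;
  for (;;) {
    const page = sorted
      .filter((d) => last === null || isAfter(d, last))
      .slice(0, pageSize);
    if (page.length === 0) break;
    pages.push(page.map((d) => d._id));
    last = page[page.length - 1];
  }
  return pages;
}

const matches = [
  { _id: 3, updDate: new Date("2014-04-27T09:45:35Z"), score: 0.75 },
  { _id: 5, updDate: new Date("2014-04-27T09:45:35Z"), score: 0.75 },
  { _id: 7, updDate: new Date("2014-08-28T09:45:35Z"), score: 0.6666666666666666 },
  { _id: 8, updDate: new Date("2014-08-27T09:45:35Z"), score: 0.6666666666666666 },
  { _id: 9, updDate: new Date("2014-08-27T09:45:35Z"), score: 0.6666666666666666 },
];

// expected: [[5, 3], [7, 9], [8]], matching the three outputs above
console.log(paginate(matches, 2));
```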

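Note how we sort the results by score, updDate and _id, and how, in the second $match stage, we paginate them using the score, update date and _id of the last document of the previous page.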
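Because the second $match stage is always derived from the last document of the previous page, it can be generated rather than written by hand. The sketch below assembles the pipeline from the question for any page; `buildSearchPipeline` and its parameters are hypothetical names, and the stages themselves are copied from the queries above. It produces plain objects and does not say anything about which index MongoDB will choose.

```javascript
// Hypothetical helper: builds the question's pipeline for one page.
// `last` is the final document of the previous page (needs score,
// updDate and _id), or null for the first page.
function buildSearchPipeline(term, depts, last, pageSize) {
  const pipeline = [
    { $match: { dept: { $in: depts }, $text: { $language: "en", $search: term } } },
    { $project: { score: { $meta: "textScore" }, description: 1, updDate: 1, _id: 1 } },
    { $sort: { score: -1, updDate: -1, _id: -1 } },
  ];
  if (last !== null) {
    // Seek condition: strictly after `last` in (score, updDate, _id) order.
    pipeline.push({
      $match: {
        $or: [
          { score: { $lt: last.score } },
          {
            $and: [
              { score: { $eq: last.score } },
              {
                $or: [
                  { updDate: { $lt: last.updDate } },
                  {
                    $and: [
                      { updDate: { $eq: last.updDate } },
                      { _id: { $lt: last._id } },
                    ],
                  },
                ],
              },
            ],
          },
        ],
      },
    });
  }
  pipeline.push({ $limit: pageSize });
  return pipeline;
}
```

In the mongo shell, the second page above would correspond to `db.inventory.aggregate(buildSearchPipeline("green", ["food", "kitchen"], lastDocOfPage1, 2))`, where `lastDocOfPage1` is the `_id: 3` result from the first page.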

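The index above was created bearing in mind that text index prefix fields cannot cover a text query (see https://jira.mongodb.org/browse/SERVER-13018), although I am not sure whether this applies to our case.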

Since the "executionStats" and "allPlansExecution" explain modes don't work in the aggregation framework (see https://jira.mongodb.org/browse/SERVER-19758), I don't know how MongoDB is resolving the query.

Index intersection doesn't work with text search either: see https://jira.mongodb.org/browse/SERVER-3071 (resolved in 2.5.5) and http://blog.mongodb.org/post/87790974798/efficient-indexing-in-mongodb-26, where the author says:

As of version 2.6.0, you cannot intersect with geo or text indices and you can intersect at most 2 separate indices with each other. These limitations are likely to change in a future release.

After reading sections 3.4 (Text Search Tutorials) and 3.5 (Indexing Strategies) of https://docs.mongodb.org/manual/MongoDB-indexes-guide-master.pdf, I could not reach any clear conclusion.

So, from a text search perspective, what is the best indexing strategy for this collection?

One index for the first $match stage and another for the second (pagination) $match stage?

db.inventory.createIndex({description:"text", dept: -1})
db.inventory.createIndex({updDate: -1, _id: -1})

A compound index covering the fields of both $match stages?

db.inventory.createIndex({description:"text", dept: -1, updDate: -1, _id:-1})

None of the above?

Thanks
