mongodb - Do multiple $project stages in a MongoDB aggregation affect performance?

Question

TL;DR

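We add $project stages between $match and $lookup stages to filter out unnecessary data or to alias fields. These stages make the query easier to read while debugging, but will they affect performance in any way when every collection involved in the query holds a huge number of documents?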

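Question in detail

For example, I have two collections, schools and students, as shown below:

Yes, I know the schema design is bad! MongoDB says to put everything in the same collection to avoid relations, but let's continue with this approach for now.

schools collection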

{
    "_id": ObjectId("5c04dca4289c601a393d9db8"),
    "name": "First School Name",
    "address": "1 xyz",
    "status": 1,
    // Many more fields
},
{
    "_id": ObjectId("5c04dca4289c601a393d9db9"),
    "name": "Second School Name",
    "address": "2 xyz",
    "status": 1,
    // Many more fields
},
// Many more Schools

students collection

{
    "_id": ObjectId("5c04dcd5289c601a393d9dbb"),
    "name": "One Student Name",
    "school_id": ObjectId("5c04dca4289c601a393d9db8"),
    "address": "1 abc",
    "Gender": "Male",
    // Many more fields
},
{
    "_id": ObjectId("5c04dcd5289c601a393d9dbc"),
    "name": "Second Student Name",
    "school_id": ObjectId("5c04dca4289c601a393d9db9"),
    "address": "1 abc",
    "Gender": "Male",
    // Many more fields
},
// Many more students

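Now, in my query shown below, I have a $project stage after $match and before $lookup. Is this stage necessary? Will it affect performance when all the collections involved in the query hold a huge number of documents?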

db.students.aggregate([
    {
        $match: {
            "Gender": "Male"
        }
    },
    // 1. Below $project stage is not necessary apart from filtering out and aliasing.
    // 2. Will this stage affect performance when there are huge number of documents?
    {
        $project: {
            "_id": 0,
            "student_id": "$_id",
            "student_name": "$name",
            "school_id": 1
        }
    },
    {
        $lookup: {
            from: "schools",
            let: {
                "school_id": "$school_id"
            },
            pipeline: [
                {
                    $match: {
                        "status": 1,
                        $expr: {
                            $eq: ["$_id", "$$school_id"]
                        }
                    }
                },
                {
                    $project: {
                        "_id": 0,
                        "name": 1
                    }
                }
            ],
            as: "school"
        }
    },
    {
        $unwind: "$school"
    }
]);

score 2 · Accepted Answer

Give this a read: https://docs.mongodb.com/v3.2/core/aggregation-pipeline-optimization/

The part relevant to your specific case is: "The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline."

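So there is some optimization going on behind the scenes. You could try adding the explain option to your aggregation to see exactly what mongo is doing to try to optimize your pipeline.

I think what you are doing should actually help performance, since you are reducing the amount of data flowing through.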
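
The explain suggestion can be tried with something like the sketch below. It rebuilds the pipeline from the question with and without the intermediate $project, then asks for the explain output of each so the two plans can be compared. The helper name buildPipeline and the variables withProject and withoutProject are made up for illustration. The { explain: true } option of db.collection.aggregate() is the standard shell option, but the shape of its output differs between server versions, so treat this as a starting point rather than a benchmark.

```javascript
// Hypothetical helper: returns the question's pipeline, optionally
// including the intermediate $project between $match and $lookup.
function buildPipeline(includeProject) {
    const pipeline = [{ $match: { Gender: "Male" } }];
    if (includeProject) {
        pipeline.push({
            $project: {
                _id: 0,
                student_id: "$_id",
                student_name: "$name",
                school_id: 1
            }
        });
    }
    pipeline.push(
        {
            $lookup: {
                from: "schools",
                let: { school_id: "$school_id" },
                pipeline: [
                    { $match: { status: 1, $expr: { $eq: ["$_id", "$$school_id"] } } },
                    { $project: { _id: 0, name: 1 } }
                ],
                as: "school"
            }
        },
        { $unwind: "$school" }
    );
    return pipeline;
}

const withProject = buildPipeline(true);
const withoutProject = buildPipeline(false);

// `db` and `printjson` only exist inside the mongo shell; the guard
// keeps this file loadable in other JavaScript runtimes.
if (typeof db !== "undefined") {
    printjson(db.students.aggregate(withProject, { explain: true }));
    printjson(db.students.aggregate(withoutProject, { explain: true }));
}
```

Comparing the two outputs shows how the server treats each variant; timing both on realistic data is still the only way to know the actual cost.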
