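I don't quite know how to phrase this, but basically I want to group documents by a field in a subarray, then group them by a field in the parent (root) document, while keeping the first grouping.
I hope an example helps here.
Say I have these documents, in which information about several custItemNum values is more or less grouped by originalFile: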
[
    {
        "items" : [
            {
                "recType" : "I2",
                "qty" : 2.0,
                "custItemNum" : 10.0
            },
            {
                "recType" : "I2",
                "qty" : 200.0,
                "custItemNum" : 20.0
            },
            {
                "recType" : "I2",
                "qty" : 50.0,
                "custItemNum" : 30.0
            },
            {
                "recType" : "D9",
                "custItemNum" : 10.0
            },
            {
                "recType" : "D9",
                "custItemNum" : 20.0
            },
            {
                "recType" : "D9",
                "custItemNum" : 30.0
            }
        ],
        "originalFile" : "727451921.txt",
        "docId" : "278791399"
    },
    {
        "items" : [
            {
                "recType" : "I2",
                "qty" : 180.0,
                "custItemNum" : 20.0
            }
        ],
        "originalFile" : "727557371.txt",
        "docId" : "278791399"
    },
    {
        "items" : [
            {
                "recType" : "I2",
                "qty" : 10.0,
                "custItemNum" : 30.0
            }
        ],
        "originalFile" : "727557371.txt",
        "docId" : "278791399"
    },
    {
        "items" : [
            {
                "recType" : "I2",
                "qty" : 10.0,
                "custItemNum" : 30.0
            }
        ],
        "originalFile" : "727557371.txt",
        "docId" : "278791399"
    }
]
I'd like to end up with a collection like this one, grouped first by custItemNum (output as custItemNumber) and then by originalFile:
[
    {
        "custItemNumber" : 10.0,
        "count" : 2.0,
        "itemInfo" : [
            {
                "originalFile" : "727451921.txt",
                "item" : [
                    {
                        "recType" : "I2",
                        "qty" : 2.0,
                        "custItemNum" : 10.0
                    },
                    {
                        "recType" : "D9",
                        "custItemNum" : 10.0
                    }
                ]
            }
        ]
    },
    {
        "custItemNumber" : 20.0,
        "count" : 3.0,
        "itemInfo" : [
            {
                "originalFile" : "727451921.txt",
                "item" : [
                    {
                        "recType" : "I2",
                        "qty" : 200.0,
                        "custItemNum" : 20.0
                    },
                    {
                        "recType" : "D9",
                        "custItemNum" : 20.0
                    }
                ]
            },
            {
                "originalFile" : "727557371.txt",
                "item" : [
                    {
                        "recType" : "I2",
                        "qty" : 180.0,
                        "custItemNum" : 20.0
                    }
                ]
            }
        ]
    },
    {
        "custItemNumber" : 30.0,
        "count" : 4.0,
        "itemInfo" : [
            {
                "originalFile" : "727451921.txt",
                "item" : [
                    {
                        "recType" : "I2",
                        "qty" : 50.0,
                        "custItemNum" : 30.0
                    },
                    {
                        "recType" : "D9",
                        "custItemNum" : 30.0
                    }
                ]
            },
            {
                "originalFile" : "727557371.txt",
                "item" : [
                    {
                        "recType" : "I2",
                        "qty" : 10.0,
                        "custItemNum" : 30.0
                    },
                    {
                        "recType" : "I2",
                        "qty" : 10.0,
                        "custItemNum" : 30.0
                    }
                ]
            }
        ]
    }
]
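To pin down the intended semantics without MongoDB, here is a minimal plain-JavaScript sketch. It is an illustration only, and the function name groupByItemThenFile is hypothetical. It unwinds items, groups them by custItemNum, and within each of those groups groups again by the parent's originalFile, which yields the shape shown above:

```javascript
// Minimal in-memory sketch (plain JavaScript, no MongoDB involved) of the
// intended two-level grouping. groupByItemThenFile is a hypothetical name.
function groupByItemThenFile(docs) {
  // custItemNum -> (originalFile -> array of items)
  const byItem = new Map();
  for (const doc of docs) {
    // Iterating over doc.items plays the role of {$unwind: "$items"}.
    for (const item of doc.items) {
      if (!byItem.has(item.custItemNum)) {
        byItem.set(item.custItemNum, new Map());
      }
      const byFile = byItem.get(item.custItemNum);
      if (!byFile.has(doc.originalFile)) {
        byFile.set(doc.originalFile, []);
      }
      byFile.get(doc.originalFile).push(item);
    }
  }
  // One entry per custItemNum in ascending order; files keep first-seen order.
  return [...byItem.entries()]
    .sort(([a], [b]) => a - b)
    .map(([custItemNumber, byFile]) => {
      const itemInfo = [...byFile.entries()].map(([originalFile, item]) => ({
        originalFile,
        item,
      }));
      // count = number of items across all files for this custItemNum
      const count = itemInfo.reduce((n, group) => n + group.item.length, 0);
      return { custItemNumber, count, itemInfo };
    });
}
```

Fed the four sample documents, this yields counts of 2, 3 and 4 for custItemNum 10, 20 and 30, matching the desired output.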
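Keep in mind that these documents are already the output of several aggregation stages, so no _id field is available.
So far I've come up with these aggregation stages (I edited their output by hand to get the result above):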
{ $unwind: "$items" },
{ $bucket: {
    groupBy: "$items.custItemNum",
    boundaries: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    output: {
        count: { $sum: 1 },
        itemInfo: { $push: "$$ROOT" }
    }
} }
This produces the following result:
[
    {
        "_id" : 10.0,
        "count" : 2.0,
        "itemInfo" : [
            {
                "_id" : ObjectId("5a7336ebb4b169272dae528f"),
                "items" : {
                    "recType" : "I2",
                    "qty" : 2.0,
                    "custItemNum" : 10.0
                },
                "originalFile" : "727451921.txt",
                "docId" : "278791399"
            },
            {
                "_id" : ObjectId("5a7336ebb4b169272dae528f"),
                "items" : {
                    "recType" : "D9",
                    "custItemNum" : 10.0
                },
                "originalFile" : "727451921.txt",
                "docId" : "278791399"
            }
        ]
    },
    {
        "_id" : 20.0,
        "count" : 3.0,
        "itemInfo" : [
            {
                "_id" : ObjectId("5a7336ebb4b169272dae528f"),
                "items" : {
                    "recType" : "I2",
                    "qty" : 200.0,
                    "custItemNum" : 20.0
                },
                "originalFile" : "727451921.txt",
                "docId" : "278791399"
            },
            {
                "_id" : ObjectId("5a7336ebb4b169272dae528f"),
                "items" : {
                    "recType" : "D9",
                    "custItemNum" : 20.0
                },
                "originalFile" : "727451921.txt",
                "docId" : "278791399"
            },
            {
                "_id" : ObjectId("5a7336ebb4b169272dae5290"),
                "items" : {
                    "recType" : "I2",
                    "qty" : 180.0,
                    "custItemNum" : 20.0
                },
                "originalFile" : "727557371.txt",
                "docId" : "278791399"
            }
        ]
    },
    {
        "_id" : 30.0,
        "count" : 4.0,
        "itemInfo" : [
            {
                "_id" : ObjectId("5a7336ebb4b169272dae528f"),
                "items" : {
                    "recType" : "I2",
                    "qty" : 50.0,
                    "custItemNum" : 30.0
                },
                "originalFile" : "727451921.txt",
                "docId" : "278791399"
            },
            {
                "_id" : ObjectId("5a7336ebb4b169272dae528f"),
                "items" : {
                    "recType" : "D9",
                    "custItemNum" : 30.0
                },
                "originalFile" : "727451921.txt",
                "docId" : "278791399"
            },
            {
                "_id" : ObjectId("5a7336ebb4b169272dae5291"),
                "items" : {
                    "recType" : "I2",
                    "qty" : 10.0,
                    "custItemNum" : 30.0
                },
                "originalFile" : "727557371.txt",
                "docId" : "278791399"
            },
            {
                "_id" : ObjectId("5a7336ebb4b169272dae5292"),
                "items" : {
                    "recType" : "I2",
                    "qty" : 10.0,
                    "custItemNum" : 30.0
                },
                "originalFile" : "727557371.txt",
                "docId" : "278791399"
            }
        ]
    }
]
I'm stuck here: every further step I can think of (e.g. a $replaceRoot: { newRoot: "$itemInfo" }) breaks the outer grouping.
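One possible direction, sketched under assumptions and not tested against the poster's real data, is to replace $bucket with two $group stages. The inner stage groups on a compound key of custItemNum and originalFile. The outer stage folds those per-file groups into one document per custItemNum, so the first grouping is kept. The compound key is built from fields, so the input documents do not need an _id of their own. The variable name pipeline and the placeholder collection name are hypothetical.

```javascript
// Hypothetical alternative to the $bucket stage: two $group stages,
// so no boundaries are needed and the per-file grouping survives.
const pipeline = [
  // One document per array element, as in the original stages.
  { $unwind: "$items" },
  // Inner grouping: one group per (custItemNum, originalFile) pair.
  {
    $group: {
      _id: { custItemNum: "$items.custItemNum", originalFile: "$originalFile" },
      item: { $push: "$items" }
    }
  },
  // Sort so the outer $push should receive the files in a predictable order.
  { $sort: { "_id.originalFile": 1 } },
  // Outer grouping: one document per custItemNum with its files nested.
  {
    $group: {
      _id: "$_id.custItemNum",
      count: { $sum: { $size: "$item" } },
      itemInfo: { $push: { originalFile: "$_id.originalFile", item: "$item" } }
    }
  },
  // $group does not guarantee output order.
  { $sort: { _id: 1 } },
  // Rename _id to the custItemNumber field used in the desired output.
  { $project: { _id: 0, custItemNumber: "$_id", count: 1, itemInfo: 1 } }
];

// Usage in the mongo shell (collection name is a placeholder); in practice
// these stages would be appended to the existing aggregation stages:
// db.yourCollection.aggregate(pipeline)
```

Since $group only creates groups for values that actually occur, this also sidesteps the fixed boundaries array.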
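Also, the custItemNum values are dynamic, but AFAICT the boundaries field of the $bucket stage takes a constant array, so if there is a way to pass a computed array there, I'd like to know how.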
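The boundaries array has to be a literal inside the $bucket stage. Nothing prevents the client from computing that array before it builds the pipeline, though. The sketch below assumes the distinct custItemNum values can be fetched first, for example with a separate query. The helper names boundariesFor and bucketStage are hypothetical. $bucket treats each bucket as [lower, upper) and requires ascending boundaries with at least two entries, so the helper appends one bound above the largest value. A $group stage keyed on custItemNum needs no boundaries at all.

```javascript
// Hypothetical helper: one $bucket per distinct value. Buckets are
// [lower, upper) and boundaries must be ascending, so one extra upper
// bound above the largest value is appended.
function boundariesFor(values) {
  const sorted = [...new Set(values)].sort((a, b) => a - b);
  if (sorted.length === 0) {
    throw new Error("$bucket needs at least two boundaries");
  }
  return [...sorted, sorted[sorted.length - 1] + 1];
}

// Builds the same $bucket stage as in the question, with computed boundaries.
function bucketStage(values) {
  return {
    $bucket: {
      groupBy: "$items.custItemNum",
      boundaries: boundariesFor(values),
      output: {
        count: { $sum: 1 },
        itemInfo: { $push: "$$ROOT" }
      }
    }
  };
}

// Example usage in the mongo shell (assumption: the values can be read from
// the source collection; the collection name is a placeholder):
// const values = db.yourCollection.distinct("items.custItemNum");
// db.yourCollection.aggregate([{ $unwind: "$items" }, bucketStage(values)]);
```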