27

我想使用 mongoDB 的聚合框架来运行 SQL 中看起来有点像的东西:

SELECT SUM(A), B, C from myTable GROUP BY B, C;

文档状态:

您可以指定管道中文档中的单个字段、先前计算的值或由多个传入字段组成的聚合键。

但目前还不清楚“由几个传入字段组成的聚合键”实际上是什么?

我的数据集有点像这样:

[{ "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"},
{ "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"},
{ "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"},
{ "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"},
{ "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"},
{ "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}]

我的查询如下所示:

resultsCollection.aggregate(
    { $match : { testid : testid} },
    { $skip : alreadyRead },
    { $project : {
            timeStamp : 1 ,
            label : 1,
            responseCode : 1 ,
            value : 1,
            success : 1
        }},
    { $group : {
            _id : "$label",
            max_timeStamp : { $timeStamp : 1 },
            count_responseCode : { $sum : 1 },
            avg_value : { $sum : "$value" },
            count_success : { $sum : 1 }
        }},
    { $group : {
            ?
        }}
);

我的直觉是尝试将结果传递给第二组,我知道您可以这样做,但它不会起作用,因为第一组已经过多地减少了数据集并且丢失了所需的细节级别。

我想要做的是分组使用labelresponseCodesuccess从结果中获取值的总和。它应该看起来有点像:

label   | code | success | sum_of_values | count
sharon  | 200  |  true   |      10       |   1
sharon  | 200  |  false  |      35       |   1
paul    | 200  |  true   |      100      |   2
paul    | 404  |  true   |      15       |   1
paul    | 404  |  false  |      99       |   1

其中有五组:

1. { "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"}

2. { "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"}

3. { "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"}
   { "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"}

4. { "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"}

5. { "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}
4

1 回答 1

38

好的,所以解决方案是为 _id 值指定一个聚合键。这在此处记录为:

您可以指定管道中文档中的单个字段、先前计算的值或由多个传入字段组成的聚合键。

但它实际上并没有定义聚合键的格式。阅读这里的早期文档,我看到以前的 collection.group 方法可以采用多个字段,并且在新框架中使用了相同的结构。

因此,要对多个字段进行分组,您可以使用_id : { success:'$success', responseCode:'$responseCode', label:'$label'}

如:

resultsCollection.aggregate(
{ $match : { testid : testid} },
{ $skip : alreadyRead },
{ $project : {
        timeStamp : 1 ,
        label : 1,
        responseCode : 1 ,
        value : 1,
        success : 1
    }},
{ $group : {
        _id :  { success:'$success', responseCode:'$responseCode', label:'$label'},
        max_timeStamp : { $timeStamp : 1 },
        count_responseCode : { $sum : 1 },
        avg_value : { $sum : "$value" },
        count_success : { $sum : 1 }
    }}
);
于 2012-07-10T22:33:37.363 回答