
I'm fairly new to mapReduce and aggregation in MongoDB.

Here is a sample of the dataset:

{ "_id" : ObjectId("521002161e0787522098d110"), "userId" : 4545454, "pickId" : 1, "answerArray" : [  "yes" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("521002481e0787522098d111"), "userId" : 64545454, "pickId" : 1, "answerArray" : [  "no" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("521002871e0787522098d112"), "userId" : 78263636, "pickId" : 1, "answerArray" : [  "yes" ], "city" : "Albany", "state" : "New York" }
{ "_id" : ObjectId("5211507c1e0787522098d113"), "userId" : 78263636, "pickId" : 2, "answerArray" : [  "yes" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("5211507c1e0787522098d113"), "userId" : 78263636, "pickId" : 1, "answerArray" : [  "yes" ], "city" : "Wichita", "state" : "Kansas" }

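I want a list of the unique combinations of state, city, pickId, and answerArray, and then a count of each of those combinations. The result needs to look like this: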

{"pickId": 1, "city": "New York", "state": "New York", "answerArray": ["yes"], "count":2}
{"pickId": 1, "city": "Albany", "state": "New York", "answerArray": ["no"], "count":1}
{"pickId": 1, "city": "New York", "state": "New York", "answerArray": ["no"], "count":1}
{"pickId": 1, "city": "Wichita", "state": "Kansas", "answerArray": ["yes"], "count":1}

The problem I'm running into is that emit in mapReduce only accepts two arguments:

Error: fast_emit takes 2 args near...

But I want to map several unique values to a single pickId.

Here is the mapReduce code I'm working with:

var mapFunct = function() {
    if (this.answerArray == "yes") {
        emit(this.pickId, 1);
    } else {
        emit(this.pickId, 0);
    }
};

var mapReduce2 = function(keyPickId, answerVals) {
    return Array.sum(answerVals);
};

db.answers.mapReduce( mapFunct, mapReduce2, { out: "mapReduceAnswers"})

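Any help or further suggestions would be greatly appreciated. I have also looked into the aggregation framework, but it doesn't seem like it would give me the kind of output I need.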


1 Answer


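I think you can get the format you want with the aggregation framework, specifically the $group and $project operators. Take a look at this aggregation call: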

var agg_output = db.answers.aggregate([
  { $group: { _id: {
                city: "$city",
                state: "$state",
                answerArray: "$answerArray",
                pickId: "$pickId"
            }, count: { $sum: 1 }}
  },
  { $project: { city: "$_id.city", 
                state: "$_id.state", 
                answerArray: "$_id.answerArray", 
                pickId: "$_id.pickId", 
                count: "$count", 
                _id: 0}
  }
]);

db.answer_counts.insert(agg_output.result);

The $group stage counts the occurrences of each unique combination of city/state/answerArray/pickId, while the $project stage reshapes the data into the form you want.
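If it helps to see the grouping logic outside of MongoDB, here is a rough plain-JavaScript sketch of what the two stages do. It is only an illustration, not how the server implements $group: countCombinations is a made-up helper name, and the last document in docs is a hypothetical addition to your sample so that one combination occurs twice.

```javascript
// Plain-JavaScript illustration of the $group + $project logic (not MongoDB code).
// The first five documents are the sample from the question (minus _id and userId).
// The sixth is a hypothetical extra document so that one group has count 2.
var docs = [
  { pickId: 1, answerArray: ["yes"], city: "New York", state: "New York" },
  { pickId: 1, answerArray: ["no"],  city: "New York", state: "New York" },
  { pickId: 1, answerArray: ["yes"], city: "Albany",   state: "New York" },
  { pickId: 2, answerArray: ["yes"], city: "New York", state: "New York" },
  { pickId: 1, answerArray: ["yes"], city: "Wichita",  state: "Kansas" },
  { pickId: 1, answerArray: ["yes"], city: "New York", state: "New York" }
];

// Hypothetical helper: counts each unique city/state/answerArray/pickId combination.
function countCombinations(documents) {
  var groups = new Map();
  documents.forEach(function (doc) {
    // The serialized compound key plays the role of the $group _id document.
    var key = JSON.stringify([doc.city, doc.state, doc.answerArray, doc.pickId]);
    if (!groups.has(key)) {
      // The flat shape here corresponds to what the $project stage produces.
      groups.set(key, {
        pickId: doc.pickId,
        city: doc.city,
        state: doc.state,
        answerArray: doc.answerArray,
        count: 0
      });
    }
    groups.get(key).count += 1; // same effect as count: { $sum: 1 }
  });
  return Array.from(groups.values());
}

var counts = countCombinations(docs);
counts.forEach(function (row) {
  console.log(JSON.stringify(row));
});
```

The point is that the whole combination acts as a single key, which is why one $group stage is enough and you don't need a separate emit per field.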

The insert call saves the resulting output to a new collection. Does that make sense?
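One caveat, assuming you are on a newer server than the one this was written against: from MongoDB 2.6 onward the shell's aggregate() returns a cursor rather than a document with a result field, so agg_output.result would be undefined there. A possible alternative on those versions is to let the pipeline write the collection itself with a final $out stage. The sketch below reuses the same pipeline as above and the same answer_counts collection name; the aggregate call is left as a comment because it needs a live database.

```javascript
// Same $group and $project stages as above, plus a final $out stage
// (available in MongoDB 2.6 and later) that writes the results to a collection.
var pipeline = [
  { $group: {
      _id: {
        city: "$city",
        state: "$state",
        answerArray: "$answerArray",
        pickId: "$pickId"
      },
      count: { $sum: 1 }
  } },
  { $project: {
      _id: 0,
      city: "$_id.city",
      state: "$_id.state",
      answerArray: "$_id.answerArray",
      pickId: "$_id.pickId",
      count: "$count"
  } },
  { $out: "answer_counts" } // must be the last stage of the pipeline
];

// In the mongo shell, against a live database:
// db.answers.aggregate(pipeline);
```

Note that $out replaces the target collection if it already exists, so pick a name you don't mind overwriting.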

Answered 2013-09-20T15:11:26.277