Since you did not provide a sample document (object) format, I am assuming a sample collection named 'stories' with the following documents.
{ "_id" : ObjectId("4eafd693627b738f69f8f1e3"), "body" : "There was a king", "author" : "tom" }
{ "_id" : ObjectId("4eafd69c627b738f69f8f1e4"), "body" : "There was a queen", "author" : "tom" }
{ "_id" : ObjectId("4eafd72c627b738f69f8f1e5"), "body" : "There was a queen", "author" : "tom" }
{ "_id" : ObjectId("4eafd74e627b738f69f8f1e6"), "body" : "There was a jack", "author" : "tom" }
{ "_id" : ObjectId("4eafd785627b738f69f8f1e7"), "body" : "There was a humpty and dumpty . Humtpy was tall . Dumpty was short .", "author" : "jane" }
{ "_id" : ObjectId("4eafd7cc627b738f69f8f1e8"), "body" : "There was a cat called Mini . Mini was clever cat . ", "author" : "jane" }
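For the given dataset, you can use the following JavaScript code to get to your solution. The collection "authors_unigrams" contains the result. All the code is meant to be run in the mongo console (http://www.mongodb.org/display/DOCS/mongo+-+The+Interactive+Shell).
First, we need to mark all the documents that have newly entered the "stories" collection. We do this with the following command. It adds a new attribute called "mr_status" to each document that does not have one yet and assigns it the value "inprocess". Later, we will see that the map-reduce operation only considers documents whose "mr_status" field has the value "inprocess". This way, documents already handled in an earlier run are not fed into map-reduce again, which keeps the operation as efficient as asked.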
db.stories.update({mr_status:{$exists:false}},{$set:{mr_status:"inprocess"}},false,true);
Second, we define the map() and reduce() functions.
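var map = function() {
    // Split the body on spaces and keep the first occurrence of each token.
    var uniqueWords = function (words) {
        var arrWords = words.split(" ");
        var arrNewWords = [];
        var seenWords = {};
        for (var i = 0; i < arrWords.length; i++) {
            var word = arrWords[i];
            // Skip empty tokens produced by leading, trailing, or repeated spaces.
            if (word === "") { continue; }
            // hasOwnProperty avoids false hits on inherited keys such as "constructor".
            if (!Object.prototype.hasOwnProperty.call(seenWords, word)) {
                seenWords[word] = true;
                arrNewWords.push(word);
            }
        }
        return arrNewWords;
    };
    var unigrams = uniqueWords(this.body);
    emit(this.author, {unigrams: unigrams});
};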
var reduce = function(key, values) {
    // Merge every emitted list into one array, keeping each token once.
    // A local array is used instead of extending Array.prototype.
    var unigrams = [];
    values.forEach(function(value) {
        value.unigrams.forEach(function(word) {
            if (unigrams.indexOf(word) === -1) {
                unigrams.push(word);
            }
        });
    });
    return {unigrams: unigrams};
};
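To sanity-check the two functions without a MongoDB server, the same logic can be simulated in plain JavaScript. This is only a rough sketch: runMapReduce, uniqueWords, mergeUnigrams and the in-memory sampleStories array are illustrative stand-ins, not part of the mongo shell, and the sketch skips empty tokens caused by stray spaces.

```javascript
// Plain-JavaScript simulation of the map and reduce steps (no MongoDB needed).
// All names below are illustrative stand-ins, not mongo shell API.
var sampleStories = [
  { body: "There was a king", author: "tom" },
  { body: "There was a queen", author: "tom" },
  { body: "There was a jack", author: "tom" },
  { body: "There was a cat called Mini . Mini was clever cat . ", author: "jane" }
];

// Same idea as map(): split the body on spaces and keep each token once.
function uniqueWords(text) {
  var seen = {};
  return text.split(" ").filter(function (word) {
    if (word === "" || Object.prototype.hasOwnProperty.call(seen, word)) {
      return false;
    }
    seen[word] = true;
    return true;
  });
}

// Same idea as reduce(): merge the per-document lists without duplicates.
function mergeUnigrams(values) {
  var merged = [];
  values.forEach(function (value) {
    value.unigrams.forEach(function (word) {
      if (merged.indexOf(word) === -1) {
        merged.push(word);
      }
    });
  });
  return { unigrams: merged };
}

// Group the emitted values by author, then reduce each group.
function runMapReduce(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    if (!groups[doc.author]) {
      groups[doc.author] = [];
    }
    groups[doc.author].push({ unigrams: uniqueWords(doc.body) });
  });
  var result = {};
  Object.keys(groups).forEach(function (author) {
    result[author] = mergeUnigrams(groups[author]);
  });
  return result;
}

var simulated = runMapReduce(sampleStories);
console.log(JSON.stringify(simulated.tom));
// prints {"unigrams":["There","was","a","king","queen","jack"]}
```

Note that punctuation separated by spaces (such as ".") is kept as a token, exactly as in the map() function above, because the split is on single spaces only.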
Third, we actually run the map-reduce function.
var result = db.stories.mapReduce( map,
reduce,
{query:{author:{$exists:true},mr_status:"inprocess"},
out: {reduce:"authors_unigrams"}
});
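The out: {reduce: "authors_unigrams"} option is what lets repeated runs accumulate: when a key already exists in the output collection, MongoDB applies the reduce function to the existing value and the newly computed one. A rough in-memory sketch of that merge is shown here; mergeIntoOutput, reduceUnigrams and the sample data are hypothetical stand-ins, not MongoDB functions.

```javascript
// Sketch of the out: {reduce: ...} behaviour using plain objects.
// mergeIntoOutput is a hypothetical helper, not a MongoDB function.
function reduceUnigrams(key, values) {
  var merged = [];
  values.forEach(function (value) {
    value.unigrams.forEach(function (word) {
      if (merged.indexOf(word) === -1) {
        merged.push(word);
      }
    });
  });
  return { unigrams: merged };
}

function mergeIntoOutput(output, batch, reduceFn) {
  Object.keys(batch).forEach(function (key) {
    if (Object.prototype.hasOwnProperty.call(output, key)) {
      // Key already present: reduce the old and the new value together.
      output[key] = reduceFn(key, [output[key], batch[key]]);
    } else {
      // New key: store the batch value as-is.
      output[key] = batch[key];
    }
  });
  return output;
}

// Assumed state of the output after an earlier run.
var existingOutput = { tom: { unigrams: ["There", "was", "a", "king"] } };
// Assumed result of the current run over newly added stories.
var newBatch = {
  tom: { unigrams: ["There", "was", "a", "queen"] },
  jane: { unigrams: ["There", "was", "a", "cat"] }
};

var mergedOutput = mergeIntoOutput(existingOutput, newBatch, reduceUnigrams);
console.log(JSON.stringify(mergedOutput.tom));
// prints {"unigrams":["There","was","a","king","queen"]}
```

This is also why the reduce function has to return a value of the same shape as the emitted values: its own output can be fed back in as an input on a later run.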
Fourth, we mark all the records that were considered for map-reduce in the last run as processed by setting "mr_status" to "processed".
db.stories.update({mr_status:"inprocess"},{$set:{mr_status:"processed"}},false,true);
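To see why the two status updates keep later runs small, the lifecycle can be traced on an in-memory array. This is a minimal sketch under the assumption that documents only ever move from no status to "inprocess" to "processed"; markNew, selectInProcess and markProcessed are hypothetical helpers that mirror the shell commands above.

```javascript
// In-memory sketch of the mr_status lifecycle.
// The three helpers are hypothetical and only mirror the shell commands.
function markNew(docs) {
  docs.forEach(function (doc) {
    // Mirrors {mr_status: {$exists: false}} -> $set "inprocess".
    if (!("mr_status" in doc)) {
      doc.mr_status = "inprocess";
    }
  });
}

function selectInProcess(docs) {
  // Mirrors the mapReduce query: author exists and mr_status is "inprocess".
  return docs.filter(function (doc) {
    return "author" in doc && doc.mr_status === "inprocess";
  });
}

function markProcessed(docs) {
  docs.forEach(function (doc) {
    if (doc.mr_status === "inprocess") {
      doc.mr_status = "processed";
    }
  });
}

var collection = [
  { body: "There was a king", author: "tom" },
  { body: "There was a queen", author: "tom" }
];

// First run: both documents are new.
markNew(collection);
var firstBatchSize = selectInProcess(collection).length;
markProcessed(collection);

// A document arrives later; the second run only sees that one.
collection.push({ body: "There was a jack", author: "tom" });
markNew(collection);
var secondBatchSize = selectInProcess(collection).length;
markProcessed(collection);

console.log(firstBatchSize, secondBatchSize);
// prints 2 1
```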
Optionally, you can inspect the result collection "authors_unigrams" by running the following command.
db.authors_unigrams.find();