I'm trying to understand the behavior of map/reduce.

Here is the map function:

function() {
  var klass = this.error_class;
  emit('klass', { model : klass, count : 1 });
}

And the reduce function:

function(key, values) {
  var results = { count : 0, klass: { foo: 'bar' } };
  values.forEach(function(value) {
    results.count += value.count;
    results.klass[value.model] = 0;
    printjson(results);
  });
  return results;
}

Then I run it:

{
  "count" : 85,
  "klass" : {
    "foo" : "bar",
    "Twitter::Error::BadRequest" : 0
  }
}
{
  "count" : 86,
  "klass" : {
    "foo" : "bar",
    "Twitter::Error::BadRequest" : 0,
    "Stream:DirectMessage" : 0
  }
}

At this point everything is fine, but then the read lock is yielded every 100 documents:

{
  "count" : 100,
  "klass" : {
    "foo" : "bar",
    "Twitter::Error::BadRequest" : 0,
    "Stream:DirectMessage" : 0
  }
}
{ "count" : 100, "klass" : { "foo" : "bar", "undefined" : 0 } }

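I keep my key foo, and my count property keeps incrementing. The problem is that everything else becomes undefined.

So why do I lose the dynamic keys of my object while my count property is still fine?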

1 Answer

A thing to remember about your reduce function is that the values passed to it are either the output of your map function, or the return value of previous calls to reduce.

This is key: it means that mapping and reducing of parts of the data can be farmed out to different machines (e.g. different shards of a MongoDB cluster), with reduce then used again to reassemble the results. It also means that MongoDB doesn't have to map every value first, keep all the results in memory and then reduce them all at once: it can map and reduce in chunks, re-reducing where necessary.

In other words the following must be true:

reduce(k, [A, B, C]) == reduce(k, [reduce(k, [A, B]), C])

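Your reduce function's output doesn't have a model property, so when that output is fed back in during a re-reduce, value.model is undefined and results.klass[value.model] creates a key literally named "undefined". The keys collected so far live in value.klass, which your function never reads, so they are dropped. The count survives because the output does have a count property.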
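
To see the effect without a MongoDB instance, here is a rough simulation in plain JavaScript. It copies the reduce function from the question (minus printjson, which only exists in the mongo shell) and calls it a second time on its own output. The variable names partial and rereduced are made up for this illustration, and the real chunking MongoDB does may differ.

```javascript
// The reduce function from the question, with printjson removed
// so it can run outside the mongo shell.
function reduce(key, values) {
  var results = { count: 0, klass: { foo: 'bar' } };
  values.forEach(function (value) {
    results.count += value.count;
    results.klass[value.model] = 0;
  });
  return results;
}

// First pass: every value comes straight from map and has a model property.
var partial = reduce('klass', [
  { model: 'Twitter::Error::BadRequest', count: 1 },
  { model: 'Stream:DirectMessage', count: 1 }
]);

// Re-reduce (simulated): the earlier result is passed back in as a value,
// next to one fresh emit. The earlier result has no model property.
var rereduced = reduce('klass', [
  partial,
  { model: 'Stream:DirectMessage', count: 1 }
]);

// count is still correct, but the keys from the first pass are gone
// and an "undefined" key has appeared in their place.
console.log(JSON.stringify(rereduced));
```

The count adds up to 3, while "Twitter::Error::BadRequest" is lost and "undefined" shows up as a key, which matches the symptom in the question.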

You either need to have your reduce function return something in the same format as what your map function emits, so that you can process the two without distinction (usually the easiest), or else handle re-reduced values differently.
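
A minimal sketch of the first option, assuming you are happy to drop the foo placeholder: map emits a value that already has the { count, klass } shape that reduce returns, so reduce can merge map output and earlier reduce output the same way. The emit stub, the emitted array and the sample documents are hypothetical and only there so the sketch runs outside MongoDB; inside the server emit is provided for you.

```javascript
// Hypothetical stand-in for MongoDB's emit, so this runs in plain JavaScript.
var emitted = [];
function emit(key, value) {
  emitted.push(value);
}

// map: emit a value in the same shape that reduce returns.
function map() {
  var klass = {};
  klass[this.error_class] = 0;
  emit('klass', { count: 1, klass: klass });
}

// reduce: merge values without caring whether they came from map
// or from an earlier call to reduce.
function reduce(key, values) {
  var results = { count: 0, klass: {} };
  values.forEach(function (value) {
    results.count += value.count;
    for (var name in value.klass) {
      results.klass[name] = 0;
    }
  });
  return results;
}

// Made-up sample documents; map is invoked with the document as `this`.
var docs = [
  { error_class: 'Twitter::Error::BadRequest' },
  { error_class: 'Stream:DirectMessage' },
  { error_class: 'Twitter::Error::BadRequest' }
];
docs.forEach(function (doc) {
  map.call(doc);
});

// Reducing everything at once and reducing in chunks should now agree.
var all = reduce('klass', emitted);
var chunked = reduce('klass', [
  reduce('klass', emitted.slice(0, 2)),
  emitted[2]
]);
console.log(JSON.stringify(all));
console.log(JSON.stringify(chunked));
```

With this shape both calls produce the same count and the same set of keys, which is the property a re-reducible function needs.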

answered 2012-09-21T17:29:46.437