I'm currently logging power measurements (watts) at varying intervals (between 1 and 5 seconds) to my MongoDB 2.2 (db -> monitoring -> kWh). The data within my collection is packaged as below.

{
   "_id":ObjectId("5060c134f05e888e03000001"),
   "reading":"power",
   "watts":"549.",
   "datetime":1348518196
}

I need to aggregate the information on an hourly basis: sum all the watts from the start of an hour to its end, then divide by the number of readings taken during that hour. I need to be able to push this result to a new collection within MongoDB by means of PHP. This could of course be run as a cron job, but is there a mechanism to perform this as part of an insert?

The datetime field is a Unix timestamp.

1 Answer

You can do this easily with MapReduce (MR), using an emit function like this:

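function(){
    // `hour` is not defined inside this function; it has to be supplied
    // from outside, for example through the mapReduce `scope` option.
    emit(hour, {count: this.watts});
}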

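The hour variable is the normalised hour of the time at which the row was processed (as shown in the PHP snippet below). You can compute it with a method like the one described in "Convert date to timestamp in javascript?", or you can just pass it in from mktime().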

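Write a very simple reduce that sums the values up, and use an out of merge so the results are merged into the main hourly aggregated collection. Run the whole thing from a PHP cron job that calls the MR.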
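The map, reduce and merge steps described above can be sketched roughly as follows. This is an illustration under a few assumptions, not the exact functions from this answer: the hour bucket is derived from each document's `datetime` rather than passed in from PHP, the emitted value carries both a sum and a count so that an average can be produced, and `runLocally` is a hypothetical in-memory stand-in that lets the functions be tried without a MongoDB server.

```javascript
// Sketch only: functions in the shape MongoDB's mapReduce expects.
// Names (mapReading, reduceReadings, finalizeHour, runLocally) are hypothetical.

var emit; // assigned by runLocally below; MongoDB provides its own emit()

// Map: bucket each reading by the start of its hour (datetime is in seconds).
function mapReading() {
  var hour = this.datetime - (this.datetime % 3600);
  // watts is stored as a string such as "549.", so convert it first.
  emit(hour, { sum: parseFloat(this.watts), count: 1 });
}

// Reduce: add up partial sums and counts. The output has the same shape as
// the input values, so it stays correct if reduce is applied more than once.
function reduceReadings(key, values) {
  var out = { sum: 0, count: 0 };
  values.forEach(function (v) {
    out.sum += v.sum;
    out.count += v.count;
  });
  return out;
}

// Finalize: turn the sum and count into the hourly average.
function finalizeHour(key, value) {
  value.avg = value.count > 0 ? value.sum / value.count : 0;
  return value;
}

// Hypothetical local harness: groups emitted values by key in memory,
// then reduces and finalizes each group.
function runLocally(docs) {
  var groups = {};
  emit = function (key, value) {
    (groups[key] = groups[key] || []).push(value);
  };
  docs.forEach(function (doc) { mapReading.call(doc); });
  return Object.keys(groups).map(function (key) {
    var k = Number(key);
    return { _id: k, value: finalizeHour(k, reduceReadings(k, groups[key])) };
  });
}
```

Against a real server, the same three functions would be handed to mapReduce together with an `out` option such as `{ merge: 'hourly' }`, where `hourly` is a placeholder collection name.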

However, this seems a bit overkill for this sort of thing, and I would personally do it straight in PHP, like so:

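// Fetch every reading from the last 3600 seconds
$cursor = $db->collection->find(array('datetime' => array('$gte' => time() - 3600)));
$sumWatts = 0;
foreach ($cursor as $_id => $row) {
    // watts is stored as a string (e.g. "549."), so cast it before adding
    $sumWatts += (float) $row['watts'];
}
// Store the sum, keyed by the start of the current hour
$db->otherCollection->insert(array('sum' => $sumWatts, 'hour' => mktime(date('H'), 0, 0)));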

This normalises the hour of all rows to the whole hour in which they were processed.

You could also do this with the aggregation framework, using the $sum operator: read the result into PHP and then write it out.
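As a rough illustration of the aggregation framework route, the pipeline could look something like the sketch below. It is written as a plain JavaScript function that only builds the pipeline array; `hourlyPipeline` and the output field names are hypothetical. It also assumes `watts` is stored as a number, not as the string shown in the question, since string values would not be summed or averaged as numbers.

```javascript
// Sketch only: builds an aggregation pipeline as a plain array.
// Assumes `watts` is numeric and `datetime` is a Unix timestamp in seconds.
function hourlyPipeline(sinceSeconds) {
  return [
    // Keep only readings taken at or after the given timestamp.
    { $match: { datetime: { $gte: sinceSeconds } } },
    // Round each timestamp down to the start of its hour.
    { $project: {
        watts: 1,
        hour: { $subtract: ['$datetime', { $mod: ['$datetime', 3600] }] }
    } },
    // One output document per hour, with sum, average and reading count.
    { $group: {
        _id: '$hour',
        sum: { $sum: '$watts' },
        avg: { $avg: '$watts' },
        count: { $sum: 1 }
    } }
  ];
}
```

The resulting array would be passed to the collection's aggregate call, and the returned documents written to the hourly collection from PHP.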

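However, I think that for this particular type of aggregation, over this time span and on this single field, straight PHP is probably simpler, easier and possibly even faster.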

If you were aggregating loads of data across many fields, then I would say do it in an MR that can run over time, etc.

Answered 2012-10-03T10:01:51.693