
First time here - please go easy… ;)

I'm starting off with MongoDB for the first time, using the official PHP driver to interact with it from an application. Here's the first problem I've run into with the aggregation framework. I have a collection of documents, each of which contains an array of numbers, like in the following shortened example...

{
  "_id": ObjectId("51c42c1218ef9de420000002"),
  "my_id": 1,
  "numbers": [
    482,
    49,
    382,
    290,
    31,
    126,
    997,
    20,
    145
  ]
}

{
  "_id": ObjectId("51c42c1218ef9de420000006"),
  "my_id": 2,
  "numbers": [
    19,
    234,
    28,
    962,
    24,
    12,
    8,
    643,
    145
  ]
}

{
  "_id": ObjectId("51c42c1218ef9de420000008"),
  "my_id": 3,
  "numbers": [
    912,
    18,
    456,
    34,
    284,
    556,
    95,
    125,
    579
  ]
}

{
  "_id": ObjectId("51c42c1218ef9de420000012"),
  "my_id": 4,
  "numbers": [
    12,
    97,
    227,
    872,
    103,
    78,
    16,
    377,
    20
  ]
}

{
  "_id": ObjectId("51c42c1218ef9de420000016"),
  "my_id": 5,
  "numbers": [
    212,
    237,
    103,
    93,
    55,
    183,
    193,
    17,
    346
  ]
}

Using the aggregation framework and PHP (which I think is the correct approach), I'm trying to work out the average number of documents in which a number doesn't appear (within the numbers array) before it appears again. For example, for the number 20 in the documents above the average is 1.5: there's a gap of 2 documents, followed by a gap of 1; add these values together and divide by the number of gaps. I can get as far as working out whether the number 20 is in each document's numbers array, using the $cond operator to output a value based on the result. Here's my PHP…
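Spelled out for the sample data (documents ordered by my_id, with 20 present only in documents 1 and 4), taking each run of consecutive documents without 20 as one gap $g_i$ and averaging over the $k$ gaps:

$$
\bar{g} \;=\; \frac{1}{k}\sum_{i=1}^{k} g_i \;=\; \frac{g_1 + g_2}{2} \;=\; \frac{2 + 1}{2} \;=\; 1.5
$$

Here $g_1 = 2$ counts documents 2 and 3, and $g_2 = 1$ counts document 5.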

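// Assumes $c is a MongoCollection (legacy PHP driver) for the collection above.

// Emit one document per element of each "numbers" array
$unwind_results = array(
    '$unwind' => '$numbers'
);

// hit = 0 if the current element is 20, otherwise 1
$project = array (
    '$project' => array(
        'my_id' => '$my_id',
        'numbers' => '$numbers',
        'hit' => array('$cond' => array(
            array(
                '$eq' => array('$numbers', 20)
            ),
            0,
            1
            )
        )
    )
);

// Collapse back to one result per my_id: $min is 0 if any element was 20, else 1
$group = array (
    '$group' => array(
        '_id' => '$my_id',
        'hit' => array('$min' => '$hit'),
    )
);

// Order the results by my_id so the gaps can be read sequentially
$sort = array(
    '$sort' => array( '_id' => 1 ),
);

$avg = $c->aggregate(array($unwind_results, $project, $group, $sort));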

What I was trying to achieve was to set up some kind of incremental counter that resets every time the number 20 appears in the numbers array, and then collect all of those counts and work out the average from there… but I'm truly stumped.

I know I could work out the average from a collection of documents on the application side, but ideally I’d like Mongo to give me the result I want so it’s more portable.

Would Map/Reduce need to get involved somewhere?

Any help/advice/pointers gratefully received!


1 Answer


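As Asya said, the aggregation framework isn't suited to the last part of the question (the average gap between "hits" across the documents in the pipeline). Map/reduce doesn't seem a good fit for this task either, since you need to process the documents sequentially (and in sorted order) to do this calculation, whereas MR emphasizes parallel processing.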

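Given that the aggregation framework does process documents in sorted order, I was brainstorming yesterday about how it might support your use case. If $group exposed access to its accumulator values during projection (in addition to the document being processed), we might be able to use $push to collect previous values in a projected array, and then inspect them during projection to compute these "hit" gaps. Alternatively, if there were some facility to access the previous document (i.e. group key) that $group encountered for our buckets, that could let us determine the differences and compute the gap spans.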

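I shared these thoughts with Mathias, who works on the framework, and he explained that while all of this would be possible on a single server (if the features were implemented), it simply wouldn't work on a sharded infrastructure, where the $group and $sort operations are distributed. It wouldn't be a portable solution.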

I think your best bet is to run the aggregation with the $project you have, and then process those results in your application language.
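A minimal sketch of that application-side step, in PHP to match the question and assuming PHP 7.4 or later (it uses an arrow function). It assumes each row of the pipeline output looks like `['_id' => my_id, 'hit' => 0|1]`, with `hit == 0` meaning the number was present (the question's `$cond` convention). It follows the reset-counter idea from the question and treats every run of consecutive documents lacking the number as one gap (leading and trailing runs included, back-to-back hits contributing no zero-length gap); that definition is an assumption chosen to reproduce the 1.5 example. The function name `averageGap` is hypothetical.

```php
<?php
// Hypothetical helper: average length of the runs of consecutive documents
// whose array did NOT contain the target number.
// Assumes rows shaped like ['_id' => my_id, 'hit' => 0|1], where hit == 0
// means the number WAS present (the $cond/$min convention from the question).
function averageGap(array $rows): ?float
{
    // Order by _id, in case the rows were not already sorted by $sort.
    usort($rows, fn($a, $b) => $a['_id'] <=> $b['_id']);

    $gaps = [];
    $counter = 0; // documents seen since the last hit
    foreach ($rows as $row) {
        if ((int) $row['hit'] === 0) {
            // The number appeared: close the current gap (if any) and reset.
            if ($counter > 0) {
                $gaps[] = $counter;
            }
            $counter = 0;
        } else {
            $counter++;
        }
    }
    // Assumption: a trailing run of misses also counts as a gap.
    if ($counter > 0) {
        $gaps[] = $counter;
    }

    // null when there are no gaps at all (e.g. no rows, or every row is a hit).
    return $gaps ? array_sum($gaps) / count($gaps) : null;
}
```

For the five sample documents the grouped rows are hits `[0, 1, 1, 0, 1]` for my_id 1 to 5, giving gaps of 2 and 1 and an average of 1.5. With the legacy driver's `MongoCollection::aggregate()`, the grouped documents should be under the `'result'` key of the returned array, so something like `averageGap($avg['result'])` would give the figure for 20.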

Answered 2013-06-26T18:02:36.763