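Initially, as usual, I had a relationship in which one order has many line items and each line item belongs to exactly one order.

Using MongoDB, I created this document to represent it: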

{
    "_id" : ObjectId("511b7d1b3daee1b1446ecdfe"),
    "l_order" : {
        "_id" : ObjectId("511b7d133daee1b1446eb54d"),
        "o_orderkey" : NumberLong(1),
        "o_totalprice" : 173665.47,
        "o_orderdate" : ISODate("1996-01-02T03:00:00Z"),
        "o_orderpriority" : "5-LOW",
        "o_shippriority" : 0,
    },
    "l_linenumber" : 1,
    "l_shipdate" : ISODate("1996-03-13T03:00:00Z"),
    "l_commitdate" : ISODate("1996-02-12T03:00:00Z"),
    "l_receiptdate" : ISODate("1996-03-22T03:00:00Z"),
}

My intention is to translate this SQL query:

select
    o_orderpriority, 
    count(*) as order_count
from 
    orders
where 
    o_orderdate >= date '1993-07-01'
    and o_orderdate < date '1993-07-01' + interval '3' month
    and exists (
        select 
        *
        from 
        lineitem
        where 
        l_orderkey = o_orderkey
        and l_commitdate < l_receiptdate
    )
group by 
    o_orderpriority
order by 
    o_orderpriority;

To do this, I use two mapReduce functions:

First:

db.runCommand({
    mapreduce: "lineitem",
    query: {
        "l_order.o_orderdate": {'$gte': new Date("July 01, 1993"), '$lt': new Date("Oct 01, 1993")}
    },
    map:    function Map() {
                if(this.l_commitdate < this.l_receiptdate){
                    emit( this.l_order.o_orderkey, this.l_order.o_orderpriority );
                }   
            },
    out: 'query004a'
});

Second:

db.runCommand({
    mapreduce: "query004a",
    map:    function Map() {
                /* Remember: the value here is this.l_order.o_orderpriority from the previous mapReduce function */
                emit( this.value, 1 );
            },
    reduce: function(key, values) {
                return Array.sum(values);
            },
    out: 'query004b'
});

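In the first, I select the documents within the date range that satisfy the comparison, grouping them by order key to avoid duplicates. In the second, I group by o_orderpriority and sum.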
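
The two steps described above can be sketched in plain JavaScript, without MongoDB, to check what result to expect on a small sample. This is only a sketch: the sample rows, the `lineitem` helper and the function name `countOrdersByPriority` are made up for illustration, and the date range matches the sample document rather than the SQL query.

```javascript
// Hypothetical helper that builds a document shaped like the one above.
function lineitem(orderkey, priority, orderdate, commitdate, receiptdate) {
  return {
    l_order: {
      o_orderkey: orderkey,
      o_orderpriority: priority,
      o_orderdate: new Date(orderdate),
    },
    l_commitdate: new Date(commitdate),
    l_receiptdate: new Date(receiptdate),
  };
}

// Made-up sample data standing in for the "lineitem" collection.
const lineitems = [
  lineitem(1, "5-LOW", "1996-01-02", "1996-02-12", "1996-03-22"),    // late
  lineitem(1, "5-LOW", "1996-01-02", "1996-02-28", "1996-03-05"),    // late, same order
  lineitem(2, "1-URGENT", "1996-02-10", "1996-03-01", "1996-03-10"), // late
  lineitem(3, "5-LOW", "1996-03-01", "1996-04-01", "1996-03-20"),    // not late
  lineitem(4, "5-LOW", "1995-12-01", "1996-01-05", "1996-01-20"),    // out of range
];

function countOrdersByPriority(items, from, to) {
  // Step 1: keep one entry per order key, so an order with several
  // late line items is only counted once.
  const priorityByOrder = new Map();
  for (const item of items) {
    const order = item.l_order;
    const inRange = order.o_orderdate >= from && order.o_orderdate < to;
    if (inRange && item.l_commitdate < item.l_receiptdate) {
      priorityByOrder.set(order.o_orderkey, order.o_orderpriority);
    }
  }
  // Step 2: count the surviving orders per priority.
  const counts = {};
  for (const priority of priorityByOrder.values()) {
    counts[priority] = (counts[priority] || 0) + 1;
  }
  return counts;
}

const counts = countOrdersByPriority(
  lineitems, new Date("1996-01-01"), new Date("1996-04-01"));
console.log(counts); // "5-LOW" and "1-URGENT" are each counted once
```

Order 1 has two late line items but contributes only one to the "5-LOW" count, which is the behaviour the `exists` subquery in the SQL asks for.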

To my surprise, the result is larger than I expected. Why does this happen, and where?

1 Answer

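In your first map function, you should use 'orderpriority' as the key and 'orderkey' as the value - this reduces the set to the keys you want in the second mapReduce. (You need to specify a reduce function, otherwise mapReduce will return an error.)

So, it could look like this: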

OrderDateMin = new Date("1996-01-01");
OrderDateMax = new Date("1996-04-01");
// first "where" on orderdate
query = {
    "l_order.o_orderdate": {$gte: OrderDateMin, $lt: OrderDateMax}
}
map1 = function() {
    //second "where" on commitdate < receiptdate
    if ( this.l_commitdate < this.l_receiptdate ) {
        // emit orderpriority as key, orderkey as value
        emit( this.l_order.o_orderpriority, this.l_order.o_orderkey );
    }
};
reduce1 = function(key, values) {
    return 1;
}
db.runCommand({
    mapReduce: "xx",
    query: query,
    map: map1,
    reduce: reduce1,
    out: 'query004a',
})
map2 = function() {
    // _id is orderpriority
    emit( this._id, 1 );
};
reduce2 = function(key, values) {
    // count entries per orderpriority
    var count = 0;
    values.forEach( function(value) { count += value; } );
    return count;
}
db.runCommand({
    mapReduce: "query004a",
    map: map2,
    reduce: reduce2,
    out: 'query004b',
})

Nowadays, the same result can be achieved with a single, faster aggregate command (implemented in C++, not in JavaScript):

db.xx.aggregate([
    // first "where", this will use an index, if defined
    { $match: { 
        "l_order.o_orderdate": { $gte: OrderDateMin, $lt: OrderDateMax }
    }},
    // reduce to needed fields, create a field for decision of second "where"
    { $project: { 
        "key": "$l_order.o_orderkey",  
        "pri": "$l_order.o_orderpriority",
        okay: { $cond: [ {$lt: ["$l_commitdate", "$l_receiptdate"]}, 1, 0 ] }
    }},
    // select second where condition matched
    { $match: { "okay": 1 } },
    // group by priority and key
    { $group: { _id: { "pri": "$pri", "key": "$key" } } },
    // group by priority - count entries
    { $group: { _id: "$_id.pri", "count": { $sum: 1 } } },
])

This returns something like:

{ "result" : [ { "_id" : "5-LOW", "count" : 1 } ], "ok" : 1 }

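Last but not least, a suggestion about the design:

It would be simpler if your structure were the other way around: an "orders" collection with the line items embedded as an array of items. This would avoid repeating the order data throughout the collection.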
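
As a rough illustration of that layout, here is a plain JavaScript sketch that runs without MongoDB. The field name `items`, the sample values and the function `countLateOrders` are assumptions made for this example, not something taken from your data.

```javascript
// Hypothetical "orders" documents with the line items embedded as an array.
const orders = [
  {
    o_orderkey: 1,
    o_orderdate: new Date("1996-01-02T03:00:00Z"),
    o_orderpriority: "5-LOW",
    items: [
      { l_linenumber: 1,
        l_commitdate: new Date("1996-02-12T03:00:00Z"),
        l_receiptdate: new Date("1996-03-22T03:00:00Z") },
      { l_linenumber: 2,
        l_commitdate: new Date("1996-02-28T03:00:00Z"),
        l_receiptdate: new Date("1996-03-05T03:00:00Z") },
    ],
  },
  {
    o_orderkey: 2,
    o_orderdate: new Date("1996-02-10T03:00:00Z"),
    o_orderpriority: "5-LOW",
    items: [
      { l_linenumber: 1,
        l_commitdate: new Date("1996-04-01T03:00:00Z"),
        l_receiptdate: new Date("1996-03-20T03:00:00Z") },
    ],
  },
];

function countLateOrders(orders, from, to) {
  const counts = {};
  for (const order of orders) {
    const inRange = order.o_orderdate >= from && order.o_orderdate < to;
    // Plays the role of the SQL "exists" subquery: at least one item
    // whose commit date is before its receipt date.
    const hasLateItem = order.items.some(
      (item) => item.l_commitdate < item.l_receiptdate);
    if (inRange && hasLateItem) {
      const priority = order.o_orderpriority;
      counts[priority] = (counts[priority] || 0) + 1;
    }
  }
  return counts;
}

console.log(countLateOrders(
  orders, new Date("1996-01-01"), new Date("1996-04-01")));
```

Because each order appears only once, there is nothing to de-duplicate before counting: order 1 is counted once even though two of its items are late, and order 2 is skipped.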

More information:

http://docs.mongodb.org/manual/reference/command/mapReduce/#mapReduce

http://docs.mongodb.org/manual/reference/aggregation

Does this help?

Cheers

Ronald

Answered 2013-03-25T04:50:27.190