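I have read that pre-allocating a record can improve performance, which should be beneficial, especially when handling many records of a time-series dataset.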

updateRefLog = function(_ref,year,month,day){
    var id = _ref+"|"+year+"|"+month;
    db.collection('ref_history').count({"_id":id},function(err,count){
        // pre-allocate if needed
        if(count < 1){
            db.collection('ref_history').insert({
                "_id":id
                ,"dates":[{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0}]
            });
        }

        // update
        // build the $inc document with a computed key, e.g. "dates.14.count"
        var inc = {};
        inc['dates.'+day+'.count'] = 1;
        var update = {"$inc": inc};
        db.collection('ref_history').update({"_id":id},update,{upsert: true},
            function(err, res){
                if(err !== null){
                    //handle error
                }
            }
        );
    });
};

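I'm a little concerned that having to go through a promise might slow this down, and that checking the count every time could negate the performance benefit of pre-allocating the record.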

Is there a more performant way to handle this?

1 Answer

The general argument for "pre-allocation" concerns the potential cost of an "update" operation that causes the document to "grow". If that results in a document size greater than the currently allocated space, then the document is "moved" to another location on disk to accommodate the new space. This can be costly, hence the general recommendation to initially write the document at its eventual "size".

Honestly, the best way to handle such an operation would be to do an "upsert" initially with all the array elements allocated, and then update only the required element in position. This would reduce to "two" potential writes, and you can further reduce to a single "over the wire" operation using Bulk API methods:

var id = _ref+"|"+year+"|"+month;
var bulk = db.collection('ref_history').initializeOrderedBulkOp();

bulk.find({ "_id": id }).upsert().updateOne({
    "$setOnInsert": {
        "dates": Array.apply(null,Array(32)).map(function(el) { return { "count": 0 }})
   }
});

// build the $inc document with a computed key, e.g. "dates.14.count"
var inc = {};
inc['dates.'+day+'.count'] = 1;
var update = {"$inc": inc};
bulk.find({ "_id": id }).updateOne(update);

bulk.execute(function(err,results) {
   // results would show what was modified or not
});

Or since newer drivers are favouring consistency with one another, the "Bulk" parts have been relegated to regular arrays of WriteOperations instead:

// build the $inc document with a computed key, e.g. "dates.14.count"
var inc = {};
inc['dates.'+day+'.count'] = 1;
var update = {"$inc": inc};

db.collection('ref_history').bulkWrite([
    { "updateOne": {
        "filter": { "_id": id },
        "update": {
            "$setOnInsert": {
                "dates": Array.apply(null,Array(32)).map(function(el) {
                    return { "count": 0 }
                })
            }
        },
        "upsert": true
    }},
    { "updateOne": {
        "filter": { "_id": id },
        "update": update
    }}
],function(err,result) {
    // same thing as above really
});

In either case, the $setOnInsert, being the only operator in the first update, will only do anything if an "upsert" actually occurs. The main point is that the only contact with the server is a single request and response, as opposed to "back and forth" operations waiting on network communication.
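
As a rough sketch of how the two operations could be assembled in one place, the helper below returns the array that would be passed to bulkWrite(). The names buildRefLogOps and slots are made up for illustration and are not part of the driver; the 32 default simply mirrors the listing in the question.

```javascript
// Hypothetical helper: builds the two operations passed to bulkWrite().
// buildRefLogOps and slots are illustrative names, not driver API.
function buildRefLogOps(ref, year, month, day, slots) {
    var id = ref + "|" + year + "|" + month;

    // One zeroed counter per slot; 32 matches the listing in the question.
    var dates = Array.apply(null, Array(slots || 32)).map(function () {
        return { "count": 0 };
    });

    // Build the $inc document with a computed key such as "dates.14.count".
    var inc = {};
    inc["dates." + day + ".count"] = 1;

    return [
        // Only writes anything when the upsert actually creates the document.
        { "updateOne": {
            "filter": { "_id": id },
            "update": { "$setOnInsert": { "dates": dates } },
            "upsert": true
        }},
        // Runs second, since bulkWrite() is ordered by default.
        { "updateOne": {
            "filter": { "_id": id },
            "update": { "$inc": inc }
        }}
    ];
}

var ops = buildRefLogOps("abc", 2016, 3, 14);
console.log(JSON.stringify(ops[1]));
```

Usage would then be something like db.collection('ref_history').bulkWrite(buildRefLogOps(_ref, year, month, day), callback), which keeps the request construction separate from the call that goes over the wire.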

This is typically what "Bulk" operations are for. They reduce that network overhead when you might as well send a batch of requests to the server. The result is significantly faster, and neither operation really depends on the other, except that they must run in order ("ordered"), which is the default in the latter case and is set explicitly by the legacy .initializeOrderedBulkOp().

Yes there is a "little" overhead in the "upsert", but there is "less" than in testing with .count() and waiting for that result first.


N.B. Not sure about the 32 array entries in your listing. You possibly meant 24, but copy/paste got the better of you. At any rate, there are better ways to do that than hardcoding, as is demonstrated.
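
If the array is in fact indexed by day of month, as the day parameter in the question suggests (an assumption, not something the question confirms), the size could be derived from the month instead of being fixed. A minimal sketch, where preallocateDays is a hypothetical name and month is taken to be 1-12:

```javascript
// Hypothetical helper: one zeroed counter per day of the given month,
// plus an unused slot 0 so "dates.<day>" can use the 1-based day directly.
function preallocateDays(year, month) {
    // Day 0 of the following month is the last day of this month (month is 1-12).
    var daysInMonth = new Date(year, month, 0).getDate();
    return Array.apply(null, Array(daysInMonth + 1)).map(function () {
        return { "count": 0 };
    });
}

var february = preallocateDays(2016, 2);
console.log(february.length);
```

The result could be used as the "dates" value inside $setOnInsert. For a 31-day month this yields 32 entries, which may be where the count in the question's listing came from.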

Answered 2016-03-03T02:31:22.423