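I could use some help filtering distinct values from a CouchDB view. I have a database that stores logs with information about computers. Periodically, new logs for a computer are written to the database.

Somewhat simplified, I store entries like these: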

{
   "name": "NAS",
   "os": "Linux",
   "timestamp": "2011-03-03T16:26:39Z"
}
{
   "name": "Server1",
   "os": "Windows",
   "timestamp": "2011-02-03T19:31:31Z"
}
{
   "name": "NAS",
   "os": "Linux",
   "timestamp": "2011-02-03T18:21:29Z"
}

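So far I am struggling to filter this list down to distinct entries. What I would like to receive is the latest log for each device.

I have a view like this: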

function(doc) {
    emit([doc.timestamp,doc.name], doc);
}

I query this view from Python (couchdbkit), and the best solution I have come up with so far looks like this:

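def get_latest_logs(cls):
    unique = []
    seen_names = set()  # device names that already have their latest log
    # descending=True returns the newest logs first
    for log in cls.view("logs/timestamp", descending=True):
        if log.name not in seen_names:
            seen_names.add(log.name)
            unique.append(log)
    return unique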

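OK... this works. But I have a strong feeling that it is not the best solution, because Python has to iterate over the entire list of logs (which could get very long).

I guess I need a reduce function, but I can't really find any example or explanation that I can adapt to my problem.

So what I am looking for is a (pure CouchDB) view that emits only the latest log for a given device.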

1 Answer

Here is what I do. This is borderline CouchDB abuse; however, I have had much success with it.

Usually, reduce will compute a sum, or a count, or something like that. However, think of reduce as an elimination tournament. Many values go in. Only one comes out. A reduction! Repeat over and over and you have the ultimate winner (a re-reduction). In this case, the log with the latest timestamp is the winner.

Of course, welterweights can't fight heavyweights. There have to be leagues and weight classes. It only makes sense for certain documents to do battle with certain other similar documents. That is exactly what the reduce group parameter will do. It will ensure that only evenly-matched gladiators enter the steel cage in our bloodsport. (Coffee is kicking in.)

First, emit all logs keyed by device. The value emitted is simply a copy of the document.

function(doc) {
    emit(doc.name, doc);
}

Next, write a reduce function to return the value with the latest timestamp among all given values. If you see a fight between two gladiators from different leagues (two logs from different systems), stop the fight! Something went wrong (somebody queried without the correct group value).

function(keys, vals, re) {
    var challenger, winner = null;
    for(var a = 0; a < vals.length; a++) {
        challenger = vals[a];
        if(!winner) {
            // The title is unchallenged. This value is the winner.
            winner = challenger;
        } else {
            // Fight!
            if(winner.name !== challenger.name) {
                // Stop the fight! He's gonna kill him!
                return null; // With a grouping query, this will never happen.
            } else if(winner.timestamp > challenger.timestamp) {
                // The champ wins! (Nothing to do.)
            } else {
                // The challenger wins!
                winner = challenger;
            }
        }
    }

    // Today's champion lives to fight another day.
    return winner;
}
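
To see how the tournament plays out, here is a minimal sketch that runs the same logic outside CouchDB against the three sample documents from the question. The names latestLog and groupedReduce are hypothetical; groupedReduce only imitates what grouping by key does and is not part of any CouchDB API.

```javascript
// Same logic as the reduce function above, written compactly.
function latestLog(keys, vals, rereduce) {
    var winner = null;
    for (var i = 0; i < vals.length; i++) {
        var challenger = vals[i];
        if (!winner) {
            winner = challenger;
        } else if (winner.name !== challenger.name) {
            return null; // logs from different devices: grouping was missing
        } else if (challenger.timestamp >= winner.timestamp) {
            winner = challenger;
        }
    }
    return winner;
}

// Hypothetical helper: imitates a grouped query by reducing each device separately.
function groupedReduce(docs, reduceFn) {
    var byName = {};
    docs.forEach(function (doc) {
        (byName[doc.name] = byName[doc.name] || []).push(doc);
    });
    var result = {};
    Object.keys(byName).forEach(function (name) {
        result[name] = reduceFn(null, byName[name], false);
    });
    return result;
}

var docs = [
    { name: "NAS", os: "Linux", timestamp: "2011-03-03T16:26:39Z" },
    { name: "Server1", os: "Windows", timestamp: "2011-02-03T19:31:31Z" },
    { name: "NAS", os: "Linux", timestamp: "2011-02-03T18:21:29Z" }
];

var latest = groupedReduce(docs, latestLog);
console.log(latest.NAS.timestamp);         // 2011-03-03T16:26:39Z
console.log(latest.Server1.timestamp);     // 2011-02-03T19:31:31Z
console.log(latestLog(null, docs, false)); // null, because devices are mixed
```

Because the function returns one of its inputs, feeding earlier winners back in (a re-reduce) works the same way as the first pass.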

(Note: comparing the timestamps as strings only works if they all use the same fixed-width ISO 8601 UTC format, as in the question. If they do not, you will have to convert them to a Date first.)
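
As a rough illustration of when the string comparison is safe, the sketch below compares two timestamps both as strings and as parsed dates. The helper isLater is hypothetical, and it assumes the JavaScript engine's Date.parse accepts ISO 8601 strings, which is worth verifying for the engine your CouchDB view server uses.

```javascript
// Hypothetical helper: true if timestamp a is strictly later than timestamp b.
// Assumes Date.parse understands ISO 8601 strings in this engine.
function isLater(a, b) {
    return Date.parse(a) > Date.parse(b);
}

// Same fixed-width UTC format as in the question: both comparisons agree.
var a = "2011-03-03T16:26:39Z";
var b = "2011-02-03T18:21:29Z";
console.log(a > b, isLater(a, b)); // true true

// Mixed UTC offsets: the string comparison gives the wrong answer.
var c = "2011-03-03T18:00:00+02:00"; // 16:00:00 UTC
var d = "2011-03-03T17:00:00Z";      // 17:00:00 UTC
console.log(c > d, isLater(c, d));   // true false
```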

Now, when you query a view with ?group=true, then CouchDB will only reduce (find the winner between) values with the same key, which is your machine name.

(You can also emit an array as a key, which gives a bit more flexibility. You could emit([doc.name, doc.timestamp], doc) instead. Then you can see all logs for one system with a query like ?reduce=false&startkey=["NAS", null]&endkey=["NAS", {}], or you can see the latest log per system with ?group_level=1.)
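
When sending such queries over HTTP, the key values are JSON and need to be URL-encoded. Here is a small sketch of building both query strings. The database name logs_db, the design document logs, the view name latest, and the helper viewUrl are all made up for illustration.

```javascript
// Hypothetical names: database "logs_db", design document "logs", view "latest".
var base = "/logs_db/_design/logs/_view/latest";

// Hypothetical helper: JSON-encodes each parameter value, then URL-encodes it.
function viewUrl(base, params) {
    var parts = Object.keys(params).map(function (name) {
        return name + "=" + encodeURIComponent(JSON.stringify(params[name]));
    });
    return base + "?" + parts.join("&");
}

// All logs for the device "NAS": null sorts before and {} after any timestamp string.
var allNasLogs = viewUrl(base, {
    reduce: false,
    startkey: ["NAS", null],
    endkey: ["NAS", {}]
});

// Latest log per device: group on the first key element only.
var latestPerDevice = viewUrl(base, { group_level: 1 });

console.log(allNasLogs);
console.log(latestPerDevice);
```

With group_level=1 the rows should come back keyed by a one-element array such as ["NAS"], since only the first element of the emitted key is used for grouping.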

Finally, the "stop the fight" stuff is optional. You could simply always return the document with the latest timestamp. However, I prefer to keep it there because in similar situations, I want to see if I am map-reducing incorrectly, and a null reduce output is my big clue.

answered 2011-03-06T05:01:46.937