couchdb - CouchDB：根据时间戳返回最新类型的文档

Question

我有一个系统接受来自各种独特来源的状态更新，并且每次状态更新都会创建一个具有以下结构的新文档：

{
 "type": "status_update",
 "source_id": "truck1231",
 "timestamp": 13023123123,
 "location": "Boise, ID"
}

数据纯粹是示例，但可以理解。

现在，这些文档每隔一小时左右生成一次。一小时后，我们可能会插入：

{
 "type": "status_update",
 "source_id": "truck1231",
 "timestamp": 13023126723,
 "location": "Madison, WI"
}

我感兴趣的只是查看每个独特来源的最新更新。我目前正在通过以下地图来做到这一点：

function(doc) {
  if (doc.type == "status_update") {
    emit(doc.source_id, doc);
  }
}

并减少：

function(keys, values, rereduce) {
  var winner = values[0];
  var i = values.length;
  while (i--) {
    var val = values[i];
    if (val.timestamp > winner.timestamp) winner = val;
  }
  return winner;
}

并将数据作为 reduce 查询group=true。这可以按预期工作，并仅提供最新更新的键控结果。

问题是它非常慢，需要我reduce_limit=false在 CouchDB 配置中进行。

感觉必须有一种更有效的方法来做到这一点。更新同一个文档不是一种选择——即使在这种情况下我不需要它，历史也很重要。在客户端处理数据也不是一种选择，因为这是一个 CouchApp，并且系统中的文档数量实际上非常大，并且不适合通过网络发送它们。

提前致谢。

score 3 · Accepted Answer

CouchDB map/reduce 是增量的，这基本上意味着结果总是被缓存，因此对同一视图的后续请求（即使使用不同的搜索参数）“免费”（或以对数时间）运行。

但是，对于 reduce 组，这并不完全正确。有时必须在运行中重新减少部分结果。也许这就是你要打的。

取而代之的是，使用数组作为键发出这样的行的地图视图（即没有 reduce 函数）怎么样：

// Row diagram (pseudo-code, just to show the concept).
// Key                    , Value
// [source_id, timestamp] , null // value is not very important in this example
["truck1231", 13023123123], null
["truck1231", 13023126723], null
["truck5555", 13023126123], null
["truck6666", 13023000000], null

注意源的所有时间戳如何“聚集”在一起。（实际上，他们整理。）要查找的最新时间戳"truck1231"，只需请求该“丛”中的最后一行。为此，请使用limit=1参数从末尾开始进行降序查询。要指定“结束”，请使用{}“高键”值作为键中的第二个元素（有关详细信息，请参阅排序规则链接）。

?descending=true&limit=1&startkey=["truck1231",{}]

（实际上，由于您的时间戳是整数，您可以发出它们的否定，例如-13023123123。这将简化您的查询，但我不知道，这对我来说似乎是在玩火。）

为了生成这些类型的行，我们使用这样的 map 函数：

function(doc) {
  // Emit rows sorted first by source id, and second by timestamp
  if (doc.type == "status_update" && doc.timestamp) {
    emit([doc.source_id, doc.timestamp], null) // Using `doc` as the value would be fine too
  }
}

score 3 · Accepted Answer

_stats您可以使用内置的 reduce 函数获取每个源的最新时间戳，然后执行另一个查询以获取文档。以下是观点：

"views": {
  "latest_update": {
    "map": "function(doc) { if (doc.type == 'status_update') emit(doc.source_id, doc.timestamp); }",
    "reduce": "_stats"
  },
  "status_update": {
    "map": "function(doc) { if (doc.type == 'status_update') emit([doc.source_id, doc.timestamp], 1); }"
  }
}

首先使用查询latest_update，group=true然后status_update使用类似（正确的 url 编码）查询：

keys=[["truck123",TS123],["truck234",TS234],...]&include_docs=true

其中 TS123 和 TS234 是的max返回值latest_update。

score 1 · Accepted Answer

我怀疑它之所以慢只是因为您发出了整个文档，这意味着需要存储和移动大量数据来计算您的最终值。尝试发出时间戳：

function(doc) {
  if (doc.type == "status_update") {
    emit(doc.source_id, [doc._id,doc.timestamp]);
  }
}

function(keys, values, rereduce) {
  var winner = values[0];
  var i = values.length;
  while (i--) {
    var val = values[i];
    if (val[1] > winner[1]) winner = val;
  }
  return winner;
}

这应该可以[id,timestamp]为每个键提供一对，而不会太慢或不必在视图中存储太多数据。

在客户端获得标识符列表后，使用批量 GET API 发送第二个请求：

_all_docs?keys=[id1,id2,id3,...,idn]&include_docs=true

这将在一个请求中获取所有文档。

couchdb - CouchDB：根据时间戳返回最新类型的文档

3 回答 3

Related

Reference