couchdb - 在 CouchDB 中“合并”视图排序规则到有用的输出中

Question

在 CouchDB 中进行“加入”时，您可以使用视图排序规则将记录分组在一起。例如，有两种文档类型customer和orders。这样您就可以返回customer，然后是该客户的所有订单，然后是下一个客户和订单。

问题是，您如何合并行，以便如果您有 10 个客户和 40 个订单，您的输出仍然是 10 行而不是 50 行。您实际上将更多信息添加到您的客户行中。

我相信使用 a_list或 areduce可以解决这个问题。问题是如何做到这一点？

score 5 · Accepted Answer

我第二次回答 jhs，但我认为他的“选项 2”太危险了。我很难学会。您可以将 reduce 函数用于许多不错的事情，例如获取博客每个用户的最后一篇文章，但您不能将其用于不减少返回数据量的任何事情。

为了用事实来支持它，我制作了这个小脚本来生成 200 个客户，每个客户有 20 个订单。

#!/bin/bash
echo '{"docs":['
for i in $(seq 1 200); do
  id=customer/$i
  echo '{"_id":"'$id'","type":"customer","name":"Customer '$i'"},'
  for o in $(seq 1 20); do
    echo '{"type":"order","_id":"order/'$i'/'$o'", "for":"'$id'", "desc":"Desc '$i$o'"},'
  done
done
echo ']}'

这是一个很可能发生的情况，只需抛出Error: reduce_overflow_error.

恕我直言，您拥有的两个选项是：

方案一：优化列表功能

通过一点点工作，您可以手动构建 JSON 响应，这样您就不需要在数组中累积订单。

我已经编辑了 jhs 的 list 函数以避免使用任何数组，因此您可以拥有任意数量的订单的客户。

function(head, req) {
  start({'headers':{'Content-Type':'application/json'}});

  var first_customer = true
    , first_order = true
    , row
    ;

  send('{"rows":[');

  while(row = getRow()) {
    if(row.key[1] === 2) {
      // Order for customer
      if (first_order) {
        first_order = false;
      } else {
        send(',');
      }
      send(JSON.stringify(row.value));
    }
    else if (row.key[1] === 1) {
      // New customer
      if (first_customer) {
        first_customer = false;
      } else {
        send(']},');
      }
      send('{"customer":');
      send(JSON.stringify(row.key[0]));
      send(',"orders":[');
      first_order = true;
    }
  }
  if (!first_customer)
    send(']}');

  send('\n]}');
}

选项 2：针对您的用例优化文档

如果您确实需要将订单放在同一个文档中，那么请问问自己是否可以这样存储它并避免在查询时进行任何处理。

换句话说：尝试充分利用文档数据库提供的可能性。设计文档以最适合您的用例，并减少使用它们所需的后处理。

score 2 · Accepted Answer

CouchDB 的主要“意见”之一是它只做在分布式、集群环境中也可能发生的事情。在实践中，这意味着一开始会带来一些不便，但随后会在不更改代码的情况下获得巨大的可扩展性。

换句话说，“加入”问题没有完美的答案。但我认为有两个不错的选择。

我正在使用这个数据集：

$ curl localhost:5984/so/_bulk_docs -XPOST -Hcontent-type:application/json -d @-
{"docs":[
{"type":"customer","name":"Jason"},
{"type":"customer","name":"Hunter"},
{"type":"customer","name":"Smith"},
{"type":"order", "for":"Jason", "desc":"Hat"},
{"type":"order", "for":"Jason", "desc":"Shoes"},
{"type":"order", "for":"Smith", "desc":"Pan"}
]}
^D

[{"id":"4cb766ebafda06d8a3a7382f74000b46","rev":"1-8769ac2fffb869e795c347e7b8c653bf"},
{"id":"4cb766ebafda06d8a3a7382f74000b7d","rev":"1-094eff3e3a5967d974fcd7b3cfd7e454"},
{"id":"4cb766ebafda06d8a3a7382f740019cb","rev":"1-5cda0b61da4c045ff503b57f614454d5"},
{"id":"4cb766ebafda06d8a3a7382f7400239d","rev":"1-50642a9809f15283a9d938c8fe28ef27"},
{"id":"4cb766ebafda06d8a3a7382f74002778","rev":"1-d03d883fb14a424e3db022350b38c510"},
{"id":"4cb766ebafda06d8a3a7382f74002c5c","rev":"1-e9612f5d267a8442d3fc2ae09e8c800d"}]

我的地图功能是

function(doc) {
  if(doc.type == 'customer')
    emit([doc.name, 1], "");
  if(doc.type == 'order')
    emit([doc.for, 2], doc.desc);
}

查询全视图显示：

{"total_rows":6,"offset":0,"rows":[
{"id":"4cb766ebafda06d8a3a7382f74000b7d","key":["Hunter",1],"value":""},
{"id":"4cb766ebafda06d8a3a7382f74000b46","key":["Jason",1],"value":""},
{"id":"4cb766ebafda06d8a3a7382f7400239d","key":["Jason",2],"value":"Hat"},
{"id":"4cb766ebafda06d8a3a7382f74002778","key":["Jason",2],"value":"Shoes"},
{"id":"4cb766ebafda06d8a3a7382f740019cb","key":["Smith",1],"value":""},
{"id":"4cb766ebafda06d8a3a7382f74002c5c","key":["Smith",2],"value":"Pan"}
]}

选项 1：收集结果的列表功能

好处是，如果你要求 10 行，你肯定会得到 10 行（当然除非没有足够的数据）。

但代价是您必须为每个查询进行服务器端处理。你想要的数据只是放在磁盘上，准备好流式传输给你，但现在你已经克服了这个瓶颈。

但是，我个人认为，除非您有已证明的性能问题，否则这_list很好。

function(head, req) {
  start({'headers':{'Content-Type':'application/json'}});

  send('{"rows":');

  var customer = null, orders = [], count = 0;

  var prefix = '\n[ ';
  function show_orders() {
    if(customer && orders.length > 0) {
      count += 1;

      send(prefix);
      prefix = '\n, ';

      send(JSON.stringify({'customer':customer, 'orders':orders}));
    }
  }

  function done() {
    send('\n]}');
  }

  var row;
  while(row = getRow()) {
    if(row.key[1] == 2) {
      // Order for customer
      orders.push(row.value);
    }

    if(row.key[1] == 1) {
      // New customer
      show_orders();

      if(req.query.lim && count >= parseInt(req.query.lim)) {
        // Reached the limit
        done();
        return;
      } else {
        // Prepare for this customer.
        customer = row.key[0];
        orders = [];
      }
    }
  }

  // Show the last order set seen and finish.
  show_orders();
  done();
}

此函数只是循环遍历map行，并且仅在收集到所有信息后才输出完整的客户+订单行。显然，您可以更改输出的 JSON 格式。另外，还有一个?lim=X参数，因为使用该参数limit会干扰地图查询。

危险是这个函数在内存中建立了无限的响应。如果客户下了 10,000 个订单怎么办？还是100,000？最终构建orders阵列将失败。这就是 CouchDB 将它们保留在“高”列表中的原因。如果您永远无法为每位客户获得 10,000 个订单，那么这不是问题。

$ curl 'http://localhost:5984/so/_design/ex/_list/ex/so?reduce=false&lim=2'
{"rows":
[ {"customer":"Jason","orders":["Hat","Shoes"]}
, {"customer":"Smith","orders":["Pan"]}
]}

选项 2：偷偷减少

reduce你可以用一个函数做类似的事情。在这里，我会警告您，这在技术上是不可扩展的，因为您会在磁盘上累积响应，但是我个人更喜欢它而不是 _list，因为代码更简单，而且我知道我直接从磁盘读取数据，而无需后处理。

function(keys, vals, re) {
  // If all keys are the same, then these are all
  // orders for the same customer, so accumulate
  // them. Otherwise, return something meaningless.
  var a;

  var first_customer = keys[0][0][0];
  for(a = 0; a < keys.length; a++)
    if(keys[a][0][0] !== first_customer)
      return null;

  var result = [];
  for(a = 0; a < vals.length; a++)
    if(vals[a]) {
      // Accumulate an order.
      result.push(vals[a]);
    }
  return result;
}

始终查询此视图，使用该视图?group_level=1将按客户细分结果（因为客户名称是键中的第一项map）。

这是违法的，因为您不应该在reduce阶段累积数据。这就是他们称之为reduce的原因。

但是，CouchDB 是轻松的，只要您不构建巨大的列表，它应该可以工作，并且更加优雅。

$ curl 'localhost:5984/so/_design/ex/_view/so?group_level=1&limit=3'
{"rows":[
{"key":["Hunter"],"value":[]},
{"key":["Jason"],"value":["Shoes","Hat"]},
{"key":["Smith"],"value":["Pan"]}
]}

祝你好运！

couchdb - 在 CouchDB 中“合并”视图排序规则到有用的输出中

2 回答 2

选项 1：收集结果的列表功能

选项 2：偷偷减少

Related

Reference