4

I'm using the following view function to iterate over all documents in the database (to find by tag), but I think performance will be poor if the dataset is large. Is there a better approach?

def by_tag(tag):
    return '''
        function(doc) {
            if (doc.tags.length > 0) {
                for (var tag in doc.tags) {
                    if (doc.tags[tag] == "%s") {
                        emit(doc.published, doc);
                    }
                }
            }
        };
        ''' % tag

4 Answers

7

Disclaimer: I haven't tested this, and I don't know whether it performs better.

Create a single permanent view:

function(doc) {
  for (var tag in doc.tags) {
    emit([tag, doc.published], doc)
  }
};

And query it with _view/your_view/all?startkey=['your_tag_here']&endkey=['your_tag_here', {}]

The resulting JSON structure will be slightly different, but you will still get sorting by publish date.
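The startkey/endkey pair for the range query above can be built programmatically. Here is a minimal sketch (the view path and tag value are placeholders) of how the composite-key range is encoded; it relies on the fact that {} collates after every other JSON value in CouchDB:

```python
import json
from urllib.parse import urlencode

def tag_query_params(tag):
    # Build startkey/endkey for a [tag, published] composite key.
    # {} sorts after all other JSON values in CouchDB's collation,
    # so this range covers every published date for the given tag.
    return urlencode({
        "startkey": json.dumps([tag]),
        "endkey": json.dumps([tag, {}]),
    })

# e.g. GET /db/_design/posts/_view/by_tag?<params>  (hypothetical view path)
print(tag_query_params("your_tag_here"))
```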

Answered 2008-10-17T17:48:37.340
3

You can define a single permanent view, as Bahadir suggests. When doing this sort of indexing, though, don't output the doc for each key. Instead, emit([tag, doc.published], null). In current release versions you would then have to do a separate lookup for each doc, but SVN trunk now has support for specifying "include_docs=true" in the query string, and CouchDB will automatically merge the docs into your view results for you, without the space overhead.
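As a rough illustration (a Python simulation, not CouchDB itself), the emit pattern described above can be sketched to show which rows the index would hold when the value is null:

```python
def map_doc(doc, emit):
    # Same shape as the suggested JS map function: one row per tag,
    # keyed by [tag, published], with a null value to keep the index small.
    for tag in doc.get("tags", []):
        emit([tag, doc["published"]], None)

rows = []
doc = {"published": "2008-10-17T17:48:37Z", "tags": ["couchdb", "views"]}
map_doc(doc, lambda key, value: rows.append((key, value)))
for key, value in rows:
    print(key, value)
```

With include_docs the full document is joined back in at query time, so nothing is duplicated in the index itself.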

Answered 2008-10-19T15:34:37.657
1

Your concern is valid. A list of thoughts:

View generation is incremental. If your read traffic is greater than your write traffic, your views won't cause any problems at all. People who worry about this usually shouldn't. As a frame of reference: you should worry if you are dumping hundreds of records into a view without an update in between.

Emitting the entire document slows things down. You should emit only what is necessary to use the view.

Not sure what the performance of val == "%s" is, but you shouldn't overthink it. If there is a tags array, you should emit the tags. Unless you expect a tags array that contains non-strings, that is.
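A small Python sketch of the last two points (emit only what you need, and tolerate docs whose tags field is missing or contains non-strings; the doc shapes here are made up for illustration):

```python
def emit_tag_rows(doc, emit):
    # Emit only [tag, published] keys with a null value; skip docs
    # without a tags list, and skip any non-string entries in it.
    tags = doc.get("tags")
    if not isinstance(tags, list):
        return
    for tag in tags:
        if isinstance(tag, str):
            emit([tag, doc.get("published")], None)

rows = []
docs = [
    {"published": "2008-10-17", "tags": ["couch", 42]},  # one non-string tag
    {"published": "2008-10-18"},                         # no tags at all
]
for d in docs:
    emit_tag_rows(d, lambda k, v: rows.append((k, v)))
print(rows)
```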

Answered 2008-10-17T05:15:36.207
0
# Works on CouchDB 0.8.0
from couchdb import Server  # http://code.google.com/p/couchdb-python/

byTag = """
function(doc) {
    if (doc.type == 'post' && doc.tags) {
        doc.tags.forEach(function(tag) {
            emit(tag, doc);
        });
    }
}
"""

def findPostsByTag(self, tag):
    server = Server("http://localhost:1234")
    db = server['my_table']
    return [row for row in db.query(byTag, key=tag)]

The byTag map function emits each unique tag as the key and each post carrying that tag as the value, so when you query with key = "mytag", it retrieves all posts tagged "mytag".

I've tested it against about 10 entries and it seems to take about 0.0025 seconds per query; I'm not sure how efficient it is with large data sets.

Answered 2008-11-23T05:53:49.217