elasticsearch - Using cardinality but trying to find the total length with it

Question

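I have been using the cardinality aggregation to count unique values of some fields, for example author:

    "aggs": {
        "author_count": {
            "cardinality": {
                "field": "author"
            }
        }
    }

This works and counts the unique authors in the author field.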

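Now I want to find the total length for those unique authors. For other queries, I have done this simply by adding

  "aggs": {
    "total_length": {
      "sum": { "field": "length" }
    }
  }

But when I tried this, it gave me the total length of everything, not just of the unique authors.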

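So, for example, if the author field contains only one "Kim", that document should be counted. For every author who has written only one book, I want all of their page lengths added together.

For example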

"author" : "kim",
"length": 100

"author" : "lolo",
"length": 100

The output should be author_count 2 and total_length 200.

But for

"author" : "kim",
"length": 100

"author" : "lolo",
"length": 100

"author" : "lolo",
"length": 100

The output should be author_count 1 and total_length 100, because kim is the only unique author (an author who has written only one book).

Any ideas?

score 1 · Accepted Answer

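Having understood the question, this can be achieved with a bucket selector aggregation and a sum bucket aggregation. The first terms aggregation on the author field gives one bucket per unique author. Inside each bucket, the value count aggregation (total_book_count) gives the number of books that author has written, and total_sum adds up the page lengths of those books.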

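The bucket selector then keeps only the buckets of authors who have written exactly one book, and finally sum_bucket adds up the lengths across those remaining authors. Note that on Elasticsearch versions where Painless is the script language, values from buckets_path are read through params, so the script would be written as params.total_books == 1.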

{
  "size": 0,
  "aggs": {
    "unique_author": {
      "terms": {
        "field": "author",
        "size": 100
      },
      "aggs": {
        "total_book_count": {
          "value_count": {
            "field": "author"
          }
        },
        "total_sum": {
          "sum": {
            "field": "length"
          }
        },
        "only_single_book_author": {
          "bucket_selector": {
            "buckets_path": {
              "total_books": "total_book_count"
            },
            "script": "total_books==1"
          }
        }
      }
    },
    "page_length": {
      "sum_bucket": {
        "buckets_path": "unique_author>total_sum"
      }
    }
  }
}
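To sanity-check the logic against the two examples in the question, here is a minimal Python sketch that emulates the same pipeline in memory (terms, then value_count and sum, then bucket_selector, then sum_bucket). It does not call Elasticsearch. The function name `single_book_author_totals` and the `author_count` key in the result are assumptions made for illustration; the query above only returns `page_length`.

```python
from collections import defaultdict


def single_book_author_totals(docs):
    """Emulate terms -> value_count/sum -> bucket_selector -> sum_bucket."""
    # terms aggregation: one bucket per distinct author
    buckets = defaultdict(lambda: {"total_book_count": 0, "total_sum": 0})
    for doc in docs:
        bucket = buckets[doc["author"]]
        bucket["total_book_count"] += 1          # value_count on author
        bucket["total_sum"] += doc["length"]     # sum on length

    # bucket_selector: keep only authors with exactly one book
    kept = {a: b for a, b in buckets.items() if b["total_book_count"] == 1}

    # sum_bucket: add up total_sum over the remaining buckets
    return {
        "author_count": len(kept),  # hypothetical extra, not in the query
        "page_length": sum(b["total_sum"] for b in kept.values()),
    }


# Sample documents taken from the question
docs_a = [{"author": "kim", "length": 100}, {"author": "lolo", "length": 100}]
docs_b = docs_a + [{"author": "lolo", "length": 100}]

print(single_book_author_totals(docs_a))  # {'author_count': 2, 'page_length': 200}
print(single_book_author_totals(docs_b))  # {'author_count': 1, 'page_length': 100}
```

Both results match the outputs the question asks for: in the second data set lolo has two books, so that bucket is dropped before the final sum.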
