ruby - Elasticsearch: filter to the document with the maximum value

Question

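I'm trying to get the document with the maximum value from records that share the same name. For example, I have 3 users, 2 of whom have the same name but different follower counts, and I want only 1 of the 2 same-named documents returned, chosen by the highest follower_count.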

{ id: 1, name: "John Greenwood", follower_count: 100 }
{ id: 2, name: "John Greenwood", follower_count: 200 }
{ id: 3, name: "John Underwood", follower_count: 300 }

So the result would be,

{ id: 2, name: "John Greenwood", follower_count: 200 }
{ id: 3, name: "John Underwood", follower_count: 300 }

Of the 2 users with the same name, the one with the most followers wins, and the other, uniquely named user is returned as well.

I have the following mapping,

"users-development" : {
    "mappings" : {
      "user" : {
        "dynamic" : "false",
        "properties" : {
          "follower_count" : {
            "type" : "integer"
          },
          "name" : {
            "type" : "string",
            "fields" : {
              "exact" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          }
        }
      }
    }
}

This is where I've been stuck for a long time,

         {
            query: {
              filtered: {
                filter: {
                  bool: {
                    must: [
                      { terms: { "name.exact": [ "John Greenwood", "John Underwood" ] } },
                    ]
                  }
                }
              }
            },

            aggs: {
              max_follower_count: { max: { field: 'follower_count' } }
            },

            size: 1000,
          }

Any suggestions, please?

score 3 · Accepted Answer

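The Elastic Stack has a tool made for exactly this kind of problem: aggregations. See the examples below. First, because in your case you need to aggregate by the full name, spaces included, your name field has to be not_analyzed, like this: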

`PUT /index
{
  "mappings": {
    "users" : {
      "properties" : {
        "name" : {
          "type" :    "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}`

Now your query will look like this:

`POST /index/users/_search
{
   "aggs": {
      "users": {
         "terms": {
            "field": "name"
         },
         "aggs": {
            "followers": {
               "max": {
                  "field": "follower_count"
               }
            }
         }
      }
   }
}`

I simply aggregate by name and use the max metric to get the highest follower count per name.

The response will look like this:

`"aggregations": {
      "users": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "John Greenwood",
               "doc_count": 2,
               "followers": {
                  "value": 200
               }
            },
            {
               "key": "John Underwood",
               "doc_count": 1,
               "followers": {
                  "value": 300
               }
            }
         ]
      }
   }`
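
On the Ruby side, the buckets in that response can be flattened into a simple name-to-maximum lookup. The following is only a minimal sketch: `max_followers_by_name` is a hypothetical helper, not part of any Elasticsearch client, and it assumes the response has already been parsed into a Hash with string keys (for example via `JSON.parse`). The aggregation names `users` and `followers` are the ones used in the query above.

```ruby
require "json"

# Hypothetical helper (not a client API): maps each bucket key to the
# value of the named metric sub-aggregation.
def max_followers_by_name(response, agg_name: "users", metric_name: "followers")
  buckets = response.dig("aggregations", agg_name, "buckets") || []
  buckets.each_with_object({}) do |bucket, result|
    result[bucket["key"]] = bucket.dig(metric_name, "value")
  end
end

# Sample response, trimmed to the fields shown in the answer.
response = JSON.parse(<<~JSON)
  {
    "aggregations": {
      "users": {
        "buckets": [
          { "key": "John Greenwood", "doc_count": 2, "followers": { "value": 200 } },
          { "key": "John Underwood", "doc_count": 1, "followers": { "value": 300 } }
        ]
      }
    }
  }
JSON

summary = max_followers_by_name(response)
summary.each { |name, max| puts "#{name}: #{max}" }
```

Note that this yields only the maximum value for each name, not the id of the document that holds it.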

Hope this helps. Use aggregations whenever you need to group data and compute a value over each group, such as a sum or a maximum.

score 0 · Accepted Answer

OK, I think you're looking for something along these lines, using a terms aggregation:

{
   "query": {
      "terms": { "name.exact": [ "John Greenwood", "John Underwood" ] }
   },
   "aggs": {
      "max_follower_count": {
         "terms": {
            "field":"name.exact"
         },
         "aggs":{
             "max_follow" : { "max" : { "field" : "follower_count" } }
         }
      }
   },
   "size": 1000
}

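The terms aggregation creates a bucket for each unique value of name.exact, and those will only be the values specified in your terms query. So we now have one bucket for each of the two Johns, and we can use the max aggregation to work out the highest follower count in each. The max aggregation operates on every bucket of its parent aggregation.

Each of these unique terms will then have its maximum follower_count computed and shown in its bucket. The result looks like this: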

... //query results of just the terms query up here
"aggregations": {
  "max_follower_count": {
     "doc_count_error_upper_bound": 0,
     "sum_other_doc_count": 0,
     "buckets": [
        {
           "key": "John Greenwood",
           "doc_count": 2,
           "max_follow": {
              "value": 200
           }
        },
        {
           "key": "John Underwood",
           "doc_count": 1,
           "max_follow": {
              "value": 300
           }
        }
     ]
  }
}
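
To make the bucketing concrete, the same two steps can be mimicked in plain Ruby on the three sample documents from the question. This is only an in-memory illustration of what terms plus max computes, not how Elasticsearch implements it; the variable names are made up for the sketch.

```ruby
users = [
  { id: 1, name: "John Greenwood", follower_count: 100 },
  { id: 2, name: "John Greenwood", follower_count: 200 },
  { id: 3, name: "John Underwood", follower_count: 300 }
]

# "terms" step: one bucket per distinct name.
buckets = users.group_by { |user| user[:name] }

# "max" step: runs once per bucket, over that bucket's documents only.
max_per_name = buckets.transform_values do |docs|
  docs.map { |doc| doc[:follower_count] }.max
end

max_per_name.each do |name, max|
  puts "#{name}: doc_count=#{buckets[name].size}, max_follow=#{max}"
end
```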

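The terms aggregation comes with some caveats about how it does its counting, and the linked documentation should make them quite clear.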
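
The max aggregation returns only the number, while the question asks for the whole document. One possible extension, assuming your Elasticsearch version supports the top_hits aggregation, is to nest a top_hits sorted by follower_count under the terms aggregation. The sketch below only builds the request body as a Ruby hash; `top_user_per_name_query`, `by_name`, and `top_user` are hypothetical names chosen for illustration, and the body is untested against the asker's cluster.

```ruby
require "json"

# Hypothetical helper: builds a request body that keeps, per name,
# the single document with the highest follower_count.
def top_user_per_name_query(names)
  {
    size: 0, # the plain hits are not needed, only the aggregation
    query: { terms: { "name.exact" => names } },
    aggs: {
      by_name: {
        # ask for at least one bucket per requested name
        terms: { field: "name.exact", size: [names.size, 1].max },
        aggs: {
          top_user: {
            top_hits: {
              size: 1,
              sort: [{ follower_count: { order: "desc" } }]
            }
          }
        }
      }
    }
  }
end

body = top_user_per_name_query(["John Greenwood", "John Underwood"])
puts JSON.pretty_generate(body)
```

If this works on your version, each bucket should carry the winning document under top_user, in its hits.hits array, with the original fields in _source.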
