elasticsearch - 脚本嵌套字段上的 ElasticSearch 聚合

Question

我的 ElasticSearch 索引中有以下映射（简化为其他字段不相关：

{
  "test": {
    "mappings": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "entities": {
          "type": "nested",
          "properties": {
            "text_property": {
              "type": "text"
            },
            "float_property": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}

数据看起来像这样（再次简化）：

[
  {
    "name": "a",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.4
      },
      {
        "text_property": "baz",
        "float_property": 0.6
      }
    ]
  },
  {
    "name": "b",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.9
      }
    ]
  },
  {
    "name": "c",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.9
      }
    ]
  }
]

float_property我正在尝试对每个文档的最大值执行存储桶聚合。因此，对于上面的示例，以下将是所需的响应：

...
{
  "buckets": [
    {
      "key": "0.9",
      "doc_count": 2
    },
    {
      "key": "0.6",
      "doc_count": 1
    }
  ]
}

因为 doca的最高嵌套值为float_property0.6，b's 是 0.9，c's 是 0.9。

我试过混合使用nestedand 和aggs，runtime_mappings但我不确定使用这些的顺序，或者这是否可能。

score 0 · Accepted Answer

我使用您的映射创建了索引，将 float_property 类型更改为 double。

PUT testindex
{
 "mappings": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "entities": {
          "type": "nested",
          "include_in_parent": true, 
          "properties": {
            "text_property": {
              "type": "text"
            },
            "float_property": {
              "type": "double"
            }
          }
        }
      }
    }
}

为“嵌套”类型添加了“include_in_parent”：true

索引文件：

PUT testindex/_doc/1
{
    "name": "a",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.4
      },
      {
        "text_property": "baz",
        "float_property": 0.6
      }
    ]
  }
 
PUT testindex/_doc/2
{
    "name": "b",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.9
      }
    ]
  }
  
PUT testindex/_doc/3 
{
    "name": "c",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.9
      }
    ]
  }

然后嵌套字段上的术语聚合：

POST testindex/_search
{
  "from": 0,
  "size": 30,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "entities.float_property": {
      "terms": {
        "field": "entities.float_property"
      }
    }
  }
}

它产生如下聚合结果：

"aggregations" : {
    "entities.float_property" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 0.2,
          "doc_count" : 2
        },
        {
          "key" : 0.9,
          "doc_count" : 2
        },
        {
          "key" : 0.4,
          "doc_count" : 1
        },
        {
          "key" : 0.6,
          "doc_count" : 1
        }
      ]
    }
  }

score 0 · Accepted Answer

我最终设法解决了这个问题。

我没有意识到的两件事是：

您可以为存储桶聚合提供 ascript而不是key。field
nested您可以直接使用访问嵌套值，而不是使用查询params._source。

这两件事的结合使我能够编写正确的查询：

{
  "size": 0,
  "aggs": {
    "max.float_property": {
      "terms": {
        "script": "double max = 0; for (item in params._source.entities) { if (item.float_property > max) { max = item.float_property; }} return max;"
      }
    }
  }
}

回复：

{
  ...
  "aggregations": {
    "max.float_property": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "0.9",
          "doc_count": 2
        },
        {
          "key": "0.6",
          "doc_count": 1
        }
      ]
    }
  }
}

不过我很困惑，因为我认为访问nested字段的正确方法是使用nested查询类型。不幸的是，这方面的文档很少，所以我仍然不确定这是否是在脚本嵌套字段上聚合的预期/正确方法。

elasticsearch - 脚本嵌套字段上的 ElasticSearch 聚合

2 回答 2

Related

Reference