elasticsearch - Elasticsearch: Aggregation over random fields

Question

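Right now I have a document like the one in the picture. Its structure is a "contents" field holding many fields with random keys (note that the keys have no fixed format; they may simply look like UUIDs). I want an ES query that finds the maximum start_time across all keys in "contents". How can I do this? The document: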

{"contents": {
    "key1": {
        "start_time": "2020-08-01T00:00:19.500Z",
        "last_event_published_time": "2020-08-01T23:59:03.738Z",
        "last_event_timestamp": "2020-08-01T23:59:03.737Z",
        "size": 1590513,
        "read_offset": 1590513,
        "name": "key1_name"
    },
    "key2": {
        "start_time": "2020-08-01T00:00:19.500Z",
        "last_event_published_time": "2020-08-01T23:59:03.738Z",
        "last_event_timestamp": "2020-08-01T23:59:03.737Z",
        "size": 1590513,
        "read_offset": 1590513,
        "name": "key2_name"
    }
}}

I have tried Joe's solution and it works. But when I modify the document like this:

{
"timestamp": "2020-08-01T23:59:59.359Z",
"type": "beats_stats",
"beats_stats": {
    "metrics": {
        "filebeat": {
            "harvester": {
                "files": {
                    "d47f60db-ac59-4b51-a928-0772a815438a": {
                        "start_time": "2020-08-01T00:00:18.320Z",
                        "last_event_published_time": "2020-08-01T23:59:03.738Z",
                        "last_event_timestamp": "2020-08-01T23:59:03.737Z",
                        "size": 1590513,
                        "read_offset": 1590513,
                        "name": "/data/logs/galogs/ga_log_2020-08-01.log"
                    },
                    "e47f60db-ac59-4b51-a928-0772a815438a": {
                        "start_time": "2020-08-01T00:00:19.500Z",
                        "last_event_published_time": "2020-08-01T23:59:03.738Z",
                        "last_event_timestamp": "2020-08-01T23:59:03.737Z",
                        "size": 1590513,
                        "read_offset": 1590513,
                        "name": "/data/logs/galogs/ga_log_2020-08-01.log"
                    }
                }
            }
        }
    }
}}

it fails with this error:

"error" : {
"root_cause" : [
  {
    "type" : "script_exception",
    "reason" : "runtime error",
    "script_stack" : [
      "for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n            ",
      "                                                                               ^---- HERE"
    ],
    "script" : "\n          for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n            state.start_millis_arr.add(\n              Instant.parse(entry.start_time).toEpochMilli()\n            );\n          }\n        ",
    "lang" : "painless"
  }
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
  {
    "shard" : 0,
    "index" : "agg-test-index-1",
    "node" : "B4mXZVgrTe-MsAQKMVhHUQ",
    "reason" : {
      "type" : "script_exception",
      "reason" : "runtime error",
      "script_stack" : [
        "for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n            ",
        "                                                                               ^---- HERE"
      ],
      "script" : "\n          for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n            state.start_millis_arr.add(\n              Instant.parse(entry.start_time).toEpochMilli()\n            );\n          }\n        ",
      "lang" : "painless",
      "caused_by" : {
        "type" : "null_pointer_exception",
        "reason" : null
      }
    }
  }
]}

score 0 · Accepted Answer

You can use a scripted_metric aggregation to calculate this. It is quite onerous, but certainly possible.

Mimicking your index and syncing a few docs:

POST myindex/_doc
{"contents":{"randomKey1":{"start_time":"2020-08-06T11:01:00.515Z"}}}

POST myindex/_doc
{"contents":{"35431fsf31_s35dfas":{"start_time":"2021-08-06T11:01:00.515Z"}}}

POST myindex/_doc
{"contents":{"999bc_123":{"start_time":"2019-08-06T11:01:00.515Z"}}}

Getting the max date of the unknown random sub-objects:

GET myindex/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.start_millis_arr = [];",
        "map_script": """
          for (def entry : params._source['contents'].values()) {
            state.start_millis_arr.add(
              Instant.parse(entry.start_time).toEpochMilli()
            );
          }
        """,
        "combine_script": """
          // sort in-place
          Collections.sort(state.start_millis_arr, Collections.reverseOrder());
          return DateTimeFormatter.ISO_INSTANT.format(
            Instant.ofEpochMilli(
              // first is now the highest
              state.start_millis_arr[0]
            )
          );

        """,
        "reduce_script": "return states"
      }
    }
  }
}
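
For the nested document in the updated question, the null_pointer_exception most likely comes from the lookup itself: params._source is a map of nested maps, so a single dotted key such as 'beats_stats.metrics.filebeat.harvester.files' is not resolved and returns null, and calling .values() on that null fails. Below is a minimal sketch that walks the path one level at a time instead. It assumes the index name from the error output (agg-test-index-1) and the field layout shown in the question. The null checks and the single-value reduce_script are my additions rather than part of the query above; I have not run this against your data.

```console
GET agg-test-index-1/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.max_millis = null;",
        "map_script": """
          // Traverse the nested _source maps level by level;
          // ?. yields null instead of throwing when a level is missing.
          def files = params._source.beats_stats?.metrics?.filebeat?.harvester?.files;
          if (files != null) {
            for (def entry : files.values()) {
              if (entry.start_time == null) { continue; }
              long millis = Instant.parse(entry.start_time).toEpochMilli();
              // Keep only the running maximum for this shard.
              if (state.max_millis == null || millis > state.max_millis) {
                state.max_millis = millis;
              }
            }
          }
        """,
        "combine_script": "return state.max_millis;",
        "reduce_script": """
          // Merge the per-shard maxima into one value; shards without data return null.
          def maxMillis = null;
          for (def s : states) {
            if (s != null && (maxMillis == null || s > maxMillis)) {
              maxMillis = s;
            }
          }
          return maxMillis == null
            ? null
            : DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(maxMillis));
        """
      }
    }
  }
}
```

Unlike the query above, whose reduce_script returns one formatted maximum per shard, this variant should return a single timestamp (or null when no document contains the files object). Tracking a running maximum also avoids sorting the whole array and reading element 0 of an empty list.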

By the way, @Sahil Gupta's comment is right: never use images where pasting text is possible (and helpful).
