elasticsearch - Aggregations and filters in Elastic - find the last hits and filter them afterwards

Question

I'm experimenting with Elastic (5.6) and trying to find a way to retrieve the top document for each category.

I have an index containing documents of the following type:

{
  "@timestamp": "2018-03-22T00:31:00.004+01:00",
  "statusInfo": {
    "status": "OFFLINE",
    "timestamp": 1521675034892
  },
  "name": "myServiceName",
  "id": "xxxx",
  "type": "Http",
  "key": "key1",
  "httpStatusCode": 200
}

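What I'm trying to do with these is retrieve the last document (@timestamp-based) for each name (my category), check whether its statusInfo.status is OFFLINE or UP, and get those results into the hits part of the response, so that I can use them in a Kibana count dashboard or somewhere else (a REST-based tool that I don't control and can't modify myself). Basically, I want to know how many of my services (name) were OFFLINE (statusInfo.status) at their last update (@timestamp), for monitoring purposes. I'm stuck at the "get how many of my services" part.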

My query so far:

GET actuator/_search
{
  "size": 0,
  "aggs": {
    "name_agg": {
      "terms": {
        "field": "name.raw",
        "size": 1000
      },
      "aggs": {
        "last_document": {
          "top_hits": {
            "_source": ["@timestamp", "name", "statusInfo.status"], 
            "size": 1,
            "sort": [
              {
                "@timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  },
  "post_filter": {
    "bool": {
      "must_not": {
        "term": {
          "statusInfo.status.raw": "UP"
        }
      }
    }
  }
}

This gives the following response:

{
  "all_the_meta":{...},
  "hits": {
    "total": 1234,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "name_agg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "myCategory1",
          "doc_count": 225,
          "last_document": {
            "hits": {
              "total": 225,
              "max_score": null,
              "hits": [
                {
                  "_index": "myIndex",
                  "_type": "Http",
                  "_id": "dummy id",
                  "_score": null,
                  "_source": {
                    "@timestamp": "2018-04-06T00:06:00.005+02:00",
                    "statusInfo": {
                      "status": "UP"
                    },
                    "name": "myCategory1"
                  },
                  "sort": [
                    1522965960005
                  ]
                }
              ]
            }
          }
        },
        {other_buckets...}
      ]
    }
  }
}

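Removing the size makes the result contain all of the documents, which isn't what I need; I only need each bucket's content (each bucket holds a single hit). Removing the post filter doesn't appear to do much.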

I think this would be feasible in ORACLE SQL with a PARTITION BY ... OVER clause followed by a condition.

Does anybody know how to achieve this?

score 0 · Accepted Answer

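If I understand correctly, you're looking for the latest document in each group (grouped by name) whose status is OFFLINE? In that case you can try the query below; the number of items in the buckets should give you the "how many are down" count (for UP, change the term in the filter).

Note: this was done on the latest version, so it uses the keyword field instead of raw.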

POST /index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "term": {"statusInfo.status.keyword": "OFFLINE"}
      }
    }
  },
  "aggs": {
    "services_agg": {
      "terms": {
        "field": "name.keyword"
      },
      "aggs": {
        "latest_doc": {
          "top_hits": {
            "sort": [
              {
                "@timestamp": {
                  "order": "desc"
                }
              }
            ],
            "size": 1,
            "_source": ["@timestamp", "name", "statusInfo.status"]
          }
        }
      }
    }
  }
}
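
One caveat: because the query above filters on OFFLINE before aggregating, each bucket holds the latest OFFLINE document for a service, which is not necessarily that service's most recent document overall. If you need "services whose last update was OFFLINE", one possible workaround is to run the query from the question (the post_filter can be dropped, since it only affects the returned hits and not the aggregations) and tally the top hit of each bucket on the client. Below is a minimal Python sketch of that idea, not a definitive implementation. It assumes the response has already been parsed into a dict; the function name `count_services_by_last_status` is hypothetical, and the second bucket in the sample data is made up for illustration.

```python
def count_services_by_last_status(response, agg_name="name_agg", hits_name="last_document"):
    """Tally services by the status of their most recent document.

    `agg_name` and `hits_name` default to the aggregation names used in
    the question's query; adjust them if your query names them differently.
    """
    counts = {}
    buckets = response["aggregations"][agg_name]["buckets"]
    for bucket in buckets:
        hits = bucket[hits_name]["hits"]["hits"]
        if not hits:
            # Skip a bucket whose top_hits came back empty.
            continue
        # top_hits was sorted by @timestamp desc with size 1,
        # so the first hit is the latest document for this service.
        status = hits[0]["_source"]["statusInfo"]["status"]
        counts[status] = counts.get(status, 0) + 1
    return counts


# Sample shaped like the response in the question.
# The "myCategory2" bucket is invented for illustration.
sample_response = {
    "aggregations": {
        "name_agg": {
            "buckets": [
                {
                    "key": "myCategory1",
                    "doc_count": 225,
                    "last_document": {"hits": {"hits": [
                        {"_source": {"name": "myCategory1",
                                     "statusInfo": {"status": "UP"}}}
                    ]}},
                },
                {
                    "key": "myCategory2",
                    "doc_count": 10,
                    "last_document": {"hits": {"hits": [
                        {"_source": {"name": "myCategory2",
                                     "statusInfo": {"status": "OFFLINE"}}}
                    ]}},
                },
            ]
        }
    }
}

print(count_services_by_last_status(sample_response))
```

Also note that a terms aggregation returns only 10 buckets by default, so with many services you would keep an explicit "size" on the terms aggregation, as the question's query does.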
