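I have an ELK deployment that collects logs. I now need to extract all logs that contain a specific string, but I have run into an interesting problem: I get different output from Kibana's Dev Tools and from the elasticsearch Python client.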

Here is the query in Kibana:

GET app_web_log-20180827/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "message":   "Failed to call Billing API Server" }}
      ],
      "filter": [
        { "term":  { "deployment": "app_instance1" }},
        { "term":  { "module": "test_module" }}, 
        { "range": { "@timestamp": { "gte": 1535266800000, "lt": 1535353200000 }}} 
      ]
    }
  },
  "size": 5
}

Here is the Dev Tools output:

{
  "took": 556,
  "timed_out": false,
  "_shards": {
    "total": 175,
    "successful": 175,
    "skipped": 165,
    "failed": 0
  },
  "hits": {
    "total": 400,
    "max_score": 34.769733,
    "hits": [
      {
        "_index": "app_web_log-20180827",
        "_type": "doc",
        "_id": "FMkHeWUB_hBu7Tio4Llg",
        "_score": 34.769733,
        "_source": {
          "beat": {
            "version": "6.2.4",
            "name": "app-web001",
            "hostname": "app-web001"
          },
          "offset": 349461,
          "@timestamp": "2018-08-27T01:38:03.049Z",
          "source": "/apphome/app_instance1/logs/test_module.log",
          "message": "2018-08-27 01:37:59,661 [http-bio-8168-exec-8] ERROR [Billing APIClientImpl] Failed to call Billing API Server. Billing API Billing server response error, tranId:c95cede3a011d97fd9f3d661eb961cb8",
          "module": "test_module",
          "@version": "1",
          "deployment": "app_instance1"
        }
      },
....

But when I run the same query with the elasticsearch Python client, it gives me nothing:

from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': 'esserver', 'port': 9200, 'username': 'appuser', 'password': 'elastic'}])

body = {
  "query": { 
    "bool": { 
      "must": [
        { "match_phrase": { "message":   "Failed to call Billing API Server" }}
      ],
      "filter": [ 
        { "term":  { "deployment": "app_instance1" }},
        { "term":  { "module": "test_module" }}, 
        { "range": { "@timestamp": { "gte": 1535266800000, "lt": 1535353200000 }}} 
      ]
    }
  }
}
print body

page = es.search(index='app_web_log-20180827', doc_type='doc', body=body,
         scroll='2m', size=100)
sid = page['_scroll_id']
scroll_size = page['hits']['total']
while (scroll_size > 0):
    print "Scrolling..."
    page = es.scroll(scroll_id = sid, scroll = '2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results that we returned in the last scroll
    scroll_size = len(page['hits']['hits'])
    for m in page['hits']['hits']:
        msg = m['_source']['message']
        print msg

All I get is:

{'query': {'bool': {'filter': [{'term': {'deployment': 'app_instance1'}}, {'term': {'module': 'test_module'}}, {'range': {'@timestamp': {'lt': 1535353200000, 'gte': 1535266800000}}}], 'must': [{'match_phrase': {'message': 'Failed to call Billing API Server'}}]}}}
Scrolling...

Is there anything wrong with my code? Any help is appreciated. Thanks.

1 Answer

I suggest you look at the scan helper ([0]), which handles this logic for you.
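
As a rough sketch of what using that helper could look like, assuming the `es` client and `body` dict from the question: `elasticsearch.helpers.scan` issues the initial search and the follow-up scroll requests itself and yields hits one by one. The function name `iter_messages` and the `scan_fn` parameter are hypothetical, added here only so the helper can be swapped out when trying the code without a cluster.

```python
def iter_messages(es, index, body, scan_fn=None):
    """Yield the 'message' field of every hit matching `body`.

    `scan_fn` is a hypothetical injection point; by default the real
    elasticsearch.helpers.scan helper is used.
    """
    if scan_fn is None:
        # Imported lazily so the sketch can be exercised with a stand-in helper
        from elasticsearch.helpers import scan as scan_fn

    # scan() runs the first search and all subsequent scroll calls,
    # so no page of hits is skipped
    for hit in scan_fn(es, query=body, index=index, scroll="2m", size=100):
        yield hit["_source"]["message"]
```

With the objects from the question, `for msg in iter_messages(es, 'app_web_log-20180827', body): print(msg)` should then print each matching message, provided the query itself matches documents.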

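My guess is that, because you only iterate over a page after calling scroll and not before, you never process the hits returned by the search API call. You have also set size to 100, so most likely all the hits are in the first value of the page variable, which you ignore.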
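
If you would rather keep the manual loop, a minimal sketch of the reordering described above might look like this. It assumes a client with the same `search` and `scroll` calls used in the question; the function name `iter_scroll_hits` is hypothetical, and `doc_type` is left out here because newer client versions no longer accept it.

```python
def iter_scroll_hits(es, index, body, scroll="2m", size=100):
    """Yield every hit, starting with the page returned by search()."""
    page = es.search(index=index, body=body, scroll=scroll, size=size)
    sid = page["_scroll_id"]
    hits = page["hits"]["hits"]

    while hits:
        # Process the current page first (this includes the search() page) ...
        for hit in hits:
            yield hit
        # ... and only then ask for the next one
        page = es.scroll(scroll_id=sid, scroll=scroll)
        sid = page["_scroll_id"]  # the scroll ID can change between calls
        hits = page["hits"]["hits"]
```

The loop stops on the first empty page, so the hit count is taken from the length of each page rather than from `page['hits']['total']`.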

0 - https://elasticsearch-py.readthedocs.io/en/master/helpers.html#scan

Answered 2018-08-30T22:44:43.203