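I have a service that accepts and processes tasks. A task has a status: queued, running, failed, cancelled, or finished. From time to time, the service emits a log entry containing JSON, like this: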

2021-09-09 00:30:46,742 [Timer-0] INFO - { "env": "test_environment", "capacity": 10, "available_ec2": 10, "failed_ec2": 0, "running_tasks": 0, "queued_tasks": 0, "finished_tasks": 0, "failed_tasks": 0, "cancelled_tasks": 3,"queue_wait_minutes" : { "max": 0, "mean": -318990, "max_started": 0, "mean_started": -29715 },"processing_time": {"max": 0, "mean": 0} }

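I would like to draw a pie chart showing the breakdown of tasks by status ("running_tasks", "queued_tasks", "finished_tasks", "failed_tasks", and "cancelled_tasks" in the JSON message). So far I have not managed to, because I cannot figure out how to build a table from messages like this. Any pointers would be greatly appreciated. Thanks in advance!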

3 Answers

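Try something like this. Basically, you have to unpivot the data so that each counter becomes its own row. The query below covers three of the five counters; failed_tasks and cancelled_tasks can be added following the same pattern. I hope this makes sense!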

...
| parse field=some_log_line "INFO - *" as jsonMessage
| json field=jsonMessage "running_tasks"
| json field=jsonMessage "queued_tasks"
| json field=jsonMessage "finished_tasks"
| "running_tasks,queued_tasks,finished_tasks," as message_keys
| parse regex field=message_keys "(?<message_key>.*?)," multi
| if (message_key="running_tasks", running_tasks, 0) as message_value
| if (message_key="queued_tasks", queued_tasks, message_value) as message_value
| if (message_key="finished_tasks", finished_tasks, message_value) as message_value
| fields message_key, message_value
| max(message_value) by message_key
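
For readers who want to see the unpivot step outside Sumo Logic, here is a minimal Python sketch of the same idea. It is an illustration, not part of the query above: the function name `unpivot`, the constant `STATUS_KEYS`, and the trimmed sample line are assumptions made for this example, and it handles a single log line rather than aggregating over many.

```python
import json

# Trimmed version of the log line from the question (nested fields omitted).
LINE = ('2021-09-09 00:30:46,742 [Timer-0] INFO - { "env": "test_environment", '
        '"running_tasks": 0, "queued_tasks": 0, "finished_tasks": 0, '
        '"failed_tasks": 0, "cancelled_tasks": 3 }')

# Hypothetical constant: the counters we want one row each for.
STATUS_KEYS = ["running_tasks", "queued_tasks", "finished_tasks",
               "failed_tasks", "cancelled_tasks"]


def unpivot(line, keys=STATUS_KEYS):
    """Turn one wide log record into (message_key, message_value) rows."""
    # Everything after "INFO - " is the JSON payload.
    payload = json.loads(line.split("INFO - ", 1)[1])
    # One row per counter; a missing counter is treated as 0.
    return [(key, payload.get(key, 0)) for key in keys]


rows = unpivot(LINE)
for key, value in rows:
    print(key, value)
```

Each resulting row has one label and one number, which is the shape a pie chart expects.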
answered 2021-09-13T22:33:36.803

First, Sumo Logic supports parsing JSON into fields. In your example the whole line is not JSON, only the part after the "-", so you can add this to your query:

...
| parse "INFO - *" as jsonMessage
| json auto

Then you can use running_tasks, queued_tasks, and so on as ordinary fields, for example:

...
| timeslice 1m
| max(running_tasks), max(queued_tasks) by _timeslice
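
As a rough offline analogue of the two queries above, the following Python sketch extracts the JSON after "INFO - ", buckets records by minute, and keeps the maximum of each field per bucket. The function name `max_per_minute` and the sample lines in `LOG_LINES` are made up for illustration; only the line format follows the question.

```python
import json
from collections import defaultdict
from datetime import datetime

# Illustrative sample input (values are invented, format follows the question).
LOG_LINES = [
    '2021-09-09 00:30:46,742 [Timer-0] INFO - {"running_tasks": 2, "queued_tasks": 1}',
    '2021-09-09 00:30:56,742 [Timer-0] INFO - {"running_tasks": 5, "queued_tasks": 0}',
    '2021-09-09 00:31:46,742 [Timer-0] INFO - {"running_tasks": 3, "queued_tasks": 4}',
]


def max_per_minute(lines, fields=("running_tasks", "queued_tasks")):
    """Return {minute: {field: max value seen in that minute}}."""
    buckets = defaultdict(lambda: dict.fromkeys(fields, 0))
    for line in lines:
        # Split the log prefix from the JSON payload.
        prefix, payload = line.split(" INFO - ", 1)
        # The timestamp is the first 23 characters: "YYYY-MM-DD HH:MM:SS,mmm".
        stamp = datetime.strptime(prefix[:23], "%Y-%m-%d %H:%M:%S,%f")
        # Truncate to the minute, similar in spirit to a 1m timeslice.
        minute = stamp.replace(second=0, microsecond=0)
        record = json.loads(payload)
        for field in fields:
            buckets[minute][field] = max(buckets[minute][field], record.get(field, 0))
    return dict(buckets)


result = max_per_minute(LOG_LINES)
for minute, values in sorted(result.items()):
    print(minute, values)
```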

Disclaimer: I am currently employed by Sumo Logic.

answered 2021-09-13T08:58:09.640

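Below is a pure Python solution that prepares the data for plotting.

The output (entries) is a dictionary whose keys are timestamps and whose values are dictionaries containing the fields of interest. log_lines holds the collection of log messages and serves as the input.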

import json
import pprint

log_lines = [
    '2021-09-09 00:30:46,742 [Timer-0] INFO - { "env": "test_environment", "capacity": 10, "available_ec2": 10, "failed_ec2": 0, "running_tasks": 2, "queued_tasks": 0, "finished_tasks": 0, "failed_tasks": 0, "cancelled_tasks": 3,"queue_wait_minutes" : { "max": 0, "mean": -318990, "max_started": 0, "mean_started": -29715 },"processing_time": {"max": 0, "mean": 0} }',
    '2021-09-09 00:31:46,742 [Timer-0] INFO - { "env": "test_environment", "capacity": 10, "available_ec2": 10, "failed_ec2": 0, "running_tasks": 5, "queued_tasks": 0, "finished_tasks": 0, "failed_tasks": 0, "cancelled_tasks": 3,"queue_wait_minutes" : { "max": 0, "mean": -318990, "max_started": 0, "mean_started": -29715 },"processing_time": {"max": 0, "mean": 0} }'
]
entries = dict()

for line in log_lines:
    date = line[:line.find('[') - 1]
    data = json.loads(line[line.find('{'):])
    sub_set = {k: data.get(k,0) for k in
               ["running_tasks", "queued_tasks", "finished_tasks", "failed_tasks", "cancelled_tasks"]}
    entries[date] = sub_set
pprint.pprint(entries)

Output

{'2021-09-09 00:30:46,742': {'cancelled_tasks': 3,
                             'failed_tasks': 0,
                             'finished_tasks': 0,
                             'queued_tasks': 0,
                             'running_tasks': 2},
 '2021-09-09 00:31:46,742': {'cancelled_tasks': 3,
                             'failed_tasks': 0,
                             'finished_tasks': 0,
                             'queued_tasks': 0,
                             'running_tasks': 5}}
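
To get from this dictionary to pie-chart slices, one option is to pick a single timestamp and convert its counts to percentages. The sketch below is only an illustration: `pie_shares` is a hypothetical helper, and choosing the latest entry is an assumption about what the chart should show.

```python
# Copy of the `entries` dictionary printed above.
entries = {
    '2021-09-09 00:30:46,742': {'cancelled_tasks': 3, 'failed_tasks': 0,
                                'finished_tasks': 0, 'queued_tasks': 0,
                                'running_tasks': 2},
    '2021-09-09 00:31:46,742': {'cancelled_tasks': 3, 'failed_tasks': 0,
                                'finished_tasks': 0, 'queued_tasks': 0,
                                'running_tasks': 5},
}


def pie_shares(counts):
    """Return each status's share of the total as a percentage."""
    total = sum(counts.values())
    if total == 0:
        # No tasks at all: every slice is empty.
        return {status: 0.0 for status in counts}
    return {status: 100.0 * n / total for status, n in counts.items()}


# Timestamps in this format sort chronologically as plain strings.
latest = max(entries)
shares = pie_shares(entries[latest])
for status, pct in shares.items():
    print(f"{status}: {pct:.1f}%")
```

If matplotlib is available, the raw counts and their keys could be passed to `matplotlib.pyplot.pie` as the values and `labels` to draw the actual chart.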
answered 2021-09-13T09:12:07.930