
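I want to get the value of a key such as "Physical memory KBytes total: 8017608", and the same for all the other dictionaries.

For the other dictionaries I am using Python code like this: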

import csv
import json
x = []
# r"""{"data":"foo \\r\\n bar"}"""
for line in open("forcasting/eventdat_Feb/event_nw_2019-02-01.json", 'r', encoding='utf8'):
    x.append(json.loads(line))
#for line in open("forcasting/eventdat_Feb/event_nw_2019-02-01.json", 'r', encoding='utf8',errors='ignore'):

#print(x[0]['_source']['text1']['log'])
f = csv.writer(open("forcasting/eventdat_Feb/Dart95/1st_feb.csv", "w"))
f.writerow(["timestamp","machine","id","customer","type","entered","enteredDate","servertime","username","host","text1_log","text2_log","string1_log"])

for key in x:
    if key["_source"].get("scrip") == "31":
        f.writerow([
            key["_source"].get("@timestamp"),
            key["_source"].get("machine"),
            key["_source"].get("id"),
            key["_source"].get("customer"),
            key["_source"].get("type"),
            key["_source"].get("entered"),
            key["_source"].get("enteredDate"),
            key["_source"].get("servertime"),
            key["_source"].get("username"),
            key["_source"].get("host"),
            key["_source"].get("text1").get("log"),
            key["_source"].get("text2").get("log"),
            key["_source"].get("string1").get("log")
        ])

But for this one, key["_source"].get("text1").get("log"), I am trying

key["_source"].get("text1").get("log").get("Physical memory KBytes total")

but it does not work.

Thank you

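Problem extracting data from the highlighted part of this image

This is the highlighted part:

"text1":{"log":"Physical memory:\r Physical memory KBytes total: 8017608\r Physical memory KBytes in use: 5457192\r Physical memory Percentage in use: 68\r Physical memory KBytes free: 2560416\r Physical memory Percentage free: 32\r Virtual memory:\r Virtual memory KBytes total: 137438953344\r Virtual memory KBytes in use: 258064\r Virtual memory Percentage in use: 0\r Virtual memory KBytes free: 137438695280\r Virtual memory Percentage free: 100\r Swap space:\r Swap space KBytes total: 12474056\r Swap space KBytes in use: 10285812\r Swap space Percentage in use: 82\r Swap space KBytes free: 2188244\r Swap space Percentage free: 18\r mSec Sampling period: 30000\r Page reads per second: 2\r Number of processes running: 208"}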

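I cannot share the whole JSON file because it is too large, but I have attached a sample; please take a look. It is our system data in JSON format (Elasticsearch data), and I need to extract these values (the values in text1) to do some machine learning.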

{"_index":"event_nw_2019-02-01","_type":"events","_id":"uB-xp2gB5-JFORtVXbZW","_score":1,"_source":{"username":"ka100982","text4":{"log":"Process Image Name: Memory Compression\r Process PID: 2628\r Process CPU: 0\r Process Elapsed: 5:22:43\r Process Mem Usage: 955508K\r  \r Process Image Name: chrome#8\r Process PID: 10312\r Process CPU: 0\r Process Elapsed: 5:21:46\r Process Mem Usage: 287852K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: chrome#3\r Process PID: 5556\r Process CPU: 0\r Process Elapsed: 5:21:53\r Process Mem Usage: 210620K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: chrome#15\r Process PID: 4516\r Process CPU: 0\r Process Elapsed: 5:20:41\r Process Mem Usage: 202464K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: chrome#12\r Process PID: 3428\r Process CPU: 0\r Process Elapsed: 5:21:00\r Process Mem Usage: 195764K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: chrome#19\r Process PID: 9628\r Process CPU: 0\r Process Elapsed: 4:25:37\r Process Mem Usage: 191124K\r Process: C:\\Program Files 
(x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: iexplore#2\r Process PID: 9296\r Process CPU: 2\r Process Elapsed: 5:18:38\r Process Mem Usage: 173444K\r Process: C:\\Program Files (x86)\\Internet Explorer\\IEXPLORE.EXE\r Process Version: 11.00.16299.15 (WinBuild.160101.0800)\r Process Size: 822544\r Process Creation Date: Thursday, August 23, 2018 07:50:50\r Process Last Modified Date: Thursday, March 29, 2018 23:07:49\r  \r Process Image Name: chrome\r Process PID: 10152\r Process CPU: 29\r Process Elapsed: 5:21:54\r Process Mem Usage: 170452K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: chrome#9\r Process PID: 10228\r Process CPU: 0\r Process Elapsed: 5:21:24\r Process Mem Usage: 169132K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: dcuapp\r Process PID: 9864\r Process CPU: 16\r Process Elapsed: 5:21:58\r Process Mem Usage: 157184K\r Process: C:\\Program Files\\Verint\\DPA\\Client\\DCUApp.exe\r Process Version: 11,1,1,19229\r Process Size: 694272\r Process Creation Date: Thursday, July 6, 2017 14:08:28\r Process Last Modified Date: Thursday, July 6, 2017 14:08:28\r  
"},"idx":12483141,"version":"","string1":{"log":"27"},"uuid":"67cf6aa9-63f8-48a5-888d-127995fc09e1","id":"0","serverDate":"2019-02-01T06:14:05Z","Tags":["AllMemoryUtilizationEvents","MemUtilizationPhysicalMemoryLessThan8GB"],"entered":"1549001637","scrip":"6","windowtitle":"","text2":{"log":"Type of run: RealTime Monitoring"},"customer":"CompuCom_Selfheal__201800016","string2":{"log":"41444"},"priority":"5","description":"Memory Statistics","enteredDate":"2019-02-01T06:13:57Z","machine":"MH-NW0-198592","text1":{"log":"Physical memory:\r Physical memory KBytes total: 8017608\r Physical memory KBytes in use: 5457192\r Physical memory Percentage in use: 68\r Physical memory KBytes free: 2560416\r Physical memory Percentage free: 32\r Virtual memory:\r Virtual memory KBytes total: 137438953344\r Virtual memory KBytes in use: 258064\r Virtual memory Percentage in use: 0\r Virtual memory KBytes free: 137438695280\r Virtual memory Percentage free: 100\r Swap space:\r Swap space KBytes total: 12474056\r Swap space KBytes in use: 10285812\r Swap space Percentage in use: 82\r Swap space KBytes free: 2188244\r Swap space Percentage free: 18\r mSec Sampling period: 30000\r Page reads per second: 2\r Number of processes running: 208"},"@timestamp":"2019-02-01T06:14:05.294Z","type":"","clientsize":"9030168","size":"0","text3":{"log":""},"path":"","executable":"","servertime":1549001645,"clientversion":"3.002.036.3038.24","host":"35.225.19.235"}}
{"_index":"event_nw_2019-02-01","_type":"events","_id":"uR-xp2gB5-JFORtVXrYC","_score":1,"_source":{"username":"gh102434","text4":{"log":""},"idx":12483142,"version":"","string1":{"log":""},"uuid":"67f31b98-21af-49a6-a6b3-0a48406329cf","id":"0","serverDate":"2019-02-01T06:14:05Z","Tags":["Clientheartbeatevent"],"entered":"1549001644","scrip":"231","windowtitle":"","text2":{"log":"Type of run: Scheduled"},"customer":"CompuCom_Selfheal__201800016","string2":{"log":""},"priority":"5","description":"Client heartbeat","enteredDate":"2019-02-01T06:14:04Z","machine":"MX-D-CIT00100","text1":{"log":"SelfHeal Client is running and responding"},"@timestamp":"2019-02-01T06:14:05.464Z","type":"","clientsize":"9030168","size":"0","text3":{"log":""},"path":"","executable":"","servertime":1549001645,"clientversion":"3.002.036.3038.24","host":"35.225.19.235"}}

1 Answer


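What you have under the "log" key is plain text, not a JSON object, so after deserialization you get a string, not a dict. You have to parse this string yourself to retrieve the data.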
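A quick way to confirm this is to deserialize a small record and check the type of the value. The record below is a shortened, made-up stand-in for one line of the file (it only assumes the same nesting as the real records):

```python
import json

# Shortened stand-in for one line of the file (assumption: same shape as the real records)
record = json.loads(
    '{"_source": {"text1": {"log": "Physical memory:\\r Physical memory KBytes total: 8017608"}}}'
)

log_value = record["_source"]["text1"]["log"]
print(type(log_value).__name__)  # str

# A str has no .get() method, which is why the chained .get(...) call fails
try:
    log_value.get("Physical memory KBytes total")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```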

The good news is that the parsing is not too complicated:

def parsedata(logtext):
   # 'logtext' is the whole string value for the 'log' key
   return dict(
      s.strip().split(":") 
      for s in logtext.splitlines() 
      if ":" in s and not s.endswith(":")
      )

logtext = "Physical memory:\r Physical memory KBytes total: 8017608\r Physical memory KBytes in use: 5457192\r Physical memory Percentage in use: 68\r Physical memory KBytes free: 2560416\r Physical memory Percentage free: 32\r Virtual memory:\r Virtual memory KBytes total: 137438953344\r Virtual memory KBytes in use: 258064\r Virtual memory Percentage in use: 0\r Virtual memory KBytes free: 137438695280\r Virtual memory Percentage free: 100\r Swap space:\r Swap space KBytes total: 12474056\r Swap space KBytes in use: 10285812\r Swap space Percentage in use: 82\r Swap space KBytes free: 2188244\r Swap space Percentage free: 18\r mSec Sampling period: 30000\r Page reads per second: 2\r Number of processes running: 208"

print(parsedata(logtext))

=>

{'Number of processes running': ' 208', 'Physical memory KBytes total': ' 8017608', 'Swap space KBytes in use': ' 10285812', 'Swap space Percentage free': ' 18', 'Page reads per second': ' 2', 'Physical memory Percentage free': ' 32', 'Virtual memory KBytes free': ' 137438695280', 'Physical memory Percentage in use': ' 68', 'Physical memory KBytes free': ' 2560416', 'Virtual memory Percentage in use': ' 0', 'Swap space KBytes free': ' 2188244', 'mSec Sampling period': ' 30000', 'Physical memory KBytes in use': ' 5457192', 'Virtual memory KBytes in use': ' 258064', 'Virtual memory KBytes total': ' 137438953344', 'Swap space KBytes total': ' 12474056', 'Virtual memory Percentage free': ' 100', 'Swap space Percentage in use': ' 82'}
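Note that the values keep the space that followed the colon and are still strings. If numbers are needed downstream, a small post-processing step could strip and convert them. This is only a sketch on a shortened sample: `to_numbers` is a hypothetical helper name, and it assumes every value is an integer, which holds for the output above but may not hold for other records.

```python
def to_numbers(parsed):
    # Strip surrounding whitespace from each value and convert it to int
    # (assumption: all values are integer strings, as in the sample output)
    return {key: int(value.strip()) for key, value in parsed.items()}

# Two entries copied from the output above
sample = {'Physical memory KBytes total': ' 8017608', 'Swap space Percentage in use': ' 82'}
numbers = to_numbers(sample)
print(numbers)
```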

Edit:

When I use it with my code to change the nested dictionary, it gives me this error:

Traceback (most recent call last):
  File "forcasting\feb_data_extract.py", line 17, in <module>
    a = parsedata(x[i]["_source"].get("text1").get("log"))
  File "forcasting\feb_data_extract.py", line 11, in parsedata
    for s in logtext.splitlines()
ValueError: dictionary update sequence element #0 has length 3; 2 is required

This means that a line in the log text has more than one ":" separator (two in this case, since it yields a triple instead of a pair).
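The same error can be reproduced in isolation. The line used here comes from the text4 sample in the question; whether it is the line that actually triggered the error in the real file is not known, it simply has two ":" characters:

```python
# A line with two ":" characters, taken from the text4 sample in the question
line = r"Process: C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"

parts = line.split(":")
print(len(parts))  # 3

# dict() expects each element to be a (key, value) pair, so a triple is rejected
try:
    dict([parts])
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```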

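You can change the parsedata implementation to get more accurate reporting and eventually take the appropriate action (which depends on what the line contains and what you want to get out of it):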

# caveat: untested code
def parsedata(logtext):
    # 'logtext' is the whole string value for the 'log' key
    parsed = {}
    for line in logtext.splitlines():
        line = line.strip()
        if not line:
            # empty line
            continue
        if ":" not in line or line.endswith(":"):
            # we ignored those lines given your initial specs
            # but you may actually want to do something with...
            # let's at least print it for inspection
            print("line is not a key:value pair: '{}' -  ignoring".format(line))
            continue
        try:
            k, v = line.split(":")
        except ValueError:
            print("line has more than one separator: '{}' -  ignoring".format(line))
            # what to do here depends on what the line looks like
            # and what you want to do with it.
            continue
        parsed[k] = v

    return parsed

If it happens that the extra ':' separators are actually part of a valid value, you can rebuild the value from the triple (or a tuple of any size):

splitted = s.split(":")
# some eventual tests here if needed
k, v = splitted[0], ":".join(splitted[1:])

Or just use the maxsplit argument:

k, v = s.split(":", 1) 

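Here again, the "right" action depends on the actual data and context, so only you know how it should be handled.

Note that all of this is very basic text parsing and error handling, which you should really learn to write and debug yourself (simple text parsing is a very common task in application programming).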
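As one possible way to put the pieces together, here is a minimal sketch that splits on the first ":" only and strips whitespace from keys and values. `parse_log` is a hypothetical name, and the sample string mixes one line from text1 with one line from text4 purely for illustration; it assumes that everything after the first ":" belongs to the value.

```python
def parse_log(logtext):
    """Parse 'key: value' lines into a dict, splitting on the first ':' only."""
    parsed = {}
    for line in logtext.splitlines():
        line = line.strip()
        if ":" not in line or line.endswith(":"):
            # skip blank lines and section headers such as "Physical memory:"
            continue
        key, value = line.split(":", 1)
        parsed[key.strip()] = value.strip()
    return parsed

logtext = "Physical memory:\r Physical memory KBytes total: 8017608\r Process Elapsed: 5:22:43"
result = parse_log(logtext)
print(result.get("Physical memory KBytes total"))
print(result.get("Process Elapsed"))
```

With a parser of this shape, a lookup such as result.get("Physical memory KBytes total") can take the place of the chained .get() call on the raw string.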

answered 2019-03-15T11:49:57.947