My goal is to convert a string into a dictionary. This is what the string looks like:

[exploit] => 1
[hits] => 1
[completed] => 1
[is_malware] => 1
[summary] => 26.0@13965: suspicious.warning: object contains JavaScript
76.0@14467: suspicious.obfuscation using eval
76.0@14467: suspicious.obfuscation using String.fromCharCode

[severity] => 4
[engine] => 60

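I have tried a couple of approaches. My first attempt was to split at \n, but that broke on [summary]: its content spans several lines, so it got split apart. My second attempt was to split at =>, but once I split there, nothing tells the code that it has to split at the \n before the next key. In the end the result should look something like {exploit:1, hits:1, completed:1....} and so on.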

Any help would be greatly appreciated.

2 Answers

You can use re.findall to parse the text:

>>> import re
>>> re.findall(r'\[([^]]+)\] => (.*?)(?=\n\[|$)', s, re.S)
[('exploit', '1'), ('hits', '1'), ('completed', '1'), ('is_malware', '1'), ('summary', '26.0@13965: suspicious.warning: object contains JavaScript\n76.0@14467: suspicious.obfuscation using eval\n76.0@14467: suspicious.obfuscation using String.fromCharCode\n'), ('severity', '4'), ('engine', '60')]

You can put these values into a dictionary by calling dict:

>>> dict(re.findall(r'\[([^]]+)\] => (.*?)(?=\n\[|$)', s, re.S))
{'engine': '60', 'hits': '1', 'severity': '4', 'is_malware': '1', 'summary': '26.0@13965: suspicious.warning: object contains JavaScript\n76.0@14467: suspicious.obfuscation using eval\n76.0@14467: suspicious.obfuscation using String.fromCharCode\n', 'exploit': '1', 'completed': '1'}
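
To make the pattern easier to read, here is a sketch of the same regex written with re.VERBOSE so that each part can carry a comment. The helper name parse_report, the PATTERN constant, and the shortened sample string are assumptions made for illustration only; stripping the trailing whitespace from each value is an optional extra, not part of the answer above.

```python
import re

# The same pattern as above, spelled out with re.VERBOSE (re.X).
# In verbose mode a literal space must be written as "[ ]".
PATTERN = re.compile(
    r"""
    \[([^]]+)\]    # group 1: the key, i.e. the text between square brackets
    [ ]=>[ ]       # the literal separator between key and value
    (.*?)          # group 2: the value, as short as possible (re.S lets it span lines)
    (?=\n\[|$)     # stop right before a newline that starts the next key, or at the end
    """,
    re.S | re.X,
)


def parse_report(text):
    # Hypothetical helper: build a dict and strip surrounding whitespace from values.
    return {key: value.strip() for key, value in PATTERN.findall(text)}


# Shortened sample in the same shape as the question's input.
sample = (
    "[exploit] => 1\n"
    "[summary] => first line\n"
    "second line\n"
    "\n"
    "[engine] => 60\n"
)
result = parse_report(sample)
print(result)
```

The lookahead is what keeps the multi-line [summary] value together: the lazy group keeps growing until the next thing in the string is a newline immediately followed by an opening bracket.
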
answered 2012-10-04T13:06:25.697

total_string = """\
[exploit] => 1
[hits] => 1
[completed] => 1
[is_malware] => 1
[summary] => 26.0@13965: suspicious.warning: object contains JavaScript
76.0@14467: suspicious.obfuscation using eval
76.0@14467: suspicious.obfuscation using String.fromCharCode

[severity] => 4
[engine] => 60
"""

import re

pattern_RE = r'\[([^]]+)\] => (.*?)(?=\n\[|$)'
report_dict = dict(re.findall(pattern_RE, total_string, re.S))

for k, v in report_dict.items():
    print('[{}]: {}'.format(k, v))

print(report_dict)

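This is what you have shown us, but the real string may contain newlines and carriage returns that are hidden in what you pasted. For the input as we see it, the regex seems to work fine: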

{   'engine': '60', 
    'hits': '1', 
    'severity': '4', 
    'is_malware': '1', 
    'summary': '(all three captured)',
    'exploit': '1', 
    'completed': '1'
}

So if the regex does not capture this for you, the repr() of total_string must differ slightly from what you pasted (perhaps a trailing newline or something similar).
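
One way to check this is to print repr() of the string and look for unexpected characters. The sketch below assumes the hidden characters are Windows-style \r\n line endings, which is only a guess about the real input; the normalize_newlines helper and the shortened raw sample are hypothetical.

```python
import re

PATTERN = r'\[([^]]+)\] => (.*?)(?=\n\[|$)'


def normalize_newlines(text):
    # Hypothetical helper: turn CRLF pairs and lone CRs into plain LF.
    return text.replace('\r\n', '\n').replace('\r', '\n')


# Assumed input: the same kind of report, but with CRLF line endings.
raw = "[exploit] => 1\r\n[hits] => 1\r\n[engine] => 60\r\n"

# repr() makes the carriage returns visible as \r.
print(repr(raw))

# Without normalization each value keeps a stray carriage return.
broken = dict(re.findall(PATTERN, raw, re.S))
print(broken)

# After normalization the values come out clean.
fixed = dict(re.findall(PATTERN, normalize_newlines(raw), re.S))
print(fixed)
```

With carriage returns present the pattern still matches, but the values carry an extra \r at the end, which is easy to miss when the dictionary is printed without repr().
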

answered 2012-10-05T15:54:01.037