python - Check a string against multiple regular expressions in a for loop

Question

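I'm tailing someone's log files, and they are a complete mess (no line breaks and no separators). So I wrote some simple regexes to tidy the logs up. The log #codes# are now nicely separated in a list, with their strings attached in a sub-dictionary. It looks like this: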

Dict [
    0 : [LOGCODE_53 : 'The string etc etc']
]

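Because that was easy, I was thinking of adding some log recognition to it as well. I can already match on the LOGCODE, but the problem is that the codes don't follow any convention, and different LOGCODEs often contain the same output string.

So I wrote a few regex matches to detect what a log entry is about. My question now is: what is a sensible way to detect a large number of string patterns? There are probably around 110 different types of strings, and they differ so much that it isn't possible to "super-match" them with a single pattern. How can I run ~110 regexes against a string to find out what the string is about, so that I can index the entries in a logical register?

Something like: "Take this $STRING, test it against all the $REGEXes in this $LIST, and let me know which $REGEX(es) (by index) match the string."

My code: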

import re

# Open, Read-out and close; Log file
f = open('000000df.log', "rb")
text = f.read()
f.close()

matches = re.findall(r'00([a-zA-Z0-9]{2})::((?:(?!00[a-zA-Z0-9]{2}::).)+)', text)

print 'Matches: ' + str(len(matches))
print '=========================================================================================='

for match in matches:
    submatching = re.findall(r'(.*?)\'s (.*?) connected (.*?) with ZZZ device (.*?)\.', match[1])

    print match[0] + ' >>> ' + match[1]
    print match[0] + ' >>> ' + submatching[0][0] + ', ' + submatching[0][1] + ',',
    print submatching[0][2] + ', ' + submatching[0][3]

score 2 · Accepted Answer

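re.match and re.search return None when a particular regex doesn't match, and re.findall returns an empty list. All of these are falsy, so you can simply iterate over your candidate regexes and test each one: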

tests = [
    re.compile(r'...'),
    re.compile(r'...'),
    re.compile(r'...'),
    re.compile(r'...')
]

for test in tests:
    # findall returns an empty list (falsy) when the pattern does not match
    matches = test.findall(your_string)

    if matches:
        print test, 'works'
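
To get back the indexes of the matching regexes, as the question asks, the loop above could be wrapped in a small helper. This is only a sketch: the three patterns and the sample entries are made up for illustration (the first pattern is borrowed from the question's code), and the names matching_indices and classify are hypothetical. It sticks to single-argument print() so it should behave the same on Python 2 and Python 3.

```python
import re

# Hypothetical patterns for illustration only; the real ~110 regexes would go here.
PATTERNS = [
    re.compile(r"(.*?)'s (.*?) connected (.*?) with ZZZ device (.*?)\."),
    re.compile(r"connection lost after (\d+) seconds"),
    re.compile(r"device (\w+) rebooted"),
]


def matching_indices(text, patterns):
    """Return the index of every pattern that matches somewhere in text."""
    return [i for i, pattern in enumerate(patterns) if pattern.search(text)]


def classify(entries, patterns):
    """Map each pattern index to the log codes whose string it matched."""
    register = {}
    for code, text in entries:
        for i in matching_indices(text, patterns):
            register.setdefault(i, []).append(code)
    return register


# Made-up (code, string) pairs, shaped like the tuples re.findall
# returns for the two-group pattern in the question.
sample = [
    ("53", "Bob's phone connected fine with ZZZ device 7."),
    ("1a", "device ab12 rebooted"),
    ("2f", "something no pattern recognises"),
]

register = classify(sample, PATTERNS)
print(register)
```

Entries that end up in no bucket (like the third sample here) are the ones that still need a regex. With roughly 110 precompiled patterns this linear scan is usually fast enough for log files, but that is an assumption worth measuring on your own data.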
