python - Find regex occurrences of a set of patterns in the correct order using Python

Question

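I am parsing a series of text files for certain patterns, because I want to extract the matches into other files.

Put another way, I want to "delete" everything from the file except the matches.

For example, if I have pattern1, pattern2, and pattern3 as the patterns to match, I want the following input: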

bla bla
pattern1
pattern2
bla bla bla
pattern1
pattern3
bla bla bla
pattern1

to give the following output:

pattern1
pattern2
pattern1
pattern3
pattern1

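I can use re.findall and successfully get a list of matches for any single pattern, but since the matches for the different patterns are interleaved in the file, I can't think of a way to preserve their order.

Thanks for reading.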

score 5 · Accepted Answer

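Combine them into a single pattern, so that one scan of the file returns the matches in the order they occur. Using your example, use the following pattern (with re.MULTILINE, so that ^ matches at the start of every line):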

^pattern[0-9]+

If it is actually more complicated than that, try:

^(aaaaa|bbbbb|ccccc|ddddd)
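
As a minimal sketch of the combined-pattern idea, the snippet below runs one alternation over the sample input from the question. The `text` string and the variable names are made up for illustration, and it assumes every match starts at the beginning of a line. Because re.findall scans the string once from left to right, the matches come back in file order.

```python
import re

# Hypothetical sample text mirroring the input shown in the question.
text = """bla bla
pattern1
pattern2
bla bla bla
pattern1
pattern3
bla bla bla
pattern1
"""

# One alternation covering every pattern. The group is non-capturing, so
# findall returns the whole match. re.MULTILINE lets ^ anchor at each line.
combined = re.compile(r"^(?:pattern1|pattern2|pattern3)", re.MULTILINE)

# A single left-to-right scan, so the original order is preserved.
matches = combined.findall(text)
print("\n".join(matches))
```

Without re.MULTILINE, ^ only matches at the very start of the string, so this sample would yield no matches at all.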

score 2 · Accepted Answer

Here is the answer in "copy this and go" format.

import re

# Lets you add more patterns whenever you want
list_of_regex = [r"aaaa", r"bbbb", r"cccc"]

# Combine the patterns into one anchored alternation: ^(aaaa|bbbb|cccc)
# (join once; joining inside a loop would repeat the whole list per item)
pattern_string = r"^(" + "|".join(list_of_regex) + r")"

# Placeholder paths; replace them with your own input and output files
FILE_TO_READ = "input.txt"
FILE_TO_WRITE = "output.txt"

# Read the whole input file into a single string
with open(FILE_TO_READ) as fr:
    search_string = fr.read()

# re.MULTILINE makes ^ match at the start of every line,
# not only at the start of the string
matches = re.findall(pattern_string, search_string, re.MULTILINE)

# Write the matches in file order, one per line (as requested);
# writelines does not add newlines itself
with open(FILE_TO_WRITE, "w") as fw:
    fw.writelines(match + "\n" for match in matches)
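
If the patterns cannot easily be merged into one alternation (for example, because each needs to be searched separately), a different technique, not taken from the answers above, is to collect the matches per pattern together with their start offsets and then sort by offset. The sketch below assumes that approach; `ordered_matches` and `sample` are hypothetical names introduced only for illustration.

```python
import re

def ordered_matches(patterns, text):
    """Return the matches of all patterns in the order they appear in text."""
    found = []
    for pattern in patterns:
        # finditer exposes the position of each match via m.start()
        for m in re.finditer(pattern, text, re.MULTILINE):
            found.append((m.start(), m.group(0)))
    # Sorting by start offset restores the order of appearance in the text
    found.sort(key=lambda pair: pair[0])
    return [matched for _, matched in found]

# Hypothetical sample mirroring the input from the question
sample = (
    "bla bla\npattern1\npattern2\nbla bla bla\n"
    "pattern1\npattern3\nbla bla bla\npattern1\n"
)
result = ordered_matches([r"^pattern1", r"^pattern2", r"^pattern3"], sample)
print("\n".join(result))
```

This scans the text once per pattern, so the single combined pattern is usually the simpler choice when it is available.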
