python - Matching a previously defined group in Python

Question

This question is about matching a previously defined group in Python... but it's not quite that simple.

Here is the text I want to match:

Figure 1: Converting degraded weaponry to research materials.

Converting degraded weaponry to research
materials.

Here is my regular expression:

(Figure )(\d)(\d)?(: )(?P<description>.+)(\n\n)(?P=description)

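My problem is that the regex fails to match the text because of the line break after "research" in the third line. I'd like Python to ignore line breaks when matching the earlier group against my string.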
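To make the failure concrete, here is a minimal sketch that runs the regex from the question against the sample text. A named backreference such as (?P=description) only matches the exact characters the group captured, so the wrapped line cannot match. The variable names (text, pattern, wrapped, unwrapped) are chosen here for illustration.

```python
import re

# Sample text from the question; the repeated caption wraps after "research".
text = (
    "Figure 1: Converting degraded weaponry to research materials.\n"
    "\n"
    "Converting degraded weaponry to research\n"
    "materials."
)

pattern = r"(Figure )(\d)(\d)?(: )(?P<description>.+)(\n\n)(?P=description)"

# The backreference must repeat the captured caption character for character,
# so the newline after "research" makes the search fail.
wrapped = re.search(pattern, text)

# Same text with the wrap replaced by a space: the backreference now matches.
unwrapped = re.search(pattern, text.replace("research\n", "research "))

print(wrapped)                         # None
print(unwrapped.group("description"))
```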

score 0 · Accepted Answer

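There seem to be two general approaches: either normalize the text (as jhermann suggested), or run a function/code snippet against each potential match that performs a more sophisticated comparison than a single regex can.

Normalization: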

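import re

somespecialsequence = "\x00"  # placeholder; assumed never to occur in text
text = re.sub(r"\n\n", somespecialsequence, text)  # protect paragraph breaks
text = re.sub(r"\s*\n", " ", text)                 # turn line wraps into spaces
text = re.sub(r"\s+", " ", text)                   # collapse runs of whitespace
text = re.sub(re.escape(somespecialsequence), "\n\n", text)  # restore paragraph breaks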

Now this should work as expected: (Figure )(\d)(\d)?(: )(?P<description>.+)(\n\n)(?P=description)

Alternatively, using a code snippet:
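As a rough end-to-end check, the normalization can be wrapped in a helper and fed to the original regex. This is only a sketch: the function name normalize and the NUL placeholder are assumptions made here (the placeholder must not occur in the input), and the two whitespace substitutions are folded into a single `\s+` pass.

```python
import re

PLACEHOLDER = "\x00"  # assumption: this character never occurs in the input

def normalize(text, placeholder=PLACEHOLDER):
    """Collapse line wraps to single spaces while keeping blank-line breaks."""
    text = text.replace("\n\n", placeholder)   # protect paragraph breaks
    text = re.sub(r"\s+", " ", text)           # wraps and whitespace runs -> one space
    return text.replace(placeholder, "\n\n")   # restore paragraph breaks

text = (
    "Figure 1: Converting degraded weaponry to research materials.\n"
    "\n"
    "Converting degraded weaponry to research\n"
    "materials."
)
pattern = r"(Figure )(\d)(\d)?(: )(?P<description>.+)(\n\n)(?P=description)"

m = re.search(pattern, normalize(text))
print(m.group("description"))
```

Literal replacements use str.replace here, which avoids having to escape the placeholder for use as a regex pattern.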

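import re

# Non-greedy groups so each match stops at the next "Figure " (or at the end of the text)
matches = re.finditer(r"(Figure )(\d+)(: )(.+?)(\n\n)(.+?)(?=Figure |\Z)", text, flags=re.S)
for m in matches:
    # Collapse every run of non-word characters (newlines included) to one space
    text1 = re.sub(r"\W+", " ", m.group(4))
    text2 = re.sub(r"\W+", " ", m.group(6))
    if text1 == text2:
        pass  # this is a match; handle it here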
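The per-match comparison could also be packaged as a function that returns the figures whose caption is repeated in the following paragraph. This is a variation rather than the answer's exact code: it collapses whitespace only (so punctuation still has to agree), and the names FIGURE_RE, collapse, and repeated_captions, as well as the second figure in the sample, are made up for illustration.

```python
import re

# Caption up to the first blank line, then the following paragraph up to the
# next "Figure " block or the end of the text.
FIGURE_RE = re.compile(
    r"Figure (\d+): (.+?)\n\n(.+?)(?=\n\nFigure |\Z)", flags=re.S
)

def collapse(s):
    """Replace every run of whitespace (including newlines) with one space."""
    return " ".join(s.split())

def repeated_captions(text):
    """Return (figure number, caption) pairs whose caption is repeated below it."""
    found = []
    for m in FIGURE_RE.finditer(text):
        caption, body = collapse(m.group(2)), collapse(m.group(3))
        if caption == body:
            found.append((int(m.group(1)), caption))
    return found

sample = (
    "Figure 1: Converting degraded weaponry to research materials.\n\n"
    "Converting degraded weaponry to research\nmaterials.\n\n"
    "Figure 2: Assembling the reactor core.\n\n"   # invented second figure
    "Something else entirely."
)
print(repeated_captions(sample))
```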
