python - Parsing a text document with regular expressions

Question

I am trying to parse a text document that contains !if and !endif. I want the text without the !if, the !endif, and the text between them.

For example:

text
!if
text1
!endif
text2

I want my output = text+text2+..

I tried something like re.findall(r'((^(!if.*!endif))+', text), but it does not seem to work for me.

score 4 · Accepted Answer

Your regular expression would be:

^!if$.*?^!endif$\s+

This says:

^      - Match the beginning of a line (because of the re.M flag)
!if    - Match !if
$      - Match the end of a line (because of the re.M flag)
.*?    - Match any number of characters (non-greedy) (includes line breaks, because of the re.S flag)
^      - Match the beginning of a line (because of the re.M flag)
!endif - Match !endif
$      - Match the end of a line (because of the re.M flag)
\s+    - Match one or more whitespace characters
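The breakdown above can also be written directly into the pattern. The following is a minimal sketch, assuming the re.X (verbose) flag is acceptable in addition to re.M and re.S; the names BLOCK, sample and matched_text are made up for illustration.

```python
import re

# Same pattern as above, spread over several lines with re.X (verbose mode),
# which ignores whitespace in the pattern and allows trailing comments.
BLOCK = re.compile(
    r"""
    ^!if$      # a line that consists only of !if
    .*?        # anything, newlines included (re.S), as little as possible
    ^!endif$   # a line that consists only of !endif
    \s+        # whitespace after !endif, including its line break
    """,
    re.M | re.S | re.X,
)

sample = "text\n!if\ntext1\n!endif\ntext2"
matched_text = BLOCK.search(sample).group(0)
print(repr(matched_text))
```

The match covers the whole block plus the line break after !endif, which is why removing it leaves text and text2 on adjacent lines.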

So you should be able to use it like this, which replaces all occurrences of the above regular expression with an empty string (nothing):

import re
s = "text\n!if\ntext1\n!endif\ntext2"
# re.S lets . match newlines; re.M makes ^ and $ match at every line boundary
s = re.sub(r"^!if$.*?^!endif$\s+", "", s, flags=re.S | re.M)
print(s)

This will output:

text
text2
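To see why the non-greedy .*? matters, here is a small sketch with two blocks in one input. The helper name strip_blocks and the sample string are assumptions for illustration, not part of the original answer.

```python
import re

def strip_blocks(text):
    """Remove every !if ... !endif block (hypothetical helper name)."""
    return re.sub(r"^!if$.*?^!endif$\s+", "", text, flags=re.S | re.M)

two_blocks = "a\n!if\nx\n!endif\nb\n!if\ny\n!endif\nc"

# Non-greedy: each block is removed on its own, so "b" survives.
result = strip_blocks(two_blocks)

# Greedy variant for comparison: .* runs from the first !if to the last
# !endif, so the "b" line between the two blocks is swallowed as well.
greedy_result = re.sub(r"^!if$.*^!endif$\s+", "", two_blocks, flags=re.S | re.M)

print(repr(result))
print(repr(greedy_result))
```

One limitation worth keeping in mind: because the pattern ends in \s+, a block whose !endif is the very last thing in the input (no trailing newline) is not matched; using \s* instead would be one way to cover that case.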

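Note that this explicitly requires !if and !endif to be on separate lines. If that is not required, you can remove the $ and ^ anchors from the middle of the regular expression: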

^!if.*?!endif$\s+
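A quick sketch of how the two patterns differ, assuming an input where !if and !endif share a line. The variable names and sample strings are made up for illustration.

```python
import re

STRICT = r"^!if$.*?^!endif$\s+"   # !if and !endif must each be a full line
RELAXED = r"^!if.*?!endif$\s+"    # !if starts a line, !endif ends a line

flags = re.S | re.M
same_line = "text\n!if text1 !endif\ntext2"
multi_line = "text\n!if\ntext1\n!endif\ntext2"

# The relaxed pattern removes the block in both layouts.
same_line_result = re.sub(RELAXED, "", same_line, flags=flags)
multi_line_result = re.sub(RELAXED, "", multi_line, flags=flags)

# The strict pattern leaves the single-line form untouched.
strict_result = re.sub(STRICT, "", same_line, flags=flags)

print(repr(same_line_result))
print(repr(multi_line_result))
print(repr(strict_result))
```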

score 0 · Accepted Answer

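I can help with sed:

sed '/^!if$/,/^!endif$/ d'

This is the algorithm sed uses:

0. Set the variable match=False.
1. Read the next line.
2. Check whether the line equals '!if'. If so, set match=True.
3. If match==True, delete the current line instead of printing it; if that line equals '!endif', also set match=False. Then jump to step 5.
4. Print the current line.
5. If not at EOF, jump to step 1.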
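The line-by-line idea behind the sed range can be sketched in Python as well. This is only an illustration of the state machine, not how sed is implemented; the function name delete_ranges is hypothetical, and the default markers are assumed to be the !if and !endif lines from the question.

```python
def delete_ranges(lines, start="!if", end="!endif"):
    """Drop every line from a start marker through the next end marker."""
    kept = []
    inside = False  # True while we are between start and end
    for line in lines:
        if not inside and line == start:
            inside = True
        if inside:
            # The end marker closes the range but is itself dropped.
            if line == end:
                inside = False
            continue
        kept.append(line)
    return kept

source = "text\n!if\ntext1\n!endif\ntext2"
kept_lines = delete_ranges(source.splitlines())
print("\n".join(kept_lines))
```

As with a sed range, a start marker with no matching end marker removes everything up to the end of the input.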
