python - Keeping the content between one pattern and another

Question

I want to parse HTML content and keep only the part from A to B. For example:

some content1...
<!-- begin_here -->
some content2
<!-- end_here -->
some content3

would become

<!-- begin_here -->
some content2
<!-- end_here -->

Right now I do this with sed:

sed '/begin_here/,/end_here/!d' file.html > file2.html

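However, I want to rewrite it in Python so that it works across platforms. I'm not very familiar with regular expressions in Python. Could you give me some hints? Thanks a lot :)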

score 2 · Accepted Answer

You can do this without regular expressions, like this:

add_next = False  # Do not write lines
# until the first "begin_here" is encountered, which sets it to True
with open("file1.html", "r") as in_file:
    with open("file2.html", "w") as out_file:
        for line in in_file:
            # Check the begin marker before writing and the end marker after,
            # so both marker lines are kept, as in the sed output
            if "begin_here" in line:
                add_next = True
            if add_next:
                out_file.write(line)
            if "end_here" in line:  # or line.startswith("end_here") for example
                add_next = False
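
As a minimal sketch, the same flag-based idea can be wrapped in a generator so it works on any iterable of lines, not only on files. The function name `keep_between` and its parameters are hypothetical, chosen for illustration. It keeps both marker lines and, like a sed range, starts looking for the end marker on the line after the begin marker.

```python
def keep_between(lines, begin="begin_here", end="end_here"):
    """Yield lines from a begin marker through the next end marker, inclusive."""
    inside = False
    for line in lines:
        if not inside and begin in line:
            inside = True
            yield line
            continue  # search for the end marker starting on the next line
        if inside:
            yield line
            if end in line:
                inside = False


sample = [
    "some content1...\n",
    "<!-- begin_here -->\n",
    "some content2\n",
    "<!-- end_here -->\n",
    "some content3\n",
]
kept = list(keep_between(sample))
```

To filter a file, pass the open input file as `lines` and hand the generator to `out_file.writelines(...)`.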

score 2 · Accepted Answer

Use a multiline regular expression:

import re

# DOTALL lets "." match newlines; MULTILINE makes ^ and $ match at line boundaries
pat = re.compile(r'^<!-- begin_here -->.*?<!-- end_here -->$',
                 re.DOTALL | re.MULTILINE)

with open("file.txt") as f:
    print(pat.findall(f.read()))
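
Since `findall` returns a list of matched strings, a little more work is needed to get the same result as the sed command. The sketch below is one possible way to do it, assuming the markers appear exactly as `<!-- begin_here -->` and `<!-- end_here -->` at the start and end of their own lines. The names `extract_blocks` and `filter_file` are hypothetical.

```python
import re

PATTERN = re.compile(r"^<!-- begin_here -->.*?<!-- end_here -->$",
                     re.DOTALL | re.MULTILINE)


def extract_blocks(text):
    """Return every begin/end block in text, each terminated by a newline."""
    return "".join(block + "\n" for block in PATTERN.findall(text))


def filter_file(src, dst):
    """Write only the marked blocks of src to dst (file names are up to the caller)."""
    with open(src) as in_file:
        text = in_file.read()
    with open(dst, "w") as out_file:
        out_file.write(extract_blocks(text))


example = (
    "some content1...\n"
    "<!-- begin_here -->\n"
    "some content2\n"
    "<!-- end_here -->\n"
    "some content3\n"
)
result = extract_blocks(example)
```

Calling `filter_file("file.html", "file2.html")` would then stand in for the sed one-liner. Unlike the line-by-line loop, this reads the whole file into memory, which is usually fine for HTML pages.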
