python - Search and replace HTML text, not tags

Question

Possible duplicate:
How to find/replace text in HTML while preserving HTML tags/structure

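I want to search and replace within the text of an HTML document. I don't want to touch the tags or their attributes, only the text. How should I do this in Python?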

score 2 · Accepted Answer

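import lxml.etree as et

# lxml.etree.fromstring() needs well-formed XML; lxml.html.fromstring()
# is more forgiving with real-world HTML.
html = """
<!DOCTYPE html>
<html>
  <head>
    <title>Hello HTML</title>
  </head>
  <body>
    <p>Hello 1</p>
    <p>Hello 2</p>
    <p>Hello 3</p>
    <p>Hello 4</p>
  </body>
</html>
"""
doc = et.fromstring(html)
# Select every <p> whose text contains "Hello" but not "4" and overwrite
# its leading text only; the element and its attributes are kept.
for p in doc.xpath('.//p[contains(., "Hello") and not(contains(., "4"))]'):
    p.text = 'replaced'
# encoding="unicode" makes tostring() return str rather than bytes (Python 3)
print(et.tostring(doc, pretty_print=True, encoding="unicode"))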

Output:

<html>
  <head>
    <title>Hello HTML</title>
  </head>
  <body>
    <p>replaced</p>
    <p>replaced</p>
    <p>replaced</p>
    <p>Hello 4</p>
  </body>
</html>
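The answer above overwrites the entire leading text of each matching <p>. To replace just a substring everywhere in the text while leaving tags and attributes alone, one option is to walk the tree and edit each element's `text` (text before its first child) and `tail` (text after its closing tag). The sketch below is a minimal version of that idea. It uses the standard library's xml.etree.ElementTree, which has the same text/tail model as lxml, so it needs no third-party package. Like `et.fromstring` above, it assumes well-formed markup. `replace_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def replace_text(root, old, new):
    """Hypothetical helper: replace `old` with `new` in all text under `root`.

    Only .text and .tail are edited, so tag names and attributes never change.
    """
    for el in root.iter():
        if el.text:
            el.text = el.text.replace(old, new)
        if el.tail:
            el.tail = el.tail.replace(old, new)
    return root

doc = ET.fromstring('<p title="Hello">Hello <b>Hello</b> world, Hello</p>')
replace_text(doc, "Hello", "Bye")
out = ET.tostring(doc, encoding="unicode")
print(out)  # <p title="Hello">Bye <b>Bye</b> world, Bye</p>
```

The same loop should also work on an lxml tree. However, lxml's `iter()` also yields comments and processing instructions. Passing `tag=et.Element` to `root.iter()` restricts it to real elements.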

score 0 · Accepted Answer

You could try the re module, or simply use the string replace() method.
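This sketch shows what the re/replace() route does on a hypothetical one-line snippet. Plain `str.replace` also rewrites matches inside attribute values. A regex restricted to text between `>` and `<` avoids that in simple cases. The regex is an illustrative assumption, not robust HTML handling: it misses text before the first tag or after the last one, and it mishandles `>` inside attribute values, comments, and `<script>` contents.

```python
import re

html = '<a href="/hello" title="hello">hello world</a>'

# str.replace rewrites every occurrence, including inside attributes
naive = html.replace("hello", "bye")
print(naive)  # <a href="/bye" title="bye">bye world</a>

# Only touch runs of characters between a '>' and the next '<'
# (the lookarounds keep the brackets themselves out of the match)
text_only = re.sub(r"(?<=>)[^<>]+(?=<)",
                   lambda m: m.group(0).replace("hello", "bye"),
                   html)
print(text_only)  # <a href="/hello" title="hello">bye world</a>
```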

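However, if you need to replace several keywords, search and replace on the raw string becomes very inefficient. You are better off parsing the structure with BeautifulSoup or lxml, getting the resulting objects, and operating on those objects.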
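This answer recommends BeautifulSoup or lxml. As a dependency-free alternative, the sketch below uses the standard library's html.parser.HTMLParser, which (unlike an XML parser) accepts HTML that is not well-formed. It copies the input through event by event and applies `str.replace` only to text outside <script>/<style>. `TextReplacer` and `replace_html_text` are hypothetical names. The sketch has two known limitations: end tags are re-emitted with lowercased names, and a search string containing `&` or spanning an entity will not match, because text is delivered in chunks split at entity references.

```python
from html.parser import HTMLParser

class TextReplacer(HTMLParser):
    """Hypothetical helper: re-emit HTML, replacing `old` with `new` in text only."""

    def __init__(self, old, new):
        # convert_charrefs=False delivers entity references as separate
        # events, so they can be written back exactly as they appeared
        super().__init__(convert_charrefs=False)
        self.old, self.new = old, new
        self.out = []
        self.raw_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original tag text, attributes untouched
        if tag in ("script", "style"):
            self.raw_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")  # tag name arrives lowercased
        if tag in ("script", "style") and self.raw_depth:
            self.raw_depth -= 1

    def handle_data(self, data):
        if not self.raw_depth:
            data = data.replace(self.old, self.new)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")


def replace_html_text(html, old, new):
    parser = TextReplacer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


src = '<p class="Hello">Hello &amp; <b>Hello</b><br/>world</p>'
print(replace_html_text(src, "Hello", "Bye"))
# <p class="Hello">Bye &amp; <b>Bye</b><br/>world</p>
```

With BeautifulSoup the equivalent idea is to loop over `soup.find_all(string=True)` and call `replace_with()` on each string that contains the keyword.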
