使用带有 lxml (Python) 的 EXSLT 字符串扩展
http://www.exslt.org/str/str.html
str:replace(str:concat(//text()), "\n", " ")
甚至更简单
normalize-space(str:concat(//text()))
在 Python shell 中测试
>>> import lxml.etree
>>> import lxml.html
>>> doc="""<div>Hello</div>
... <div>World</div>
... <div>!!!</div>
... <span>This</span>
... <span>is</span>
... <span>cool</span>"""
>>> root = lxml.etree.fromstring(doc, parser=lxml.html.HTMLParser())
>>> root.xpath('str:replace(str:concat(//text()), "\n", " ")', namespaces={"str": "http://exslt.org/strings"})
'Hello World !!! This is cool'
>>> root.xpath('normalize-space(str:concat(//text()))', namespaces={"str": "http://exslt.org/strings"})
'Hello World !!! This is cool'
>>>