python - How to distinguish regular spaces from escaped spaces (&#32;) when parsing XML with xml.etree.ElementTree

Question

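I'm using xml.etree.ElementTree to parse an XML file. How can I force it either to strip whitespace text (only regular spaces, not &#32;) or to leave the whitespace and ignore the escapes (keeping them as-is)? Here is my problem: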

import xml.etree.ElementTree

xml_text = """
<root>
    <mytag>
        data_with_space&#32;
    </mytag>
</root>"""
root = xml.etree.ElementTree.fromstring(xml_text)
mytag = root.find("mytag")
print("original text: ", repr(mytag.text))
print("stripped text: ", repr(mytag.text.strip()))

It prints:

original text:  '\n        data_with_space \n    '
stripped text:  'data_with_space'

What I need:

'data_with_space '

or (I can unescape it some other way):

'data_with_space&#32;'

A solution that uses xml.etree.ElementTree is preferable, because otherwise I would have to rewrite a lot of code.

score 1 · Accepted Answer

The standard XML library treats &#32; and ' ' as equal: the character reference is resolved to a plain space while the document is being parsed. There is no way to avoid this if you apply fromstring(xml_text) directly, so afterwards the two can no longer be told apart. The only way to keep the escape is to translate it into something else before applying fromstring(), and to translate it back afterwards.
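The first claim is easy to check. The snippet below is only a small illustration, and the element name and sample strings are made up for the demo: it parses one element containing an escaped space and one containing a literal space, then compares the resulting text.

```python
import xml.etree.ElementTree as ET

# Same element, once with an escaped space and once with a literal space.
escaped = ET.fromstring("<mytag>data&#32;</mytag>").text
literal = ET.fromstring("<mytag>data </mytag>").text

# After parsing, both are ordinary Python strings ending in a plain space.
print(repr(escaped), repr(literal), escaped == literal)
```

Since the two strings compare equal, nothing in the parsed tree records which space was escaped. That is why the workaround has to act on the raw XML text before parsing.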

import xml.etree.ElementTree

# Hide numeric character references from the parser and restore them afterwards.
# The marker string must not occur anywhere in the real data.
def stop_escape(text):
    return text.replace("&#", "|STOP_ESCAPE|")

def resume_escape(text):
    return text.replace("|STOP_ESCAPE|", "&#")

xml_text = """
<root>
    <mytag>
        data_with_space&#32;
    </mytag>
</root>"""
root = xml.etree.ElementTree.fromstring(stop_escape(xml_text))
mytag_txt = resume_escape(root.find("mytag").text)
print("original text: ", repr(mytag_txt))
print("stripped text: ", repr(mytag_txt.strip()))

You would get:

original text:  '\n        data_with_space&#32;\n    '
stripped text:  'data_with_space&#32;'
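To get the first form the question asks for ('data_with_space '), the protected references can be decoded after stripping instead of being restored as text. The following is a minimal sketch of that idea, not part of the original answer. The helper names text_keeping_escaped_whitespace and _decode are hypothetical, the marker is assumed never to appear in real data, and only numeric references (decimal or hexadecimal) are handled.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical marker; it is assumed never to occur in the real data.
PLACEHOLDER = "|STOP_ESCAPE|"

# Matches a protected numeric reference, e.g. |STOP_ESCAPE|32; or |STOP_ESCAPE|x20;
_PROTECTED_REF = re.compile(re.escape(PLACEHOLDER) + r"(x[0-9A-Fa-f]+|[0-9]+);")


def _decode(match):
    """Turn one protected numeric reference into the character it names."""
    ref = match.group(1)
    code = int(ref[1:], 16) if ref.startswith("x") else int(ref)
    return chr(code)


def text_keeping_escaped_whitespace(xml_text, tag):
    """Return the text of `tag`, stripped of literal whitespace only."""
    root = ET.fromstring(xml_text.replace("&#", PLACEHOLDER))
    raw = root.find(tag).text or ""
    # strip() sees only literal whitespace here; escaped characters are still hidden.
    return _PROTECTED_REF.sub(_decode, raw.strip())


xml_text = """
<root>
    <mytag>
        data_with_space&#32;
    </mytag>
</root>"""

print(repr(text_keeping_escaped_whitespace(xml_text, "mytag")))  # 'data_with_space '
```

Note that the blanket replace of "&#" also touches attribute values and CDATA sections, where the marker would be left behind, so a real document may need a narrower substitution.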
