3

I have an XML file:

<body>
    <entry>
         I go to <hw>to</hw> to school.
    </entry>
</body>

For some reason, I changed <hw> to &lt;hw&gt; and </hw> to &lt;/hw&gt; before parsing it with the lxml parser:

<body>
    <entry>
         I go to &lt;hw&gt;to&lt;/hw&gt; to school.
    </entry>
</body>

But after modifying the parsed XML data, I want to get a <hw> element back, not &lt;hw&gt;. How can I do this?

4

2 Answers

4

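First, import an unescape function (e below is assumed to be lxml.etree):

from xml.sax.saxutils import unescape
from lxml import etree as e

# body is the parsed root element; its first child is the <entry>
entry = body[0]

Unescape the serialized entry, re-parse it, and replace the original with the result:

body.replace(entry, e.fromstring(unescape(e.tounicode(entry))))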
answered 2013-02-02T07:26:13.270
1

If you know which element contains wrongly escaped elements:

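import lxml.etree

# parse the whole document as usual..
# find the entry element..
# parse the fragment (entry.text must hold exactly one well-formed element
# with nothing but whitespace around it, or fromstring raises XMLSyntaxError)
fragment = lxml.etree.fromstring(entry.text)
# (optionally) add the fragment to the tree
entry.text = None
entry.append(fragment)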
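In the question's example the entry holds mixed content (plain text around the <hw> element), so its text cannot be parsed as a single element directly. One possible workaround, sketched below, is to wrap the text in a throwaway root before parsing and then move the pieces back. The name "wrapper" is hypothetical, and the sketch assumes the decoded text is well-formed once wrapped (no stray & or < characters). The fallback import is only there so the sketch runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:  # the calls used below also exist in the stdlib module
    import xml.etree.ElementTree as etree

SOURCE = """<body>
    <entry>
         I go to &lt;hw&gt;to&lt;/hw&gt; to school.
    </entry>
</body>"""

body = etree.fromstring(SOURCE)
entry = body.find("entry")

# After parsing, entry.text is the decoded string, i.e. it contains "<hw>to</hw>".
# Wrap it in a temporary root so text plus elements parse as one document.
wrapper = etree.fromstring("<wrapper>" + entry.text + "</wrapper>")

# Move the leading text and the parsed children (with their tails) into entry.
entry.text = wrapper.text
for child in list(wrapper):
    entry.append(child)

hw = entry.find("hw")
print(etree.tostring(body, encoding="unicode"))
```

Each child keeps its tail text when appended, so the words after </hw> stay in place.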
answered 2013-02-02T07:14:36.973