Look:
>>> import xml.etree.ElementTree as et
>>> xmlstring = """<?xml version="1.0" encoding="UTF-8"?>
... <dm>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
... &lt;string&gt;R\xc3\xa9sum\xc3\xa9&lt;/string&gt;
... </dm>
... """
The XML source is encoded in UTF-8 (\xc3\xa9 = é):
>>> print xmlstring
<?xml version="1.0" encoding="UTF-8"?>
<dm>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;string&gt;Résumé&lt;/string&gt;
</dm>
Now, let's parse it:
>>> dm = et.fromstring(xmlstring)
>>> dm.text
u'<?xml version="1.0" encoding="UTF-8"?>\n <string>R\xe9sum\xe9</string>\n'
As you can see, the UTF-8 byte sequence \xc3\xa9 has been decoded to \xe9, the Unicode code point U+00E9 (the same value é has in ISO-8859-1).
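To check what the \xe9 in the parsed text really is, here is a minimal sketch. It is written for Python 3, not the Python 2 session above, and it assumes the inner document is stored in <dm> with &lt; / &gt; escapes. The variable names (xml_bytes, inner_text, e_acute and so on) are only illustrative.

```python
import xml.etree.ElementTree as et

# Outer document as UTF-8 bytes. Assumption: the nested document is
# escaped with &lt; / &gt; so that it is plain text inside <dm>.
xml_bytes = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<dm>&lt;?xml version="1.0" encoding="UTF-8"?&gt;\n'
    b'&lt;string&gt;R\xc3\xa9sum\xc3\xa9&lt;/string&gt;\n'
    b'</dm>\n'
)

dm = et.fromstring(xml_bytes)
inner_text = dm.text  # decoded text (a str in Python 3), not raw bytes

# Pick out the character right after the "R" of "Résumé".
e_acute = inner_text[inner_text.index("R") + 1]

code_point = ord(e_acute)                    # Unicode code point of é
utf8_bytes = e_acute.encode("utf-8")         # é re-encoded as UTF-8
latin1_bytes = e_acute.encode("iso-8859-1")  # é encoded as ISO-8859-1

print(hex(code_point), utf8_bytes, latin1_bytes)
```

If this holds, the parser has not re-encoded anything to ISO-8859-1. It has decoded the bytes into text, and the single byte \xe9 only shows up again if you explicitly encode that text as ISO-8859-1. Encoding it as UTF-8 gives back the original two bytes.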