
2 Answers

3

Since xml.parsers.expat.ParserCreate supports only four encodings, I would try them all: UTF-8, UTF-16, ISO-8859-1 (Latin-1), and ASCII.

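You can then try each encoding in turn and stop at the first one that parses. Note that the parser argument of ElementTree.parse expects an xml.etree.ElementTree.XMLParser (which wraps expat), not a raw expat.ParserCreate object. Here xml_file is assumed to be a file path:

import xml.etree.ElementTree as ET

root = None
for encoding in ('UTF-8', 'UTF-16', 'ISO-8859-1', 'ASCII'):
    try:
        # encoding=... overrides whatever the XML declaration says
        root = ET.parse(xml_file, parser=ET.XMLParser(encoding=encoding)).getroot()
        break  # first encoding that parses cleanly wins
    except ET.ParseError:
        continue  # wrong guess, try the next encoding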
answered 2012-07-02T07:56:37.533
1

There are two things you need to establish.

(a) is there an XML declaration and what does it say about the encoding?

(b) what are the actual bytes in the file used to represent these characters?
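
Both questions can be answered by looking at the raw bytes at the start of the file. Below is a minimal sketch, not a definitive tool: the helper name inspect_xml_bytes and its return shape are made up for illustration, and the regular expression only finds a declaration written in an ASCII-compatible encoding (it will not match inside a UTF-16 file, where the byte order mark is the better clue).

```python
import re

def inspect_xml_bytes(data, sample=64):
    """Hypothetical helper: return (declared_encoding, bom_name, hex_dump)."""
    # Byte order marks that commonly appear at the start of XML files.
    boms = [
        (b'\xef\xbb\xbf', 'UTF-8 BOM'),
        (b'\xff\xfe', 'UTF-16 LE BOM'),
        (b'\xfe\xff', 'UTF-16 BE BOM'),
    ]
    bom_name = next((name for bom, name in boms if data.startswith(bom)), None)

    # (a) Look for encoding="..." inside the XML declaration, if there is one.
    match = re.search(
        rb'<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', data[:200]
    )
    declared = match.group(1).decode('ascii') if match else None

    # (b) Hex dump of the leading bytes, for comparison against the declaration.
    hex_dump = ' '.join('%02x' % b for b in data[:sample])
    return declared, bom_name, hex_dump
```

To use it, open the file in binary mode (open(path, 'rb')), pass the bytes you read to inspect_xml_bytes, and compare the declared encoding with the bytes that actually encode the problem characters. For example, an é stored as the single byte e9 points to Latin-1, while c3 a9 points to UTF-8.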

answered 2012-07-02T09:06:38.147