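I'm trying to write out some XML that contains special characters. I run into trouble when I loop over a list of tags to create several elements called tag.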
# -*- coding: utf-8 -*-
import sys
import xml.etree.ElementTree as xml

# Python 2 only: reset the interpreter-wide default encoding to UTF-8
reload(sys)
sys.setdefaultencoding('utf-8')
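Code snippet:
# split the comma-separated tag string into a list
check = video['tags'].split(', ')
x = len(check)
y = x - 1
# note: xrange(0, y) stops before the last tag
for i in xrange(0, y):
    tagger = xml.SubElement(doc, 'field', name="tag")
    s = check[i]
    tagger.text = s.encode('utf-8')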
The problem shows up when I try to write:
output = open(file_name,'w+')
tree = xml.ElementTree(add)
tree.write(output)
output.close()
I get the following error:
Traceback (most recent call last):
  File "xml_breakup3.py", line 108, in <module>
    tagger.text = s.encode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x81 in position 0: invalid start byte
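When I run my code without this snippet, it writes the XML without any problem. If I set tagger.text to any literal string (e.g. '99'), it writes fine. If I let the loop run from 0 to 3, it works. I only get the UnicodeDecodeError when I try to iterate over the whole list.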
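Since the loop works for the first few tags and only fails over the full list, it looks like one particular tag contains bytes that are not valid UTF-8. Below is a minimal sketch for locating such tags, assuming the tags are byte strings (as Python 2 str values are). The names find_undecodable and sample_tags are hypothetical, and the sample data is made up for illustration.

```python
def find_undecodable(tags, encoding="utf-8"):
    """Return (index, raw_bytes) pairs for byte strings that fail to decode."""
    bad = []
    for i, tag in enumerate(tags):
        # Only byte strings can fail to decode; text values are skipped.
        if isinstance(tag, bytes):
            try:
                tag.decode(encoding)
            except UnicodeDecodeError:
                bad.append((i, tag))
    return bad


# Hypothetical sample data: the third entry starts with the invalid byte 0x81.
sample_tags = [b"music", b"live", b"\x81abc", b"caf\xc3\xa9"]
print(find_undecodable(sample_tags))
```

This would also explain why an encode call raises a decode error: in Python 2, calling .encode('utf-8') on a byte str first decodes it implicitly with the default encoding, and that hidden decode step is what fails on the bad byte.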
When I try this instead:
check = video['tags'].split(', ')
for ta in check:
    tagger = xml.SubElement(doc, 'field', name="tag")
    tagger.text = ta
I get:
Traceback (most recent call last):
  File "xml_breakup3.py", line 172, in <module>
    tree.write(output)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 821, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 938, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1074, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xba in position 0: invalid start byte
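One possible way around both tracebacks might be to decode each tag to Unicode text before assigning it to .text, and let ElementTree do the encoding when it writes. The sketch below assumes the stray bytes are Latin-1, which is only a guess because the real source encoding of the tags is unknown. The helper to_text and the sample tags are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def to_text(raw, fallback="latin-1"):
    """Decode a byte string as UTF-8, falling back to an assumed encoding."""
    if isinstance(raw, bytes):
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: non-UTF-8 bytes are Latin-1; adjust to the real source.
            return raw.decode(fallback)
    return raw


doc = ET.Element("doc")
# Hypothetical tags: plain ASCII, valid UTF-8, and one non-UTF-8 byte string.
for raw in [b"music", b"caf\xc3\xa9", b"\xbaabc"]:
    field = ET.SubElement(doc, "field", name="tag")
    field.text = to_text(raw)

# Write to an in-memory binary buffer; a file opened with 'wb' would work the same way.
buf = io.BytesIO()
ET.ElementTree(doc).write(buf, encoding="utf-8", xml_declaration=True)
xml_bytes = buf.getvalue()
```

With every .text value already Unicode, the serializer only has to encode, so it no longer needs to guess how to decode raw bytes.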