It seems I was using the wrong function. There is .fromstring, and with it I get no error message.
xml_ = load() # here comes the unicode string with Cyrillic letters
print xml_ # prints everything fine
print type(xml_) # 'lxml.etree._ElementUnicodeResult' = unicode
xml = xml_.decode('utf-8') # here is an error
doc = lxml.etree.parse(xml) # if I do not decode it - the same error appears here
File "testLog.py", line 48, in <module>
xml = xml_.decode('utf-8')
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 89-96: ordinal not in range(128)
If I use
xml = xml_.encode('utf-8')
doc = lxml.etree.parse(xml) # here's an error
or
xml = xml_
then I get
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 89: ordinal not in range(128)
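If I understand correctly: I have to decode the non-ASCII byte string into the internal (unicode) representation, work with that representation, and encode it back before sending it to the output? It seems that is exactly what I am doing.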
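To make that understanding concrete, here is a minimal sketch of the decode, work, encode round trip. The sample document and the names raw, text and out are made up for illustration; they are not the data returned by load().

```python
# Bytes as they might arrive from the network (UTF-8 encoded Cyrillic text).
raw = u"<msg>\u041f\u0440\u0438\u0432\u0435\u0442</msg>".encode("utf-8")

# Decode once at the input boundary: bytes -> unicode text.
text = raw.decode("utf-8")

# Work with the text form; each Cyrillic letter counts as one character here,
# while it takes two bytes in the UTF-8 encoded form.
char_count = len(text)
byte_count = len(raw)

# Encode once at the output boundary: unicode text -> bytes.
out = text.encode("utf-8")
```

Under this reading, decode() only makes sense on a byte string and encode() only on a unicode string. If the value is already unicode, as type(xml_) suggests, calling decode('utf-8') on it would be the wrong direction.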
'Accept-Charset': 'utf-8'
Because of this header, the input data must be in UTF-8.
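For comparison, here is a small sketch of the difference between the two functions. It uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without extra packages; lxml.etree offers functions with the same names, fromstring and parse, but I am assuming rather than asserting that they behave identically in every detail. The sample XML is made up.

```python
import io
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree

# UTF-8 encoded document held in memory (hypothetical sample data).
xml_bytes = u"<log><entry>\u0442\u0435\u0441\u0442</entry></log>".encode("utf-8")

# fromstring() takes the document content itself.
root = ET.fromstring(xml_bytes)
entry_text = root.find("entry").text

# parse() expects a file name or a file-like object, not the content,
# so in-memory data has to be wrapped first.
tree = ET.parse(io.BytesIO(xml_bytes))
same_text = tree.getroot().find("entry").text
```

If this carries over to lxml, passing the XML content straight to parse() would be treated as a source to open rather than as a document, which may be why only .fromstring works for me.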