If the text contains non-ASCII data, you should provide it to el.text as a Unicode string.
As @Abbasov Alexander's answer shows, you could do it using a Unicode literal, u''. Python hasn't raised an exception, so I assume that you've declared the character encoding of your Python source file (e.g., using a # coding: utf-8 comment at the top). This encoding defines how Python interprets non-ASCII characters in the source; it is unrelated to the encoding you use to save the XML to a file.
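A minimal sketch of the above, assuming the source file is saved as UTF-8; the element name and the sample text are made up for illustration:

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET

el = ET.Element("greeting")  # hypothetical element name
# The u'' prefix makes this a Unicode string on Python 2
# (it is accepted but optional on Python 3.3+).
el.text = u"Привет"

# Six characters, not twelve UTF-8 bytes
print(len(el.text))
```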
If the text is already in a variable and you haven't converted it to Unicode yet, you could do it using text.decode(text_encoding), where text_encoding is the encoding of those bytes (it may be unrelated to the Python source encoding).
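For example, something along these lines should work, assuming the bytes you received happen to be UTF-8 (the sample bytes, the variable names, and the element name are illustrative only):

```python
import xml.etree.ElementTree as ET

raw = b"caf\xc3\xa9"      # sample bytes: "café" encoded as UTF-8
text_encoding = "utf-8"   # assumed encoding of the incoming bytes

text = raw.decode(text_encoding)  # bytes -> Unicode string

el = ET.Element("item")   # hypothetical element name
el.text = text
```

If the data came from somewhere else (a legacy file, a database), text_encoding would be whatever that source uses, regardless of how the .py file itself is encoded.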
The confusing bit might be that el.text (as an optimization) returns a bytestring on Python 2 for pure ASCII data. This breaks the rule that you should not mix bytes and Unicode strings, though it should work if sys.getdefaultencoding() returns an ASCII-based encoding, as it does in most cases.
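If you would rather not rely on the implicit conversion, one possible approach is to normalize the value yourself. The helper below is hypothetical (it is not part of ElementTree) and assumes that any bytestring coming back from el.text is pure ASCII:

```python
import xml.etree.ElementTree as ET

def text_as_unicode(el, encoding="ascii"):
    """Return el.text as a Unicode string (hypothetical helper)."""
    value = el.text
    # On Python 2, bytes is an alias of str, so this catches the
    # pure-ASCII bytestring case; on Python 3 el.text is never bytes.
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return value

el = ET.fromstring(b"<item>plain ascii</item>")
text = text_as_unicode(el)
```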
To save the XML, pass any character encoding you need to the tostring() or ElementTree.write() functions. Again, this encoding is unrelated to the other encodings already mentioned.
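As a rough illustration, the same element can be serialized with different output encodings. The element name and sample text are assumptions, and an in-memory buffer stands in for a real file:

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("root")  # hypothetical element name
root.text = u"caf\xe9"     # Unicode text: "café"

# The same tree serialized with two different output encodings
utf8_bytes = ET.tostring(root, encoding="utf-8")
latin1_bytes = ET.tostring(root, encoding="iso-8859-1")

# ElementTree.write() takes the same encoding parameter;
# a BytesIO object stands in for a file opened in binary mode.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="utf-8")
```

The text inside the program stays the same Unicode string in both cases; only the bytes that are produced differ.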
In general, use the Unicode sandwich: decode bytes to Unicode as soon as you receive them, work with Unicode text inside your program, and encode to bytes as late as possible, when you need to send the text through an API that doesn't support Unicode (files, network).
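The sandwich could be sketched roughly like this; the function name and the choice of encodings are made up for the example:

```python
def shout(raw_bytes, in_encoding, out_encoding):
    """Hypothetical example: upper-case some text received as bytes."""
    text = raw_bytes.decode(in_encoding)  # decode as early as possible
    text = text.upper()                   # work with Unicode in the middle
    return text.encode(out_encoding)      # encode as late as possible

# Latin-1 bytes in, UTF-8 bytes out
result = shout(b"caf\xe9", "iso-8859-1", "utf-8")
```

The input and output encodings are independent of each other, and neither has to match the encoding of the Python source file.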