python - Using Unicode in Python

Question

score 5 · Accepted Answer

You want to decode (not encode) to get a unicode string from a byte string.

>>> s = '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'
>>> us = s.decode('utf-8')
>>> print us
марка

Note that print may fail with a UnicodeEncodeError if your console's encoding cannot represent these non-ASCII characters. Even then, you should be able to see the value in a Unicode-aware debugger. I ran the above in IDLE.
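The session above is Python 2. As a minimal sketch, assuming Python 3 (where a byte string needs the b prefix and str is already Unicode), the same decode step could look like this. The names raw and text are illustrative only.

```python
# Python 3 sketch: bytes literals need the b prefix; str is Unicode text.
raw = b'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'  # the UTF-8 bytes from the example

text = raw.decode('utf-8')  # bytes -> str

# ascii() escapes non-ASCII characters, so this prints safely
# even on a console that cannot display Cyrillic.
print(ascii(text))
print(len(raw), len(text))  # 10 bytes, 5 characters
```

Printing the escaped form sidesteps the console-encoding problem mentioned above while still showing which code points were decoded.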

Update

It seems what you actually have is this:

>>> s = u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

This is trickier because you first have to get those bytes into a bytestring before you call decode. I'm not sure what the "best" way to do that is, but this works:

>>> us = ''.join(chr(ord(c)) for c in s).decode('utf-8')
>>> print us
марка
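One alternative to the chr(ord(c)) join is a Latin-1 round trip, since Latin-1 maps code points 0 to 255 onto the identical byte values. This is a sketch written for Python 3, not the answer's original method; the same encode/decode chain should also work on a Python 2 unicode object.

```python
# Python 3 sketch: a str whose code points are really UTF-8 byte values.
s = '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

# Latin-1 encodes each code point 0-255 as the byte with the same value,
# recovering the original bytes. It raises UnicodeEncodeError if the
# string contains any code point above 0xFF.
raw = s.encode('latin-1')

# Now the bytes can be decoded with the encoding they were written in.
us = raw.decode('utf-8')
print(ascii(us))
```

The UnicodeEncodeError on code points above 0xFF is useful here: it signals that the string was not pure byte values to begin with.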

Note that you should, of course, decode the bytes before storing the value in the database, so that what gets stored is already a correct Unicode string.

score 4 · Accepted Answer

Mark is right: you need to decode the string. Byte strings become Unicode strings by decoding them; encoding goes the other way. This and many other details are covered in "Pragmatic Unicode, or, How Do I Stop The Pain?".
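To make the two directions concrete, here is a small round-trip sketch, assuming Python 3. The variable names are illustrative, and the Latin-1 step is only there to show what decoding with the wrong codec produces.

```python
# Python 3 sketch of the two directions.
text = '\u043c\u0430\u0440\u043a\u0430'  # Unicode string (str)

encoded = text.encode('utf-8')     # str -> bytes
decoded = encoded.decode('utf-8')  # bytes -> str

print(decoded == text)  # the round trip preserves the text

# Decoding with the wrong codec does not fail here, because Latin-1
# accepts every byte value, but it yields one character per byte.
wrong = encoded.decode('latin-1')
print(len(text), len(encoded), len(wrong))
```

The wrong string is the same kind of value as the u'\xd0\xbc...' string in the first answer's update: one character per original UTF-8 byte.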
