I am getting a column value from the database that looks like this:

`;;][@+©

When I read it in my Python code, I get this error message:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 7: invalid start byte

Then I tried the following code, but it does not work either:

unicode(' `;;][@+©', 'utf-8')

How can I solve this problem?


1 Answer


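First, read this article about Unicode. The string you have is in some encoding, but not UTF-8. We can tell it is not UTF-8 because byte 7, 0xa9 (= 169), is outside the ASCII range 0-127 but is not preceded by a lead byte. In UTF-8, bytes from 0x80 to 0xBF are continuation bytes and are only valid inside a multi-byte sequence that starts with a lead byte.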
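The byte-range argument can be checked mechanically. The sketch below (Python 3; the helper names `utf8_byte_role` and `utf8_error` are made up for illustration) labels each byte by the role UTF-8 allows it to play and reports where a strict UTF-8 decode fails. It assumes the database returns the value as raw bytes with © stored as the single byte 0xa9.

```python
def utf8_byte_role(b):
    """Classify a byte value by the role it can play in UTF-8 (RFC 3629)."""
    if b < 0x80:
        return "ascii"         # 0xxxxxxx: a complete one-byte character
    if b <= 0xBF:
        return "continuation"  # 10xxxxxx: only valid after a lead byte
    if 0xC2 <= b <= 0xF4:
        return "lead"          # starts a 2-, 3- or 4-byte sequence
    return "invalid"           # 0xC0, 0xC1 and 0xF5-0xFF never occur in UTF-8


def utf8_error(data):
    """Return (position, reason) of the first UTF-8 decode error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# Assumed raw value from the database: (c) stored as the single byte 0xa9.
raw = b"`;;][@+\xa9"

for i, b in enumerate(raw):
    print(i, hex(b), utf8_byte_role(b))

print(utf8_error(raw))  # (7, 'invalid start byte')
```

Byte 7 comes back as `continuation` right after the ASCII `+` at position 6, which is exactly the situation the decoder reports as an invalid start byte.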

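So the trick is to figure out which encoding it is. We have a hint: the encoding has to represent byte 0xa9 as the glyph ©. My guess is Windows-1252 or Latin-1, because both are very common, and looking up A9 in the grid (character encodings are basically a game of Battleship) gives the copyright sign in both.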
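The Battleship lookup can also be done by asking Python to decode the single byte with each candidate encoding. This is a Python 3 sketch; `encodings_mapping` is a hypothetical helper, and cp437 is included only as a contrasting encoding in which 0xa9 is not ©.

```python
# Candidate encodings: latin-1 and cp1252 are the guesses above;
# cp437 (old IBM PC code page) is added for contrast.
CANDIDATES = ["latin-1", "cp1252", "cp437"]


def encodings_mapping(byte_value, glyph, candidates=CANDIDATES):
    """Return the candidate encodings that decode the single byte to glyph."""
    matches = []
    for enc in candidates:
        try:
            if bytes([byte_value]).decode(enc) == glyph:
                matches.append(enc)
        except UnicodeDecodeError:
            pass  # byte is undefined in this encoding
    return matches


print(encodings_mapping(0xA9, "\u00a9"))  # ['latin-1', 'cp1252']
```

Because Windows-1252 and Latin-1 agree on every byte from 0xA0 to 0xFF, a single © cannot tell them apart. They differ only in 0x80-0x9F (for example, 0x80 is € in cp1252 but a C1 control character in Latin-1), so other bytes in the column are what would settle the choice.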

>>> unicode(' `;;][@+©')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 8: ordinal not in range(128)
>>> unicode(' `;;][@+©', 'latin-1')
u' `;;][@+\xc2\xa9'
>>> unicode(' `;;][@+©', 'cp1252')
u' `;;][@+\xc2\xa9'
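The session above is Python 2, and the © in its literals was evidently entered as UTF-8, which is why the output shows the two bytes `\xc2\xa9` rather than a single `\xa9`. A Python 3 sketch of the same fix, assuming the database driver hands back raw bytes with © stored as 0xa9, writes that byte as an escape so the terminal encoding cannot interfere. `ascii()` is used only to keep the printed output pure ASCII.

```python
# Assumption: the database driver returns raw bytes, with (c) stored as 0xa9.
raw = b"`;;][@+\xa9"

text = raw.decode("cp1252")  # "latin-1" gives the same result for 0xa9
print(ascii(text))           # '`;;][@+\xa9'  (U+00A9 is the copyright sign)

# If the input is really UTF-8 (the sign is b"\xc2\xa9"), a cp1252 decode
# yields two characters, U+00C2 and U+00A9, as in the Python 2 session.
utf8_bytes = "\u00a9".encode("utf-8")
print(ascii(utf8_bytes.decode("cp1252")))  # '\xc2\xa9'
print(ascii(utf8_bytes.decode("utf-8")))   # '\xa9'
```

In Python 2 the equivalent of `raw.decode("cp1252")` is calling `.decode("cp1252")` on the byte string, or `unicode(raw, "cp1252")` as in the session.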
answered 2013-03-20T08:54:14.577