
Suppose I have a string with some Unicode characters in it and need to perform operations on it. What is the best way to do so?

s = u"blah ascii_word etc شاهد word1 word 2" # Delimited by spaces

words = s.split(u' ')

>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in 
    position 91: ordinal not in range(128)

Any clues?

Also, if I wanted to write this string to a text file and read it back later, what would be the procedure?


1 Answer


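When you declare the string the way you did, Python assumes it is in your system's default encoding. You have to prefix the string with u to make it unicode, and add an encoding declaration at the top of the file. If you do both, you won't get any errors: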

# -*- coding: utf-8 -*-
s = u"blah ascii_word etc شاهد word1 word 2"
words = s.split(u' ')
print words
# no error even though my system's default encoding is ascii

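I've checked it now, and you don't even need the u: adding the encoding declaration is enough to fix the problem. Note that without the u prefix, s is a UTF-8 encoded byte string (str) rather than a unicode object, so mixing it with unicode values can still trigger an implicit ASCII decode.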
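To make the difference between byte strings and unicode objects concrete, here is a minimal sketch that converts explicitly instead of relying on the default encoding. It assumes the data is UTF-8; the names raw and text are only illustrative and are not part of the original code.

```python
# -*- coding: utf-8 -*-
s = u"blah ascii_word etc شاهد word1 word 2"

# Encode explicitly to get a UTF-8 byte string (assumption: UTF-8 is wanted).
raw = s.encode("utf-8")

# Decode explicitly to get a unicode object back before doing text operations.
text = raw.decode("utf-8")

# Splitting a unicode object with a unicode separator needs no implicit decode.
words = text.split(u" ")
```

Decoding once on input and encoding once on output keeps every intermediate operation on unicode objects, so the 'ascii' codec is never invoked implicitly.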

If you want to work with unicode strings in the terminal, check your system's default encoding and change it if necessary:

>>> import sys
>>> sys.getdefaultencoding()
'ascii' #I have ascii

You can then change it with sys.setdefaultencoding(). This is tricky, though, and the details depend on your operating system.
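For the second part of the question (writing the string to a text file and reading it back), it is often simpler to name the encoding explicitly at the file boundary than to change the default. The following is a minimal sketch, assuming UTF-8 and a hypothetical file name, unicode_demo.txt, in the temporary directory; it uses io.open, which accepts an encoding argument.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

s = u"blah ascii_word etc شاهد word1 word 2"

# Hypothetical file location, chosen only for this example.
path = os.path.join(tempfile.gettempdir(), "unicode_demo.txt")

# io.open encodes unicode to UTF-8 bytes on write...
with io.open(path, "w", encoding="utf-8") as f:
    f.write(s)

# ...and decodes UTF-8 bytes back to unicode on read.
with io.open(path, "r", encoding="utf-8") as f:
    restored = f.read()

words = restored.split(u" ")
```

Because the encoding is stated in both calls, the result should not depend on what sys.getdefaultencoding() reports.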

answered 2013-08-04T19:08:34.787