I wrote this code to process Arabic text with Python:
import codecs
file = codecs.open("C:\Python27\CCA_raw_utf8.txt","r","utf-8")
text= file.read()
####################################
print "\n "," --------------------------------------------"
text=text[1:]
words=text.split()
for w in words:
    if w == unicode ("الشيخ","utf-8"):
        print w
But it does not work and raises this error:
    if w == unicode ("الشيخ","utf-8"):
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc7 in position 0: invalid continuation byte
Why does my program produce this error, and how can I fix it?
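
One way to narrow this down: the byte 0xc7 in the message is what the letter ا becomes in Windows-1256 (cp1256), so the literal "الشيخ" may be reaching unicode() as cp1256 bytes rather than UTF-8 bytes. That is only an assumption (the post does not say how the script or console is encoded). The sketch below reproduces the same kind of failure under that assumption; the variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch only: assumes the literal was stored as Windows-1256 (cp1256) bytes.
from __future__ import print_function, unicode_literals

word = "الشيخ"  # a text (unicode) literal, thanks to unicode_literals

cp1256_bytes = word.encode("cp1256")  # what a cp1256 editor/console would store
utf8_bytes = word.encode("utf-8")     # what unicode(..., "utf-8") expects

# First byte of the cp1256 form is 0xc7, the byte named in the error message.
first_byte = bytearray(cp1256_bytes)[0]

# Decoding cp1256 bytes as UTF-8 fails, like the call in the question.
try:
    cp1256_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("decode failed:", exc)

# Decoding real UTF-8 bytes round-trips to the original word.
decoded_ok = utf8_bytes.decode("utf-8") == word
print(hex(first_byte), decode_failed, decoded_ok)
```

If that assumption holds, writing the word as a unicode literal (u"الشيخ") in a script whose encoding declaration matches the encoding the file is actually saved in would avoid the explicit decode step altogether.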