python - Python print to terminal shell unicode

Question

I am parsing a long string of Persian text in Python, and am opening the file like this:

fp = codecs.open(f+i, 'r', encoding='utf-8').readlines()

and printing a line with

print(line[1])

but instead of printing out readable Persian, it outputs things like this in the terminal.

Ø§Ø·Ù
     Ø§Ø¹âØ±Ø³Ø§Ù

On the webpage, it outputs it fine.

What is the issue with it? Thank you

score 4 · Accepted Answer

You have a CP1252 Mojibake here. The first character is the code point U+0627 ARABIC LETTER ALEF, encoded to UTF-8 but then interpreted as CP1252 instead:

>>> print u'\u0627'.encode('utf8').decode('cp1252')
Ø§

Your SSH shell is misconfigured somewhere; the remote shell thinks you are using UTF-8, while on the local side the UTF-8 bytes are being displayed as if they were CP1252 bytes.
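To check the remote half of that diagnosis, you can ask Python which encoding it intends to use for output. This is only a minimal sketch, assuming Python 3; the helper name describe_output_encoding is made up for illustration. It reports what the remote Python believes, not what your local terminal emulator does with the bytes, so the local character-set setting still has to be checked in the terminal program itself.

```python
import locale
import sys


def describe_output_encoding():
    """Report the encodings Python will use when writing text to the terminal.

    Hypothetical helper for illustration; it only inspects the Python side.
    """
    return {
        # Encoding used by print() when writing to standard output.
        "stdout": sys.stdout.encoding,
        # Encoding derived from the locale settings (LANG, LC_ALL, ...).
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in describe_output_encoding().items():
        print(f"{name}: {value}")
```

If both values report UTF-8 and the output is still garbled, the bytes leaving Python are most likely fine and the local terminal is the side decoding them as CP1252.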

What I can decipher is:

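The Ù character is the Mojibake start for anything in the range U+0640 through U+067F (every code point whose UTF-8 encoding begins with the byte 0xD9); we don't get to see the second byte for the two occurrences here. Ditto for the â character; the second byte is not printable in CP1252, so it is lost again.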
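The range covered by a given lead byte can be worked out directly from the UTF-8 bit layout. The following is a sketch, assuming Python 3; two_byte_range and undefined_in_cp1252 are hypothetical names introduced here. It also lists the continuation bytes that Python's cp1252 codec leaves undefined, which are the ones that cannot show up as a visible character.

```python
def two_byte_range(lead_byte):
    """Return (first, last) code points of 2-byte UTF-8 sequences with this lead byte."""
    if not 0xC2 <= lead_byte <= 0xDF:
        raise ValueError("not a 2-byte UTF-8 lead byte")
    # A 2-byte sequence is 110xxxxx 10yyyyyy; the lead byte supplies the top 5 bits.
    base = (lead_byte & 0x1F) << 6
    return base, base + 0x3F


def undefined_in_cp1252():
    """Return the continuation bytes (0x80-0xBF) that cp1252 cannot decode."""
    missing = []
    for value in range(0x80, 0xC0):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            missing.append(value)
    return missing


lead = "\u00d9".encode("cp1252")[0]  # the byte behind the visible Ù
first, last = two_byte_range(lead)
print(f"lead byte 0x{lead:02X} covers U+{first:04X}..U+{last:04X}")
print("undefined in cp1252:", [f"0x{b:02X}" for b in undefined_in_cp1252()])
```

Continuation bytes that CP1252 does define in the 0x80 to 0x9F block may still be dropped when the garbled text is copied and pasted, so a missing second byte does not by itself identify the original character.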

All in all, I can recover:

>>> print u'Ø§Ø· - Ø§Ø¹ - Ø±Ø³Ø§'.encode('cp1252').decode('utf8')
اط - اع - رسا
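The snippets above use Python 2 syntax. A rough Python 3 equivalent of the same round trip might look like this; fix_cp1252_mojibake is a hypothetical helper name, and the sketch assumes every garbled character survived intact (it raises an error if a byte was lost along the way).

```python
def fix_cp1252_mojibake(text):
    """Undo UTF-8 bytes that were mistakenly decoded as CP1252.

    Re-encoding as cp1252 restores the original bytes; decoding those
    bytes as UTF-8 yields the intended text.
    """
    return text.encode("cp1252").decode("utf-8")


# Reproduce the damage: ALEF encoded as UTF-8, then misread as CP1252.
garbled_alef = "\u0627".encode("utf-8").decode("cp1252")
print(garbled_alef)

# Reverse it for the fragments that are still complete.
garbled = "Ø§Ø· - Ø§Ø¹ - Ø±Ø³Ø§"
print(fix_cp1252_mojibake(garbled))
```

This repairs text that has already been garbled; it does not fix the terminal, so the encoding mismatch in the SSH setup still needs to be corrected at its source.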
