python - python unicode woes - convert cp1252 string to unicode

Question

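I think I'm just fundamentally confused about character sets that aren't ASCII.

I have a plaintext file that I've declared at the top as # -*- coding: cp1252 -*-.

In the file I have question = "what is your borther’s name", for instance.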

type(question)

>> str

question

>> 'what is your borther\xe2\x80\x99s name'

And I can't convert to unicode at this point, presumably because you can't go from ASCII to Unicode.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 20: ordinal not in range(128)
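
For what it's worth, the bytes \xe2\x80\x99 are the UTF-8 encoding of U+2019 (the curly apostrophe), which suggests the file was actually saved as UTF-8 rather than cp1252. A minimal sketch, assuming that is the case, of why the implicit ASCII decode fails while an explicit decode works (the variable names are illustrative only):

```python
# Byte string exactly as shown in the output above.
raw = b'what is your borther\xe2\x80\x99s name'

# Implicit conversion uses the ASCII codec, which rejects byte 0xe2.
try:
    raw.decode('ascii')
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start  # index of the first undecodable byte

# Assumption: the bytes are UTF-8, so decode them explicitly.
text = raw.decode('utf-8')

# Decoding the same bytes as cp1252 succeeds but yields mojibake.
wrong = raw.decode('cp1252')
```

Here text holds the sentence with a single U+2019 character, while wrong contains three unrelated characters in its place.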

If I declare it as unicode to start with:

question = u"what is your borther’s name"

>> u'what is your borther\u2019s name'

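How do I get "what is your borther’s name" back? Or is that just how the Python interpreter displays unicode strings, and it will actually encode properly when I pass it to a unicode-aware application (in this case, Office)?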
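
A short sketch of the distinction being asked about: the \u2019 escape is only how the string is displayed, and the string itself holds one apostrophe character that can be encoded into whatever byte encoding a consumer expects. Which encoding Office wants is not stated here, so the two encodings below are examples only.

```python
text = u'what is your borther\u2019s name'

# The escape sequence stands for a single character, not six.
apostrophe = text[20]
length = len(text)

# Encoding happens only when bytes are needed, e.g. when writing a file.
as_cp1252 = text.encode('cp1252')  # U+2019 becomes the single byte 0x92
as_utf8 = text.encode('utf-8')     # U+2019 becomes the bytes e2 80 99
```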

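I need to preserve the special characters, but I still have to do string comparisons using the Levenshtein library (pip install python-Levenshtein).

Levenshtein.ratio takes either str or unicode for both of its arguments, but not a mix of the two.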
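
One way to avoid mixing types is to normalize both arguments to unicode before comparing. The sketch below is only an illustration: to_text is a hypothetical helper, not part of the Levenshtein library, and the UTF-8 default is an assumption about how the byte strings were encoded.

```python
def to_text(value, encoding='utf-8'):
    """Return value as a unicode string, decoding byte strings first."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# One byte string and one unicode string, normalized to the same type.
a = to_text(b'what is your borther\xe2\x80\x99s name')
b = to_text(u'what is your brother\u2019s name')

try:
    import Levenshtein  # third-party: pip install python-Levenshtein
    score = Levenshtein.ratio(a, b)
except ImportError:
    score = None  # library not installed, so no comparison is made
```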

score 0 · Accepted Answer

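I have a plaintext file that I've declared at the top as # -*- coding: cp1252 -*-.

That does nothing. The coding declaration only tells the interpreter how to decode a Python source file; it does not change how the contents of a file you read as data are decoded. Open the file with an explicit encoding instead: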

import codecs

with codecs.open(..., encoding='cp1252') as fp:
   ...
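
As a rough end-to-end illustration of the codecs.open call above, the sketch below writes the sentence from the question to a temporary file as cp1252 bytes and reads it back as unicode. The temporary file and the sample data are assumptions made for the example, not part of the original setup.

```python
import codecs
import os
import tempfile

# Hypothetical sample data: the question's sentence as cp1252 bytes.
sample_bytes = u'what is your borther\u2019s name'.encode('cp1252')

fd, path = tempfile.mkstemp(suffix='.txt')
try:
    # Write the raw bytes, as an editor saving in cp1252 would.
    with os.fdopen(fd, 'wb') as out:
        out.write(sample_bytes)
    # Decode while reading; question is a unicode string, not bytes.
    with codecs.open(path, encoding='cp1252') as fp:
        question = fp.read()
finally:
    os.remove(path)
```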
