encoding - Length of the trademark symbol in Python 2.x

Question

Why is

>>> len('™')
3

in Python 2.x?

How can I quickly fix this so that it is treated as a single character (as in Python 3.x)?

score 6 · Accepted Answer

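Your terminal encoding is set to UTF-8. You are counting the number of bytes in the encoded character, not the number of characters: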

>>> '™'
'\xe2\x84\xa2'
>>> len('™')
3
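To see where the 3 comes from, here is a minimal sketch that encodes the same character under a few encodings and counts the resulting bytes. The variable names are only illustrative, and the character is written as the escape `\u2122` so the snippet does not depend on the source-file or terminal encoding.

```python
from __future__ import print_function  # keeps print() consistent on Python 2

trademark = u'\u2122'  # U+2122 TRADE MARK SIGN, one code point

# The same single character occupies a different number of bytes
# depending on the encoding used to serialize it.
for encoding in ('utf-8', 'utf-16-le', 'utf-32-le'):
    encoded = trademark.encode(encoding)
    print(encoding, len(encoded))

utf8_length = len(trademark.encode('utf-8'))  # 3 bytes: \xe2 \x84 \xa2
```

With a UTF-8 terminal, Python 2 receives those three bytes when you type '™' into a plain str literal, which is why len() reports 3.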

Use a unicode string to count characters instead of bytes:

>>> u'™'
u'\u2122'
>>> len(u'™')
1

Or decode the byte string using the terminal encoding:

>>> import sys
>>> '™'.decode(sys.stdin.encoding)
u'\u2122'
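One caveat: sys.stdin.encoding can be None, for example when input is piped rather than typed at a terminal. A rough sketch of a defensive wrapper follows; `to_text` and its `encoding` and `fallback` parameters are hypothetical names for illustration, and falling back to UTF-8 is an assumption that may not match your data.

```python
import sys

def to_text(raw, encoding=None, fallback='utf-8'):
    """Decode a byte string to Unicode text; pass text through unchanged."""
    if not isinstance(raw, bytes):  # already text (unicode in Py2, str in Py3)
        return raw
    if encoding is None:
        # sys.stdin.encoding may be None or missing; use the fallback then.
        encoding = getattr(sys.stdin, 'encoding', None) or fallback
    return raw.decode(encoding)

# The three UTF-8 bytes of the trademark sign decode to a single character.
decoded = to_text(b'\xe2\x84\xa2', encoding='utf-8')
print(len(decoded))  # 1
```

Passing the encoding explicitly, as in the usage line, is the safer option whenever you know how the bytes were produced.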

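In Python 3, strings are Unicode values, and the Python 2 str type has been renamed to bytes. What you typed is essentially what '™'.encode('utf-8') produces in Python 3, namely b'\xe2\x84\xa2' (the literal b'™' itself is rejected there, because bytes literals may only contain ASCII characters).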

You may want to read up on Python and Unicode.
