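Python 2 "narrow" builds store Unicode strings as UTF-16 (a so-called leaky abstraction), so a code point above U+FFFF occupies two UTF-16 surrogates. To retrieve such a code point, you have to take both the leading and the trailing surrogate: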
Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:25:58) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = u'Python is fun \U0001f44d'
>>> s[-1] # Just the trailing surrogate
u'\udc4d'
>>> s[-2:] # leading and trailing
u'\U0001f44d'
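The surrogate arithmetic behind this can be sketched as follows. This is a minimal illustration written for Python 3 so it runs on any build. The helper names `to_surrogates` and `from_surrogates` are hypothetical. The formula is the standard UTF-16 one: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves.

```python
def to_surrogates(cp):
    """Split a supplementary code point (> U+FFFF) into a UTF-16 surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    v = cp - 0x10000                # 20-bit value
    lead = 0xD800 + (v >> 10)       # high 10 bits -> leading surrogate
    trail = 0xDC00 + (v & 0x3FF)    # low 10 bits -> trailing surrogate
    return lead, trail


def from_surrogates(lead, trail):
    """Recombine a leading and trailing surrogate into the original code point.

    Hypothetical helper for illustration only.
    """
    if not (0xD800 <= lead <= 0xDBFF and 0xDC00 <= trail <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)


lead, trail = to_surrogates(0x1F44D)
print(hex(lead), hex(trail))              # 0xd83d 0xdc4d
print(hex(from_surrogates(lead, trail)))  # 0x1f44d
```

The trailing surrogate 0xdc4d is the same value that `s[-1]` returned in the narrow-build transcript above.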
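Switch to Python 3.3+, where the problem is solved (PEP 393, flexible string representation) and the way code points are stored inside a Unicode string is no longer exposed: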
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = u'Python is fun \U0001f44d'
>>> s[-1] # the whole code point; strings are sequences of code points
'\U0001f44d'
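On Python 3.3+ the length of a string counts code points, so indexing and `ord()` work directly on the emoji. If the UTF-16 code units are genuinely needed, for example to interoperate with an API that counts in UTF-16, one approach is to encode explicitly and unpack the result. The sketch below assumes big-endian UTF-16 without a BOM, and `utf16_units` is a hypothetical helper name.

```python
import struct

s = u'Python is fun \U0001f44d'
print(len(s))           # 15: the emoji counts as one code point
print(hex(ord(s[-1])))  # 0x1f44d


def utf16_units(text):
    """Return the UTF-16 code units of text (hypothetical helper)."""
    data = text.encode('utf-16-be')  # big-endian, no BOM
    return list(struct.unpack('>%dH' % (len(data) // 2), data))


print([hex(u) for u in utf16_units(s[-1])])  # ['0xd83d', '0xdc4d']
```

Under this approach the surrogates only appear when you explicitly ask for the UTF-16 encoding. They are never part of the `str` itself.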