python - namedtuple with a unicode string as its name

Question

I can't use a unicode string as the name of a namedtuple. This works:

a = collections.namedtuple("test", "value")

This doesn't:

b = collections.namedtuple("βαδιζόντων", "value")

I get this error:

Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/lib64/python3.4/collections/__init__.py", line 370, in namedtuple
        result = namespace[typename]
KeyError: 'βαδιζόντων'

Why is that? The documentation says "Python 3 also supports using Unicode characters in identifiers", and the key is valid unicode, isn't it?

score 6 · Accepted Answer


Answered 2015-05-28T10:38:21.867

score 2 · Accepted Answer

The accented omicron in that name is U+1F79 GREEK SMALL LETTER OMICRON WITH OXIA. Python identifiers are normalized to NFKC, and U+1F79 in NFKC becomes U+03CC GREEK SMALL LETTER OMICRON WITH TONOS.

Interestingly, if you use the same string with U+1F79 replaced by U+03CC, it works.

>>> b = collections.namedtuple("βαδιζ\u03CCντων", "value")
>>>
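If you want to see which code points a name actually contains, something like the following sketch may help. The variable names `typename` and `fixed` are just for illustration; the string is the one from the question, with the problematic character written as an escape.

```python
import unicodedata

# The type name from the question, with U+1F79 spelled out as an escape
typename = "βαδιζ\u1f79ντων"

# Print every code point together with its official Unicode name
for ch in typename:
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

# NFKC turns U+1F79 into U+03CC and leaves the other letters alone
fixed = unicodedata.normalize("NFKC", typename)
print(fixed == "βαδιζ\u03ccντων")  # True
```

The two strings render the same in most fonts, so printing the code points is usually the quickest way to tell them apart.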

The documentation for namedtuple claims that "Any valid Python identifier may be used for a fieldname". Both strings are valid Python identifiers, as can be easily tested in the interpreter.

>>> βαδιζόντων = 0
>>> βαδιζόντων = 0
>>>
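The same check can be done without typing the characters, using `str.isidentifier()`. This is only a small sketch; `with_oxia` and `with_tonos` are names I made up for the two spellings.

```python
# Two spellings that look alike but differ in one code point
with_oxia = "βαδιζ\u1f79ντων"   # U+1F79, as in the question
with_tonos = "βαδιζ\u03ccντων"  # U+03CC, the NFKC form

# Both are accepted as identifiers...
print(with_oxia.isidentifier(), with_tonos.isidentifier())  # True True

# ...but as plain strings they are not equal
print(with_oxia == with_tonos)  # False
```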

This is definitely a bug in the implementation. I traced it to this bit of the namedtuple implementation:

namespace = dict(__name__='namedtuple_%s' % typename)
exec(class_definition, namespace)
result = namespace[typename] # here!

I guess that the typename left in the namespace dictionary by exec'ing the class_definition template, being a Python identifier, will be in NFKC form, and thus no longer match the actual value of the typename variable used to retrieve it. I believe simply pre-normalizing typename should fix this, but I haven't tested it.
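The guess about the namespace key can be checked in isolation, without going through namedtuple at all. The sketch below is not the library's code; it just exec's a one-line class definition (my own stand-in for the class_definition template) and looks at which key ends up in the dictionary.

```python
import unicodedata

raw = "βαδιζ\u1f79ντων"                          # contains U+1F79
normalized = unicodedata.normalize("NFKC", raw)  # contains U+03CC instead

# Stand-in for the template: define a class whose name is the raw string
namespace = {}
exec("class " + raw + ": pass", namespace)

# The parser normalized the identifier, so the raw string is not a key
print(raw in namespace)         # False
print(normalized in namespace)  # True
```

Looking up `namespace[raw]` here raises the same kind of KeyError as in the question, which is consistent with the explanation above.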

score 1 · Accepted Answer

Although there is already an accepted answer, let me offer a fix for the problem:

# coding: utf-8
import collections
import unicodedata


def namedtuple_(typename, field_names, **kwargs):
    ''' just like collections.namedtuple(), but does unicode normalization
        on names; extra keyword arguments (e.g. rename) are passed through
    '''

    if isinstance(field_names, str):
        field_names = field_names.replace(',', ' ').split()
    # Python normalizes identifiers to NFKC, so do the same to the names
    field_names = [
        unicodedata.normalize('NFKC', name) for name in field_names]
    typename = unicodedata.normalize('NFKC', typename)

    # forward the caller's options instead of hard-coding them
    # (verbose was removed from collections.namedtuple() in Python 3.7)
    return collections.namedtuple(typename, field_names, **kwargs)


βαδιζόντων = namedtuple_('βαδιζόντων', 'value')

a = βαδιζόντων(1)

print(a)
# βαδιζόντων(value=1)
print(a.value == 1)
# True

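What does it do?

With this implementation, namedtuple_() normalizes the names before handing them to collections.namedtuple(), so the names stay consistent.

This elaborates on @R. Martinho Fernandes' idea of pre-normalizing the names.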
