python - Print all Unicode emojis to a file

Question

In Python, an emoji can be printed from its hex code using the u'\uXXXX' escape pattern, e.g.

>>> print(u'\u231B')
⌛

But if I have a list of hex codes such as 231B, simply "adding" the string to the escape prefix does not work:

>>> print(u'\u' + ' 231B')
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

chr() fails as well:

>>> chr('231B')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: an integer is required (got type str)

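The first part of my question: given a hex code, e.g. 231A, how do I get the emoji as a str?

My goal is to get the list of emojis from https://unicode.org/Public/emoji/13.0/emoji-sequences.txt and read the hex codes in the first column.

In some cases the first column holds a range such as 231A..231B. The second part of my question: given a range of hex codes, how do I iterate over it to get each emoji as a str? For 2648..2653 it looks as if range(2648, 2653+1) would do, but when the hex code contains a letter, e.g. 1F232..1F236, range() cannot be used that way.

Thanks to @amadan for the solution!!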

TL;DR

To write the emoji list from https://unicode.org/Public/emoji/13.0/emoji-sequences.txt to a file:

import requests

response = requests.get('https://unicode.org/Public/emoji/13.0/emoji-sequences.txt')

# Write UTF-8 explicitly so the output does not depend on the platform default.
with open('emoji.txt', 'w', encoding='utf8') as fout:
    for line in response.content.decode('utf8').split('\n'):
        # Skip blank lines and comments.
        if line.strip() and not line.startswith('#'):
            # The first ';'-separated column holds the code points.
            hexa = line.split(';')[0].split('..')
            if len(hexa) == 1:
                # A single code point or a space-separated sequence.
                ch = ''.join(chr(int(h, 16)) for h in hexa[0].split())
                print(ch, file=fout)
            else:
                # An inclusive range start..end of code points.
                start, end = hexa
                for code in range(int(start, 16), int(end, 16) + 1):
                    print(chr(code), file=fout)
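
The parsing step can also be tried without a network request. What follows is only a sketch under stated assumptions: `parse_emoji_lines` is a hypothetical function name, and the sample lines are written for illustration in the `code points ; type ; description` layout that the loop above expects. They are not verbatim copies of the file.

```python
def parse_emoji_lines(lines):
    """Yield one emoji str per code point or sequence found in the lines.

    parse_emoji_lines is a hypothetical name; it mirrors the loop above.
    """
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith('#'):
            continue
        field = line.split(';')[0].strip()
        if '..' in field:
            # Inclusive range of single code points.
            start, end = field.split('..')
            for code in range(int(start, 16), int(end, 16) + 1):
                yield chr(code)
        else:
            # One code point, or a sequence separated by spaces.
            yield ''.join(chr(int(h, 16)) for h in field.split())


# Sample input written for illustration only (layout is an assumption).
sample = [
    '# comment line',
    '231A..231B    ; Basic_Emoji ; sample range',
    '23F0          ; Basic_Emoji ; sample single code point',
    '0023 FE0F 20E3; Emoji_Keycap_Sequence ; sample sequence',
]
print(list(parse_emoji_lines(sample)))
```

Keeping the parsing in a generator separates it from the download and the file writing, so each part can be checked on its own.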

score 3 · Accepted Answer

Convert the hex string to a number, then use chr:

chr(int('231B', 16))
# => '⌛'

Or use a hex literal directly:

chr(0x231B)
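
When the codes arrive as text, the conversion can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: `hex_to_str` is a hypothetical name, and it assumes that multiple code points are separated by whitespace, as in the first column of emoji-sequences.txt.

```python
def hex_to_str(hex_codes):
    """Turn whitespace-separated hex code points into a str.

    hex_to_str is a hypothetical helper name used only for illustration.
    """
    # int(h, 16) parses each hex string; chr() maps the int to a character.
    return ''.join(chr(int(h, 16)) for h in hex_codes.split())


print(hex_to_str('231B'))            # a single code point
print(hex_to_str('0023 FE0F 20E3'))  # a sequence of three code points
```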

Likewise, to use a range you need ints, either converted from strings or written as hex literals:

''.join(chr(c) for c in range(0x2648, 0x2654))
# => '♈♉♊♋♌♍♎♏♐♑♒♓'

Or

''.join(chr(c) for c in range(int('2648', 16), int('2654', 16)))

(Note: range(2648, 2654) would give you something very different, because those are decimal numbers, not hex!)
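
The same approach covers ranges whose endpoints contain hex letters, such as 1F232..1F236 from the question. A minimal sketch, assuming the range is written as `START..END` with an inclusive end; `expand_range` is a hypothetical helper name.

```python
def expand_range(spec):
    """Expand an inclusive 'START..END' hex range into a list of characters.

    expand_range is a hypothetical helper; the '..' separator and the
    inclusive end follow the format described in the question.
    """
    start, end = spec.split('..')
    # Parse both endpoints as base-16 ints; +1 makes the end inclusive.
    return [chr(c) for c in range(int(start, 16), int(end, 16) + 1)]


emojis = expand_range('1F232..1F236')
print(len(emojis))       # 5 code points: 1F232 through 1F236
print(''.join(emojis))

# Decimal 2648 is 0xa58, so range(2648, 2654) covers different code points.
print(hex(2648))
```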
