python - Getting the character code in a specific encoding from a string

Question

I'm trying to get the shift-jis character code from a unicode string. I'm not that knowledgeable in python, but here is what I've tried so far:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from struct import *

data="臍"
udata=data.decode("utf-8")
data=udata.encode("shift-jis").decode("shift-jis")
code=unpack(data, "Q")
print code

But I get a UnicodeEncodeError: 'ascii' codec can't encode character u'\u81cd' in position 0: ordinal not in range(128) error. The string is always a single character.

score 1 · Accepted Answer

In shift-jis this character is represented as the two-byte sequence 0xE4 0x60:

>>> data = u'\u81cd'
>>> data_shift_jis = data.encode('shift-jis')
>>> data_shift_jis
'\xe4`'
>>> hex(ord('`'))
'0x60'

So '\xe4\x60' is u'\u81cd' encoded as shift-jis.
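For readers on Python 3, the same check can be sketched as follows. This is a minimal illustration that assumes Python 3, where encoding returns a bytes object and iterating over it yields integers; the variable names are only for illustration.

```python
# Python 3 sketch: inspect the shift-jis bytes of a single character.
ch = "\u81cd"  # the character from the question (U+81CD)

# str.encode returns a bytes object in Python 3.
sjis = ch.encode("shift_jis")

# Iterating over bytes yields ints, so each byte can be shown in hex.
byte_values = [hex(b) for b in sjis]

print(sjis, byte_values)
```

The backtick in the repr is simply byte 0x60 shown as its printable ASCII character.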

score 0 · Accepted Answer

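In python 2, when you create a utf-8 encoded string, you can either leave it encoded (data = "臍") or have python decode it into a unicode string for you when it parses the program (data = u"臍"). The second option is the normal way to create a string when your source file is utf-8 encoded.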

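When you tried to convert to JIS, you ended up decoding the JIS bytes back into a python unicode string. When you tried to unpack, you asked for "Q" (unsigned long long) when you really wanted "H" (unsigned short). Note also that unpack takes the format string first and the data second.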
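To make the "Q" versus "H" point concrete, here is a small sketch, assuming Python 3 and a bytes literal standing in for the encoded character. It compares the sizes the two format codes expect and shows that only the 2-byte format fits the 2-byte shift-jis sequence.

```python
import struct

# ">Q" is a big-endian 8-byte unsigned long long,
# ">H" is a big-endian 2-byte unsigned short.
q_size = struct.calcsize(">Q")
h_size = struct.calcsize(">H")

sjis = b"\xe4\x60"  # the two shift-jis bytes for U+81CD

# Format string first, data second.
code = struct.unpack(">H", sjis)[0]

# Unpacking 2 bytes with an 8-byte format raises struct.error.
try:
    struct.unpack(">Q", sjis)
    q_error = None
except struct.error as exc:
    q_error = exc

print(q_size, h_size, hex(code), q_error is not None)
```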

Here are two examples of getting the character information:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from struct import unpack

# here we have a byte string (str) that really holds a utf-8 encoded char
data="臍"
jis_data = data.decode('utf-8').encode("shift-jis")
# ">H" reads the two bytes as one big-endian unsigned short
code = unpack(">H", jis_data)[0]
print repr(data), repr(jis_data), code, hex(code)

# here python decodes the utf-8 encoded char for us
data=u"臍"
jis_data = data.encode("shift-jis")
code = unpack(">H", jis_data)[0]
print repr(data), repr(jis_data), code, hex(code)

This results in

'\xe8\x87\x8d' '\xe4`' 58464 0xe460
u'\u81cd' '\xe4`' 58464 0xe460
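If you are on Python 3, a rough equivalent that does not depend on the encoded length could look like the sketch below. The helper name char_code is hypothetical (not from the answer above), and int.from_bytes is swapped in for struct.unpack so that single-byte characters such as "A" also work.

```python
def char_code(ch, encoding="shift_jis"):
    """Return the integer code of a single character in the given encoding.

    Hypothetical helper for illustration; assumes Python 3.
    """
    raw = ch.encode(encoding)
    # Interpret the encoded bytes as one big-endian integer,
    # whatever their length (1 byte for ASCII, 2 for most kanji).
    return int.from_bytes(raw, byteorder="big")


print(hex(char_code("\u81cd")))  # the character from the question
print(hex(char_code("A")))       # single-byte case
```

With ">H" the unpack call raises struct.error for any character that encodes to a single byte, which is why this sketch avoids a fixed-size format.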
