python - Convert a character to a 16-bit Unicode encoding

Question

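I have a UTF-8 character and I want to convert it to a 16-bit Unicode encoding. How can I do this?

I can get the character's Unicode escapes by reading the file it is written in and calling repr() on each line, for example: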

import codecs
f = codecs.open("a.txt",mode='rb',encoding='utf-8')
r = f.readlines()
for i in r:
    print i,repr(i)

Output:

پٹ u'\ufeff\u067e\u0679'

Now, how can I get the 16-bit Unicode encoding of u'\ufeff\u067e\u0679'?

score 3 · Accepted Answer

To get the Unicode code points, just call ord:

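import io

# Open in text mode so each line is decoded from UTF-8 into a Unicode string.
with io.open("a.txt", mode='r', encoding='utf-8') as f:
    for line in f:
        # Show the line, its repr, and each code point in decimal and in binary.
        print(line, repr(line), ' '.join(str(ord(c)) for c in line),
              ' '.join('{0:b}'.format(ord(c)) for c in line))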
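For readers who prefer the conventional U+XXXX notation, here is a minimal sketch in Python 3 syntax. The helper name `code_points` is made up for illustration and is not part of any library.

```python
def code_points(text):
    """Return the code points of text in U+XXXX notation (hypothetical helper)."""
    # ord() maps a single character to its integer code point.
    return ["U+{:04X}".format(ord(ch)) for ch in text]


sample = "\ufeff\u067e\u0679"
print(code_points(sample))  # ['U+FEFF', 'U+067E', 'U+0679']
```

The first entry, U+FEFF, is the byte order mark that was at the start of the file; the other two are the actual letters.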

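There is no single "Unicode encoding". If you are looking for the UTF-16 representation of the code points (which can take more than 16 bits per character), just call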

u'\ufeff\u067e\u0679'.encode('utf-16')
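A short sketch, in Python 3 syntax, of what the different UTF-16 codec names produce and why a character can need more than 16 bits. The sample strings are chosen only for illustration.

```python
text = "\u067e\u0679"

# "utf-16" prepends a byte order mark (BOM) and uses the platform's byte order.
with_bom = text.encode("utf-16")

# "utf-16-le" and "utf-16-be" fix the byte order and add no BOM.
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")

# A code point above U+FFFF takes two 16-bit units (a surrogate pair).
astral = "\U0001F600".encode("utf-16-be")

print(little.hex())  # 7e067906
print(big.hex())     # 067e0679
print(astral.hex())  # d83dde00
```

If the bytes are going to be compared or displayed, picking an explicit byte order ("utf-16-le" or "utf-16-be") avoids the extra BOM and the dependence on the machine.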

score 0 · Accepted Answer

>>> a=u'\ufeff\u067e\u0679'
>>> a
u'\ufeff\u067e\u0679'
>>> a.encode("utf-16")
'\xff\xfe\xff\xfe~\x06y\x06'

The last line is the string you want.

score 0 · Accepted Answer

So, if your string is in s:

s_enc = s.encode("utf-16")
hex_string = "".join(format(i, "02X") for i in s_enc)  # two hex digits per byte
bin_string = "".join(format(i, "08b") for i in s_enc)  # eight binary digits per byte

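I think this is what you are after? (Tested in py3k, but I think it should work in 2.)

Edit: Python 2.x needs a small change, because iterating over the encoded string yields one-character strings rather than integers: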

s_enc = s.encode("utf-16")
hex_string = "".join(format(ord(i), "02X") for i in s_enc)  # two hex digits per byte
bin_string = "".join(format(ord(i), "08b") for i in s_enc)  # eight binary digits per byte

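Either way, the key is to call encode() first to convert the string to the encoding of your choice (it is not clear from your question, but reading between the lines it is UTF-16).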
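If the goal is one value per 16-bit unit rather than one per byte, the encoded bytes can be grouped in pairs. This is only a sketch in Python 3 syntax: the helper name `utf16_units` is hypothetical, and big-endian order without a BOM is an assumption made so the digits read the same as the code point.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as 4-digit hex strings (hypothetical helper)."""
    # Assumption: big-endian, no BOM, so each pair of bytes reads left to right.
    data = text.encode("utf-16-be")
    return [
        "{:04X}".format(int.from_bytes(data[i:i + 2], "big"))
        for i in range(0, len(data), 2)
    ]


print(utf16_units("\ufeff\u067e\u0679"))  # ['FEFF', '067E', '0679']
```

Swapping the format spec to "{:016b}" would give the same units as 16-digit binary strings.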
