python - Reading cmap data with fonttools

Question

Background:

Using fonttools, I want to change a character such as "ل" (U+0644) to its initial form "ﻟ" (U+FEDF). I can do this in 4 steps:

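1. Use fonttools to save the font data as XML, then parse that file:

from fontTools.ttLib import TTFont

font = TTFont(fontPath)
font.saveXML("tempfont.xml")

2. In the cmap table, find the name associated with U+0644 (say the name is "isolam").
3. In the GSUB table, find the "init" feature and the entry whose "in" attribute is "isolam", then read its "out" attribute (say it is "initlam").
4. Finally, search the cmap table for the name "initlam" and get its code point.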

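This process is very slow. I think that is because the XML file is written to disk and then read back, and because parsing it takes many passes over the file.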

Problem:

I am now trying to work with the TTFont object directly instead of saving an XML file, but I am having trouble reading code points from the cmap.

from fontTools.ttLib import TTFont

font = TTFont(fontPath)
cmap = font['cmap'].tables

# There are 3 cmap subtables for different platforms in the font I am using, but
# for now I'm using cmap[2], which has platformID = 3 and is for Windows.
print(cmap[2].data)

But the result looks like gibberish. It is very long, so I am only showing part of it:

b'\x00`\x00@\x00\x05\x00\x00!\x00+\x00/\x009\x00:\x00>\x00[\x00]\x00{\x00}\x00\xab\x00\xbb \

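I was expecting it to return a dictionary with code points as keys and names as values, or perhaps a list of tuples.

So how can I access the cmap data in a readable format?

Or, how can I get the name of a glyph given its code point, and vice versa?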

score 0 · Accepted Answer

To map actual characters to their names in the cmap table, you can do the following:

from fontTools.ttLib import TTFont

font = TTFont(fontPath)
ch_to_name = {}  # key will be the code point in hex, value will be the glyph name

cmap = font["cmap"]
for ch, name in cmap.getBestCmap().items():
    ch_to_name["{:04X}".format(ch)] = name
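The reverse direction (name to code point), and the GSUB step from the question, can probably be handled in memory along the same lines. Below is a minimal sketch, not a definitive implementation. The helper names `invert_cmap`, `init_substitute` and `initial_form_codepoint` are made up for illustration, and `font` is assumed to be an already opened `TTFont`. It also assumes that the font implements "init" as a single substitution (lookup type 1, possibly wrapped in an extension lookup of type 7), that the initial-form glyph is itself listed in the cmap, and it ignores script and language systems. Fonts that use contextual lookups or leave presentation forms unencoded will simply yield `None`.

```python
from collections import defaultdict


def invert_cmap(best_cmap):
    """Turn {codepoint: glyph_name} into {glyph_name: [codepoints, ascending]}."""
    name_to_codepoints = defaultdict(list)
    for codepoint, name in sorted(best_cmap.items()):
        name_to_codepoints[name].append(codepoint)
    return dict(name_to_codepoints)


def init_substitute(font, glyph_name):
    """Return the glyph an 'init' single substitution maps glyph_name to, or None."""
    if "GSUB" not in font:
        return None
    gsub = font["GSUB"].table
    feature_list = getattr(gsub, "FeatureList", None)
    lookup_list = getattr(gsub, "LookupList", None)
    if feature_list is None or lookup_list is None:
        return None

    for record in feature_list.FeatureRecord:
        if record.FeatureTag != "init":
            continue
        for index in record.Feature.LookupListIndex:
            lookup = lookup_list.Lookup[index]
            for subtable in lookup.SubTable:
                lookup_type = lookup.LookupType
                if lookup_type == 7:
                    # An extension lookup wraps the real subtable.
                    lookup_type = subtable.ExtensionLookupType
                    subtable = subtable.ExtSubTable
                # Only single substitutions (type 1) are handled in this sketch.
                if lookup_type == 1 and glyph_name in subtable.mapping:
                    return subtable.mapping[glyph_name]
    return None


def initial_form_codepoint(font, codepoint):
    """Map a code point to the code point of its 'init' form, or None."""
    best_cmap = font["cmap"].getBestCmap()
    glyph_name = best_cmap.get(codepoint)
    if glyph_name is None:
        return None
    init_name = init_substitute(font, glyph_name)
    if init_name is None:
        return None
    codepoints = invert_cmap(best_cmap).get(init_name)
    # A glyph may be mapped from several code points; take the lowest one.
    return codepoints[0] if codepoints else None
```

With the `font` object from above, `initial_form_codepoint(font, 0x0644)` would then replace the four XML steps. Whether it returns a code point at all depends on how the particular font encodes its initial forms.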
