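You need to find the CMap for the font that each piece of text in the file uses (the stream referenced by the /ToUnicode entry of the font dictionary). It looks like: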
16 0 obj
<< /Length 433 >>
stream
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
<< /Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfrange
<0000> <005E> <0020>
<005F> <0061> [<00660066> <00660069> <00660066006C>]
endbfrange
1 beginbfchar
<3A51> <D840DC3E>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
endstream
endobj
Let's put the second bfrange entry of this example into table form. The codespace range <0000> <FFFF> makes every character code two bytes long:
+------------+-------------------+----------------+----------------------+-------------+
| hex string | low byte as ASCII | literal string | maps to              | displays as |
+------------+-------------------+----------------+----------------------+-------------+
| <005F>     | _                 | (\000\137)     | U+0066 U+0066        | ff          |
| <0060>     | `                 | (\000\140)     | U+0066 U+0069        | fi          |
| <0061>     | a                 | (\000\141)     | U+0066 U+0066 U+006C | ffl         |
+------------+-------------------+----------------+----------------------+-------------+
So if you see this text shown with a font that uses this CMap:
[<005F00600061>] TJ === fffiffl
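The lookup described above can be sketched in Python. This is only an illustration under stated assumptions: the names parse_to_unicode and decode_codes are hypothetical, the regular expressions handle just the bfchar and bfrange forms that appear in this example, and every character code is assumed to be two bytes long because the codespace range is <0000> <FFFF>.

```python
import re

# The bfrange/bfchar sections of the example CMap above.
CMAP_TEXT = """
2 beginbfrange
<0000> <005E> <0020>
<005F> <0061> [<00660066> <00660069> <00660066006C>]
endbfrange
1 beginbfchar
<3A51> <D840DC3E>
endbfchar
"""

HEX = r"<([0-9A-Fa-f]+)>"
ARRAY_ENTRY = re.compile(HEX + r"\s*" + HEX + r"\s*\[(.*?)\]", re.S)
OFFSET_ENTRY = re.compile(HEX + r"\s*" + HEX + r"\s*" + HEX)
CHAR_ENTRY = re.compile(HEX + r"\s*" + HEX)


def utf16be(hex_digits):
    """Decode a hex string such as '00660066' as UTF-16BE text."""
    return bytes.fromhex(hex_digits).decode("utf-16-be")


def parse_to_unicode(cmap_text):
    """Hypothetical helper: build {character code: text} from a CMap body."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in CHAR_ENTRY.findall(section):
            mapping[int(src, 16)] = utf16be(dst)
    for section in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        # Array form: one destination string per code in the range.
        for lo, _hi, array in ARRAY_ENTRY.findall(section):
            for offset, dst in enumerate(re.findall(HEX, array)):
                mapping[int(lo, 16) + offset] = utf16be(dst)
        # Offset form: the destination value grows along with the code.
        # Array entries are removed first so their contents are not re-read.
        for lo, hi, start in OFFSET_ENTRY.findall(ARRAY_ENTRY.sub("", section)):
            width = len(start) // 2
            for offset in range(int(hi, 16) - int(lo, 16) + 1):
                value = int(start, 16) + offset
                text = value.to_bytes(width, "big").decode("utf-16-be")
                mapping[int(lo, 16) + offset] = text
    return mapping


def decode_codes(data, mapping, code_len=2):
    """Split a string operand into fixed-width codes and map each one."""
    out = []
    for i in range(0, len(data), code_len):
        code = int.from_bytes(data[i:i + code_len], "big")
        out.append(mapping.get(code, "\ufffd"))  # U+FFFD for unmapped codes
    return "".join(out)


mapping = parse_to_unicode(CMAP_TEXT)
print(decode_codes(bytes.fromhex("005F00600061"), mapping))  # fffiffl
```

A real extractor would also have to honour the codespace ranges instead of a fixed code_len, since some CMaps mix one-byte and two-byte codes.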
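This example CMap also contains one mapping for a single character code (the bfchar entry):
+------------+----------------+----------------+---------+
| hex string | literal string | UTF-16BE value | Unicode |
+------------+----------------+----------------+---------+
| <3A51>     | (\072\121)     | <D840DC3E>     | U+2003E |
+------------+----------------+----------------+---------+
With this mapping:
[<3A51>] TJ === the character U+2003E, a CJK ideograph outside the Basic Multilingual Plane, which is why its UTF-16BE value is a surrogate pair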
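The step from the UTF-16BE value <D840DC3E> to U+2003E is ordinary surrogate-pair arithmetic. A minimal sketch, in which combine_surrogates is a hypothetical helper name and not part of any PDF library:

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into one Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD840, 0xDC3E)
print(f"U+{code_point:X}")  # U+2003E

# Python's built-in codec performs the same conversion.
decoded = bytes.fromhex("D840DC3E").decode("utf-16-be")
print(decoded == chr(code_point))  # True
```

In practice the codec call alone is enough; the explicit function only shows where the number in the table comes from.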