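I have a lot of PDF files. In some of them I can easily copy/paste text from the PDF into any text editor. In others, copy/paste produces only garbage (strange, unreadable characters). As far as I know, this is caused by embedded fonts and/or custom encodings (but maybe I'm wrong).
I picked 10 PDFs and ran pdffonts on them to extract the font-related information. Text can be copied from the PDFs whose names start with c (correct), but not from those whose names start with w (wrong). The output of the pdffonts command is below.
Is it true that I can identify the wrong documents by the presence of a Custom encoding? In other words, if a PDF uses a custom encoding, is it impossible to copy/paste text from it?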
./comparison/c1.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
DDDWSC+MyriadPro-Regular CID Type 0C Identity-H yes yes yes 9
DDDWSC+MyriadPro-Bold CID Type 0C Identity-H yes yes yes 18
XPQSAJ+MinionPro-Regular CID Type 0C Identity-H yes yes yes 36
QQNHBI+MyriadPro-Regular CID Type 0C Identity-H yes yes yes 121
MyriadPro-Regular Type 1C (OT) WinAnsi yes no no 82
./comparison/c2.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
GBITER+MyriadPro-Regular CID Type 0C Identity-H yes yes yes 9
GBITER+MyriadPro-Bold CID Type 0C Identity-H yes yes yes 18
TPIJNO+MinionPro-Regular CID Type 0C Identity-H yes yes yes 36
HCPLUP+MyriadPro-Regular CID Type 0C Identity-H yes yes yes 99
CFAHCZ+MyriadPro-Regular CID Type 0C Identity-H yes yes yes 100
MyriadPro-Regular Type 1C (OT) WinAnsi yes no no 82
./comparison/c3.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- --------
FTWOKY+MyriadPro-Regular CID Type 0C Identity-H yes yes yes 8
FTWOKY+MyriadPro-Bold CID Type 0C Identity-H yes yes yes 9
HDAKMN+MinionPro-Regular CID Type 0C Identity-H yes yes yes 34
CYRRXP+MyriadPro-Regular CID Type 0C Identity-H yes yes yes 119
MyriadPro-Regular Type 1C (OT) WinAnsi yes no no 80
./comparison/c4.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- --------
TimesNewRoman CID TrueType Identity-H yes no yes 8
TimesNewRoman,Bold CID TrueType Identity-H yes no yes 9
TimesNewRoman,BoldItalic CID TrueType Identity-H yes no yes 30
TimesNewRomanPSMT TrueType WinAnsi no no no 10
TimesNewRomanPS-BoldMT TrueType WinAnsi no no no 31
TimesNewRomanPS-BoldItalicMT TrueType WinAnsi no no no 32
Arial-BoldItalicMT TrueType WinAnsi no no no 33
CPWIYN+MinionPro-Regular CID Type 0C Identity-H yes yes yes 56
PZAZAE+MyriadPro-Regular CID Type 0C Identity-H yes yes yes 120
MyriadPro-Regular Type 1C (OT) WinAnsi yes no no 102
./comparison/c5.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- -------
TimesNewRoman CID TrueType Identity-H yes no yes 9
TimesNewRoman,Bold CID TrueType Identity-H yes no yes 10
TimesNewRomanPSMT TrueType WinAnsi no no no 11
TimesNewRomanPS-BoldMT TrueType WinAnsi no no no 12
PKLOUG+MinionPro-Regular CID Type 0C Identity-H yes yes yes 43
ZWNFNP+MyriadPro-Regular CID Type 0C Identity-H yes yes yes 120
MyriadPro-Regular Type 1C (OT) WinAnsi yes no no 89
./comparison/w1.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- --------
ECCDLD+TimesNewRomanPSMT Type 1C WinAnsi yes yes no 5
ECCDMD+TimesNewRoman Type 1C Custom yes yes no 6
ECCDNE+TimesNewRomanPS-BoldMT Type 1C WinAnsi yes yes no 7
ECCDNF+TimesNewRoman,Bold Type 1C Custom yes yes no 8
MinionPro-Regular-Identity-H CID Type 0C Identity-H yes no no 24
./comparison/w2.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- --------
DIKJDI+TimesNewRoman,Bold Type 1C Custom yes yes no 5
DIKJEJ+TimesNewRomanPS-BoldMT Type 1C WinAnsi yes yes no 6
DIKJEK+TimesNewRomanPSMT Type 1C WinAnsi yes yes no 7
DIKJEL+TimesNewRoman Type 1C Custom yes yes no 8
MinionPro-Regular-Identity-H CID Type 0C Identity-H yes no no 22
./comparison/w3.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- --------
LLHACL+Calibri Type 1C Custom yes yes yes 5
LLHACM+Calibri-Bold Type 1C Custom yes yes yes 6
LLHBBI+Calibri-Italic Type 1C Custom yes yes yes 20
MinionPro-Regular-Identity-H CID Type 0C Identity-H yes no no 21
./comparison/w4.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- --------
EPGNDG+TimesNewRoman Type 1C Custom yes yes no 5
EPGNDH+TimesNewRomanPSMT Type 1C WinAnsi yes yes no 6
EPGNDI+TimesNewRomanPS-BoldMT Type 1C WinAnsi yes yes no 7
EPGNGI+TimesNewRoman,Bold Type 1C Custom yes yes no 8
MinionPro-Regular-Identity-H CID Type 0C Identity-H yes no no 19
OXKXLW+MyriadPro-Regular CID Type 0C Identity-H yes yes yes 60
MyriadPro-Regular Type 1C WinAnsi yes no no 52
./comparison/w5.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- --------
JPDEFN+TimesNewRoman Type 1C Custom yes yes no 5
JPDEHN+TimesNewRomanPSMT Type 1C WinAnsi yes yes no 6
JPDEIN+TimesNewRomanPS-BoldMT Type 1C WinAnsi yes yes no 7
JPDEJO+TimesNewRoman,Bold Type 1C Custom yes yes no 8
MinionPro-Regular-Identity-H CID Type 0C Identity-H yes no no 25
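
To make the comparison easier to repeat on more files, here is a minimal Python sketch that parses pdffonts output and lists the fonts that look suspicious. The function names (parse_pdffonts, suspicious_fonts, run_pdffonts) are hypothetical, and the rule itself is only my assumption, not a confirmed answer: it flags a font when "uni" is "no" and the encoding is Custom or Identity-H. On the ten listings above this rule flags at least one font in every w file and none in the c files, but ten samples prove nothing in general.

```python
import re
import subprocess

# One data row of pdffonts output: name, type, encoding, emb, sub, uni, object ID.
# The font type may contain spaces ("CID Type 0C", "Type 1C (OT)"), so it is
# matched lazily. The object ID may be a single number or "number generation".
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<type>.+?)\s+(?P<encoding>\S+)\s+"
    r"(?P<emb>yes|no)\s+(?P<sub>yes|no)\s+(?P<uni>yes|no)\s+"
    r"(?P<obj>\d+)(?:\s+\d+)?\s*$"
)


def parse_pdffonts(text):
    """Return one dict per font row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def suspicious_fonts(rows):
    """Heuristic (an assumption, not a rule from any specification):
    flag fonts that have no ToUnicode map ("uni" is "no") and whose
    encoding is Custom or Identity-H."""
    return [
        row for row in rows
        if row["uni"] == "no" and row["encoding"] in ("Custom", "Identity-H")
    ]


def run_pdffonts(path):
    """Run the pdffonts command on one file and return its text output.
    Assumes pdffonts is installed and on the PATH."""
    result = subprocess.run(
        ["pdffonts", path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For a single file, suspicious_fonts(parse_pdffonts(run_pdffonts("./comparison/w1.pdf"))) would return the flagged rows, and an empty list would mean the heuristic found nothing to complain about.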