php - pdfmark: some accented characters are not displayed correctly in generated PDF bookmark titles

Question

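I'm inserting bookmarks into an existing PDF, but I have a problem with the accented "c". Here is an example (the charset used in the example is UTF-8):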

$name = "Ruční nářadí";

$name = chr(254).chr(255).iconv('UTF-8', 'UTF-16BE', str_replace(array('(',')','/'),array('\\(','\\)','\\/'),$name));

$fh = fopen('pdfmark.txt', 'w');
fputs($fh, "[/Title ({$name}) /Page 1 /OUT pdfmark\n");
fclose($fh);

$command = "gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=out.pdf final.pdf pdfmark.txt; mv out.pdf final.pdf";
exec($command);

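The problem is that the accented č shows up in the final PDF's bookmark as Ċ (a capital letter with a different accent). I've tried the other accented characters used in my language (Czech), and all of them work fine except this one.

Thanks for any clues to solving this problem.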

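Edit (2013-02-01):

The Ghostscript version used is 9.06 (2012-08-08). I'm using Adobe Reader 11.0.1 to view the generated PDF files.

I'm also wondering: does the encoding have to be specified somehow in the PDF? The source PDF is not under my control, so I know nothing about it. If so, is there a way to do it with GS or pdfmark? I'd think that if the bookmark's encoding is Unicode it shouldn't really matter, but maybe I'm wrong.

Edit (2013-02-05):

There seems to be a bug in either GS's pdfwrite or in Acrobat; see the Ghostscript bug tracker for more information. I'll post the solution here once it's resolved.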

score 2 · Accepted Answer

I'd start by reducing the string to the single offending character. Then look at the string in pdfmark.txt and check whether it is correctly UTF-16BE encoded.
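A minimal PHP sketch of that check: it applies the question's own escaping and conversion to just č and prints the bytes as hex (`dumpTitleBytes` is a hypothetical helper name). The low byte of č (U+010D) is 0D, a carriage return. Both PostScript and PDF literal strings turn an unescaped end-of-line inside `(...)` into a single line feed (0A), which would change U+010D into U+010A (Ċ) and matches the symptom; treat that as a likely explanation rather than a confirmed diagnosis.

```php
<?php
// Hypothetical helper: the question's title encoding, returned as uppercase hex.
function dumpTitleBytes(string $utf8): string
{
    // Same escaping and UTF-16BE conversion (with FE FF BOM) as in the question.
    $escaped = str_replace(['(', ')', '/'], ['\\(', '\\)', '\\/'], $utf8);
    $utf16 = chr(254) . chr(255) . iconv('UTF-8', 'UTF-16BE', $escaped);
    return strtoupper(bin2hex($utf16));
}

echo dumpTitleBytes("\u{010D}"), "\n"; // FEFF010D (c caron; 0D is a carriage return)
echo dumpTitleBytes("\u{010A}"), "\n"; // FEFF010A (C with dot above, what the viewer shows)
```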

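Assuming that's correct, try running Ghostscript from the command line and see whether it works. If it doesn't, you can open a bug report at http://bugs.ghostscript.com; if you do, please supply the source file and the command line.

You don't say which version of Ghostscript you're using, or what you're using to view the resulting PDF file. Both would be useful...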
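Whether gs is run from a shell or from PHP, its exit status and messages are what tell you if Ghostscript itself failed. In the question's command, `; mv` moves the output over final.pdf even when gs fails. A sketch, assuming the same gs options as the question; `addBookmarks` is a hypothetical helper and only standard PHP functions are used.

```php
<?php
// Hypothetical helper: merge the pdfmarks file into $pdf with Ghostscript,
// replacing $pdf only if gs exits successfully.
function addBookmarks(string $pdf, string $pdfmarks): bool
{
    $tmp = $pdf . '.tmp.pdf';
    $cmd = 'gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH'
         . ' -sOutputFile=' . escapeshellarg($tmp)
         . ' ' . escapeshellarg($pdf) . ' ' . escapeshellarg($pdfmarks) . ' 2>&1';
    exec($cmd, $output, $status);
    if ($status !== 0) {
        // Leave the original PDF untouched and log what gs reported.
        error_log("gs exited with status $status:\n" . implode("\n", $output));
        @unlink($tmp);
        return false;
    }
    return rename($tmp, $pdf);
}
```

With the question's files this would be called as `addBookmarks('final.pdf', 'pdfmark.txt')`.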

score 0 · Accepted Answer

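The following snippet illustrates what you need to do.

In PostScript, special characters can be accessed with the \000 notation, where 000 is the character position. The 3-digit position is octal: \350 is decimal position 232 and hexadecimal position E8.

The characters you're looking for are Ccaron and ccaron. To be able to access them, you need to define them in the font's encoding table. The CEEncoding table is Adobe's Central European character set. PostScript may already define CEEncoding somewhere, but this example defines its own. As in this example, you can define any encoding you like. The PostScript Language Reference Manual, available online, gives details of the available characters.

This example prints "testing 1234" in the standard /Helvetica font, then defines a new font, /Helvetica-CE, based on the standard /Helvetica but using the CEEncoding encoding. (Ru\350ní) show uses character \350, which CEEncoding defines as ccaron. Just for fun, I also redefined character \001 as Ccaron, \002 as the Euro sign and \003 as the trademark sign, to show that any character code can be mapped to any glyph, and printed them with (testing 4567\001\002\003) show. Not all fonts define all glyphs; a font that lacks a glyph will substitute a blank (space) for it.

It's that simple ;)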
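The same octal arithmetic in PHP (the question's language), for example when generating such escapes from a script. `psOctalEscape` is a hypothetical helper name; `sprintf`, `octdec` and `dechex` are standard PHP functions.

```php
<?php
// Hypothetical helper: format a byte value as a PostScript octal escape like \350.
function psOctalEscape(int $byte): string
{
    return sprintf('\\%03o', $byte);
}

echo psOctalEscape(232), "\n";     // \350
echo octdec('350'), "\n";          // 232
echo strtoupper(dechex(232)), "\n"; // E8
```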
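Instead of typing octal codes by hand, a PHP sketch can produce the string literal from UTF-8 text. It assumes Windows-1250 (`CP1250` in iconv) as a stand-in for the CE vector above: the two agree for \310 (Ccaron), \350 (ccaron) and \355 (iacute), but not in every slot, so compare against the encoding vector you actually use. This applies to page text drawn with `show`; pdfmark bookmark titles are Unicode (UTF-16BE) text strings instead. `psCentralEuropeanLiteral` is a hypothetical name.

```php
<?php
// Hypothetical helper: UTF-8 text -> PostScript string literal for a font
// re-encoded with a Central European vector. ASSUMPTION: the vector matches
// CP1250 for the characters used.
function psCentralEuropeanLiteral(string $utf8): string
{
    $bytes = iconv('UTF-8', 'CP1250', $utf8);
    if ($bytes === false) {
        throw new InvalidArgumentException('Text is not representable in CP1250');
    }
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = $bytes[$i];
        $code = ord($b);
        if ($b === '\\' || $b === '(' || $b === ')') {
            $out .= '\\' . $b;                  // escape string delimiters
        } elseif ($code < 32 || $code > 126) {
            $out .= sprintf('\\%03o', $code);   // octal escape for non-ASCII bytes
        } else {
            $out .= $b;
        }
    }
    return '(' . $out . ')';
}

echo psCentralEuropeanLiteral('Ruční'), " show\n"; // (Ru\350n\355) show
```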

/Helvetica findfont 46 scalefont setfont
100 75 moveto
(testing 1234) show
/CEEncoding [
/.notdef /Ccaron /Euro /trademark /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /space /exclam /quotedbl
/numbersign /dollar /percent /ampersand /quoteright
/parenleft /parenright /asterisk /plus /comma
/minus /period /slash /zero /one
/two /three /four /five /six
/seven /eight /nine /colon /semicolon
/less /equal /greater /question /at
/A /B /C /D /E
/F /G /H /I /J
/K /L /M /N /O
/P /Q /R /S /T
/U /V /W /X /Y
/Z /bracketleft /backslash /bracketright /asciicircum
/underscore /quoteleft /a /b /c
/d /e /f /g /h
/i /j /k /l /m
/n /o /p /q /r
/s /t /u /v /w
/x /y /z /braceleft /bar
/braceright /tilde /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/Sacute /.notdef /.notdef /Zacute /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /sacute /.notdef /.notdef /zacute
/space /.notdef /breve /Lslash /currency
/Aogonek /.notdef /dieresis /.notdef /Scaron
/Scedilla /Tcaron /Zacute /hyphen /Zcaron
/Zdotaccent /degree /aogonek /ogonek /lslash
/acute /lcaron /.notdef /caron /cedilla
/aogonek /scedilla /tcaron /zacute /hungarumlaut
/zcaron /zdotaccent /Racute /Aacute /Acircumflex
/Abreve /Adieresis /Lacute /Cacute /Ccedilla
/Ccaron /Eacute /Eogonek /Edieresis /Ecaron
/Iacute /Icircumflex /Dcaron /Eth /Nacute
/Ncaron /Oacute /Ocircumflex /Ohungarumlaut /Odieresis
/multiply /Rcaron /Uring /Uacute /Uhungarumlaut
/Udieresis /Yacute /Tcedilla /germandbls /racute
/aacute /acircumflex /abreve /adieresis /lacute
/cacute /ccedilla /ccaron /eacute /eogonek
/edieresis /ecaron /iacute /icircumflex /dcaron
/eth /nacute /ncaron /oacute /ocircumflex
/ohungarumlaut /odieresis /divide /rcaron /uring
/uacute /uhungarumlaut /udieresis /yacute /tcedilla
/dotaccent
] def

/Helvetica findfont
dup length dict begin
{ 1 index /FID ne
{def}
{pop pop}
ifelse
} forall
/Encoding CEEncoding def
currentdict
end
/Helvetica-CE exch definefont pop
/Helvetica-CE findfont 36 scalefont setfont
100 100 moveto
(\310\350) show
100 150 moveto 
(Ru\350ní) show
100 200 moveto
(testing 4567\001\002\003) show
showpage

score 0 · Accepted Answer

Following the thread in the bug tracker, I can encode the string differently (downloading the newer 9.08 PRERELEASE version may also help):

$name = "Ruční nářadí";

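// Hex strings <...> are not scanned for escapes, so no backslash-escaping is needed;
// escaping '(' or ')' here would put literal backslashes into the bookmark title.
$name = 'FEFF'.strtoupper(bin2hex(iconv('UTF-8', 'UCS-2BE', $name)));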

$fh = fopen('pdfmark.txt', 'w');
fputs($fh, "[/Title <{$name}> /Page 1 /OUT pdfmark\n");
fclose($fh);

$command = "gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=out.pdf final.pdf pdfmark.txt; mv out.pdf final.pdf";
exec($command);

Note the hexadecimal encoding and the different brackets (<...> instead of (...)) in the title definition.
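The same approach wrapped in a small PHP function (a sketch; `pdfmarkOutlineEntry` is a hypothetical name, and `/Page` and `/OUT` are used exactly as in the answer). It uses UTF-16BE rather than UCS-2BE: for Czech text the bytes are identical, and UTF-16BE also covers characters outside the BMP. Because hex strings get no escape or end-of-line processing, bytes such as 0D and the characters `(`, `)` and `\` pass through unchanged.

```php
<?php
// Hypothetical helper: one pdfmark outline entry with the title as a hex string.
function pdfmarkOutlineEntry(string $utf8Title, int $page): string
{
    $utf16 = iconv('UTF-8', 'UTF-16BE', $utf8Title);
    if ($utf16 === false) {
        throw new InvalidArgumentException('Title is not valid UTF-8');
    }
    // FEFF is the UTF-16BE byte order mark that marks a PDF text string as Unicode.
    $hex = 'FEFF' . strtoupper(bin2hex($utf16));
    return "[/Title <{$hex}> /Page {$page} /OUT pdfmark\n";
}

echo pdfmarkOutlineEntry("\u{010D}", 1); // [/Title <FEFF010D> /Page 1 /OUT pdfmark
```

In the answer's code, `file_put_contents('pdfmark.txt', pdfmarkOutlineEntry($name, 1))` (with `$name` still the original UTF-8 title) would take the place of the `fopen`/`fputs`/`fclose` lines.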
