pdf - Using ArialMT for Arabic text with PDFBox without embedding the font

Question

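I am using Apache PDFBox to write Arabic text onto a page without embedding the font. ArialMT seems to be generally available, so PDFBox should be able to work with it and PDF viewers should have no problem with the final document. However, I have not managed to find a way in code to use the font without embedding it.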

Note: this is entirely possible under the PDF standard, and I have seen documents generated this way.

Addendum (further explanation of the use case)

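The specific case for a non-embedded font is one where I generate a document containing an image and place invisible text (for example, produced by OCR) on top of the image. When conforming to PDF/A, the font does not need to be embedded in this case, because the image is the only source for rasterizing the document. The "Standard 14" fonts do not cover the Arabic code points, so PDFBox needs to reference another font to work, but loading a font causes it to be embedded.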

score 1 · Accepted Answer

To elaborate on Tilman's comment:

Just because you can do something doesn't mean you should. There are computers that don't have many fonts, and the result may be weird.

They're entirely correct: don't do this. Use subset embedding instead, because different setups can have different versions of Arial, all of which resolve against the ArialMT identifier but can have completely different internal glyph IDs.

As PDFs point to glyph IDs, not 'letters', what looks like "cake" with your copy of Arial could, when encoded as a glyph ID array, end up being "B^r(" in a different version of Arial. That even includes newer versions of Arial that you yourself might end up using a year from now: suddenly your PDF files are completely unusable, even for you.
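
To make the glyph ID problem concrete, here is a minimal sketch that uses plain Java and no PDF library. The two tables are hypothetical stand-ins for two builds of a font that share one name; the IDs are made up for illustration and are not real Arial glyph IDs. The names `GlyphIdDemo`, `encode` and `render` are likewise assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class GlyphIdDemo {

    // Hypothetical character-to-glyph-ID table for one build of a font.
    static final Map<Character, Integer> VERSION_A =
            Map.of('c', 70, 'a', 68, 'k', 78, 'e', 72);

    // Hypothetical table for another build with the same font name:
    // the same glyph IDs now belong to entirely different characters.
    static final Map<Character, Integer> VERSION_B =
            Map.of('B', 70, '^', 68, 'r', 78, '(', 72,
                   'c', 41, 'a', 39, 'k', 49, 'e', 43);

    // What the PDF writer stores: glyph IDs looked up in the writer's font.
    static int[] encode(String text, Map<Character, Integer> cmap) {
        int[] glyphIds = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            Integer gid = cmap.get(text.charAt(i));
            if (gid == null) {
                throw new IllegalArgumentException("No glyph for '" + text.charAt(i) + "'");
            }
            glyphIds[i] = gid;
        }
        return glyphIds;
    }

    // What a reader shows: the same IDs resolved against the reader's font.
    static String render(int[] glyphIds, Map<Character, Integer> cmap) {
        Map<Integer, Character> byGlyphId = new HashMap<>();
        for (Map.Entry<Character, Integer> e : cmap.entrySet()) {
            byGlyphId.put(e.getValue(), e.getKey());
        }
        StringBuilder out = new StringBuilder();
        for (int gid : glyphIds) {
            out.append(byGlyphId.getOrDefault(gid, '?')); // '?' marks an unknown ID
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] stored = encode("cake", VERSION_A);
        System.out.println(render(stored, VERSION_A)); // prints cake
        System.out.println(render(stored, VERSION_B)); // prints B^r(
    }
}
```

The stored array never changes; only the font it is resolved against does. With an embedded subset, the file carries its own table, so the reader's installed fonts no longer matter.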

PDFs should be stand-alone documents. If you want people to read your PDFs, use subset embedding for the fonts you used, even if they're supposedly "generally available". The only way to not embed a font is to make the document use only fonts from the predefined standard set of 14 fonts, which any PDF-spec compliant reader must come with in order to render content without embedded fonts. Note that Arial is not in that list.
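
As a quick way to check a font name against that list, the following sketch hard-codes the 14 standard base font names from the PDF specification. The class and method names (`Standard14Check`, `isStandard14`) are made up for this illustration and are not part of PDFBox.

```java
import java.util.Set;

final class Standard14Check {

    // The 14 standard base font names defined by the PDF specification.
    static final Set<String> STANDARD_14 = Set.of(
            "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
            "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
            "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
            "Symbol", "ZapfDingbats");

    // True only for an exact match on one of the 14 standard names.
    static boolean isStandard14(String baseFontName) {
        return STANDARD_14.contains(baseFontName);
    }

    public static void main(String[] args) {
        System.out.println(isStandard14("Helvetica")); // prints true
        System.out.println(isStandard14("ArialMT"));   // prints false
    }
}
```

The check is an exact string match, so aliases that some viewers substitute (such as mapping Arial to Helvetica) are deliberately not treated as standard here.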
