flying-saucer - Generating a PDF with Flying Saucer fails if the input XHTML contains special characters

Question

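I am using Flying Saucer to convert XHTML to PDF. If the XHTML file contains special characters, PDF generation fails. By special characters I mean characters outside the ASCII character set. Below is a sample XHTML file for which PDF generation fails (input.xhtml, ANSI encoded). The following is the code I use to convert XHTML to PDF.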

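    // Needs java.io.File, java.io.FileOutputStream, java.io.OutputStream
    // and org.xhtmlrenderer.pdf.ITextRenderer (Flying Saucer).
    String inputFile = "samples/input.xhtml";
    String url = new File(inputFile).toURI().toURL().toString();
    String outputFile = "output.pdf";

    ITextRenderer renderer = new ITextRenderer();
    renderer.setDocument(url);   // parse the XHTML at this URL
    renderer.layout();           // lay out the parsed document

    // try-with-resources closes the stream even if createPDF throws
    try (OutputStream os = new FileOutputStream(outputFile)) {
        renderer.createPDF(os);
    }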

What should be done to ensure that PDF generation never fails?

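Below is another XHTML file (input2.xhtml, UTF-8 encoded). It is converted to PDF successfully, but the generated PDF does not show the special character Ɠ. Why is this character missing from the generated PDF? What should be done to make sure such characters appear in the PDF?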

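PDF generation also fails when the input XHTML contains a NUL character (U+0000). This is because NUL is not allowed in XML. Is it still possible to generate a PDF with Flying Saucer if the XHTML contains NUL?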
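
As a minimal sketch of the restriction just described: XML 1.0 only permits the characters in its `Char` production, which excludes NUL, so one option is to drop forbidden characters from the XHTML text before it reaches the parser. The names `XmlCharFilter`, `isAllowedInXml10` and `stripInvalidXmlChars` are hypothetical helpers written for illustration, not part of Flying Saucer, and silently dropping characters is an assumption about what the application wants.

```java
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isAllowedInXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every code point that XML 1.0 forbids, such as NUL (U+0000). */
    static String stripInvalidXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(XmlCharFilter::isAllowedInXml10)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input containing a NUL between two words.
        String dirty = "Greetings\u0000 Earthlings! \u00fc";
        System.out.println(stripInvalidXmlChars(dirty));
    }
}
```

The cleaned string could then be handed to the renderer as text, for example through `ITextRenderer.setDocumentFromString` if the Flying Saucer version in use provides it, instead of pointing the renderer at the raw file.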

input.xhtml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>My First Document</title>
    <style type="text/css"> b { color: green; } </style>
</head>
<body>
    <p>
        <b>Greetings Earthlings! ü </b>
        We've come for your Java.
    </p>
</body>
</html>

input2.xhtml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>My First Document</title>
    <style type="text/css"> b { color: green; } </style>
</head>
<body>
    <p>
        <b>Greetings Earthlings! ü Ɠ </b>
        We've come for your Java.
    </p>
</body>
</html>

score 0 · Accepted Answer

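Regarding the part of the question about input2.xhtml, the character Ɠ does not appear because the default font has no glyph for it.

If you want it to be rendered, you have to embed a font that contains this character, for example Arial Unicode MS.

This can be done as follows: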

  ITextRenderer renderer = new ITextRenderer();
  renderer.getFontResolver().addFont("ARIALUNI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
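
The snippet above addresses the missing glyph. For the first sample, the question describes input.xhtml as ANSI encoded while its XML declaration says UTF-8. A plausible cause of that failure (an assumption, since the question does not show the error) is that the single byte stored for ü is not valid UTF-8, so the XML parser rejects the file. The sketch below uses only the Java standard library to check whether bytes are valid UTF-8 and to transcode them. `XhtmlTranscoder` and its methods are hypothetical names, and treating "ANSI" as windows-1252 is also an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class XhtmlTranscoder {
    private XhtmlTranscoder() {}

    /** Strictly decodes the bytes as UTF-8 and reports whether that succeeds. */
    static boolean isValidUtf8(byte[] raw) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Decodes with the charset the file was really saved in, then encodes as UTF-8. */
    static byte[] toUtf8(byte[] raw, Charset actualCharset) {
        String text = new String(raw, actualCharset);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumption: "ANSI" means windows-1252 on the machine that saved the file.
        Charset ansi = Charset.forName("windows-1252");
        byte[] saved = "Greetings Earthlings! \u00fc".getBytes(ansi);
        System.out.println("valid UTF-8 before: " + isValidUtf8(saved));
        System.out.println("valid UTF-8 after:  " + isValidUtf8(toUtf8(saved, ansi)));
    }
}
```

In practice the bytes would come from something like `Files.readAllBytes` and the result would be written back with `Files.write`. Saving the file as UTF-8 in the editor, or changing the XML declaration to name the encoding the file really uses, should have the same effect.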
