java - Can a PDF copied with PDFBox be as small as one copied with iText?

Question

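I am reading a PDF and writing out a PDF that contains several copies of the original. I tested this by doing the same thing with both PDFBox and iText. If I copy each page individually, iText produces a much smaller output.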

Question: Is there another way in PDFBox to produce a smaller output PDF?

For one sample input file, generating an output with two copies using each tool:

Original PDF size: 30K
PDF generated by PDFBox (v1.7.1): 84K
PDF generated by iText (v5.3.4): 35K
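A quick calculation puts these figures in proportion. Two fully independent copies of a 30K file would be about 60K, so the PDFBox output is larger even than that, while the iText output grows by only about 5K for the second copy. The class and method names here are made up for illustration; the sizes are the ones listed above, in K.

```java
public class SizeComparison {

    /** Size of N fully independent copies, ignoring any fixed per-file overhead. */
    static int naiveCopies(int originalK, int copies) {
        return originalK * copies;
    }

    /** How much an output file grew beyond a single copy of the original. */
    static int growthBeyondOriginal(int outputK, int originalK) {
        return outputK - originalK;
    }

    public static void main(String[] args) {
        int originalK = 30; // original PDF
        int pdfboxK = 84;   // PDFBox 1.7.1 output with two copies
        int itextK = 35;    // iText 5.3.4 output with two copies

        System.out.println("Two independent copies: " + naiveCopies(originalK, 2) + "K");
        System.out.println("PDFBox growth: " + growthBeyondOriginal(pdfboxK, originalK) + "K");
        System.out.println("iText growth:  " + growthBeyondOriginal(itextK, originalK) + "K");
    }
}
```

Read this way, the iText result looks as if the second copy shares fonts, images and other resources with the first, while the PDFBox result stores them again.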

The Java code for PDFBox (sorry about the error handling). Note how it reads the input over and over and copies it as a whole:

PDFMergerUtility merger = new PDFMergerUtility();
PDDocument workplace = null;
try {
    for (int cnt = 0; cnt < COPIES; ++cnt) {
        PDDocument document = null;
        InputStream stream = null;
        try {
            stream = new FileInputStream(new File(sourceFileName));
            document = PDDocument.load(stream);
            if (workplace == null) {
                workplace = document;
            } else {
                merger.appendDocument(workplace, document);
            }
        } finally {
            if (document != null && document != workplace) {
                document.close();
            }
            if (stream != null) {
                stream.close();
            }
        }
    }

    OutputStream out = null;
    try {
        out = new FileOutputStream(new File(destinationFileName));
        workplace.save(out);
    } finally {
        if (out != null) {
            out.close();
        }
    }
} catch (COSVisitorException e1) {
    e1.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (workplace != null) {
        try {
            workplace.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
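As a rough mental model (a toy, not PDFBox's actual data structures): each call to PDDocument.load produces a fresh set of objects, so when the second loaded document is appended, nothing in it is identical to anything in the first, and a writer that emits each reachable object once still emits the fonts, images and content twice. Loading once and pointing every copy at the same resource objects avoids that. The sketch counts the size written for both strategies; the class name and the object sizes (20 units of resources, 8 of content, 2 of page dictionary) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class SharedObjectsModel {

    /** Toy stand-in for an indirect PDF object (resources, content stream, page dictionary). */
    static final class Obj {
        final int size;                          // size it would occupy when written
        final List<Obj> refs = new ArrayList<>(); // objects it references
        Obj(int size, Obj... refs) {
            this.size = size;
            Collections.addAll(this.refs, refs);
        }
    }

    /** Simulates loading the source file: every call creates brand-new objects. */
    static Obj loadPage() {
        Obj resources = new Obj(20); // hypothetical fonts/images
        Obj content = new Obj(8);    // hypothetical content stream
        return new Obj(2, resources, content);
    }

    /** Sums the size of every reachable object, counting each object (by identity) once. */
    static int writtenSize(List<Obj> pages) {
        Set<Obj> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<Obj> stack = new ArrayList<>(pages);
        int total = 0;
        while (!stack.isEmpty()) {
            Obj o = stack.remove(stack.size() - 1);
            if (seen.add(o)) {
                total += o.size;
                stack.addAll(o.refs);
            }
        }
        return total;
    }

    /** The question's strategy: load the file once per copy, so nothing is shared. */
    static int loadPerCopy(int copies) {
        List<Obj> pages = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            pages.add(loadPage());
        }
        return writtenSize(pages);
    }

    /** Load once; each extra copy gets its own page dictionary and content but shares resources. */
    static int loadOnceShareResources(int copies) {
        Obj original = loadPage();
        Obj sharedResources = original.refs.get(0);
        List<Obj> pages = new ArrayList<>();
        pages.add(original);
        for (int i = 1; i < copies; i++) {
            pages.add(new Obj(2, sharedResources, new Obj(8)));
        }
        return writtenSize(pages);
    }

    public static void main(String[] args) {
        System.out.println("load per copy:    " + loadPerCopy(2));
        System.out.println("load once, share: " + loadOnceShareResources(2));
    }
}
```

With these made-up sizes, two copies cost 60 units when loaded twice and 40 when the resources are shared; the real ratio depends on how much of the file is resources.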

The code to do the same with iText. Note how it reads the input file once and transfers each page to the output individually:

Document document = null;
PdfReader reader = null;
InputStream inputStream = null;
FileOutputStream outputStream = null;
try {
    inputStream = new FileInputStream(new File(sourceFileName));
    outputStream = new FileOutputStream(new File(destinationFileName));
    document = new Document();
    PdfCopy copy = new PdfSmartCopy(document, outputStream);
    document.open();
    reader = new PdfReader(inputStream);
    // loop over the pages in that document
    int pdfPageNo = reader.getNumberOfPages();
    for (int page = 0; page < pdfPageNo;) {
        PdfImportedPage onePage = copy.getImportedPage(reader, ++page);
        // duplicate each page N times
        for (int i = 0; i < COPIES; ++i) {
            copy.addPage(onePage);
        }
    }
    copy.freeReader(reader);
} catch (DocumentException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (reader != null) {
        reader.close();
    }
    if (document != null) {
        document.close();
    }
    try {
        if (inputStream != null) {
            inputStream.close();
        }
        if (outputStream != null) {
            outputStream.close();
        }
    } catch (IOException e) {
        // do nothing
    }
}
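The iText version also uses PdfSmartCopy, which is designed to detect identical objects and write them only once; part of the size difference may come from that. The general idea can be sketched as content-addressed deduplication: hash the serialized bytes of each object and reuse the existing object number when the same bytes were already written. This is a simplified illustration with hypothetical names (ContentDedup, add), not iText's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContentDedup {

    private final Map<String, Integer> numberByDigest = new HashMap<>();
    private final List<byte[]> written = new ArrayList<>();

    /** Returns the object number for these bytes, storing them only if they are new. */
    public int add(byte[] serializedObject) {
        String key = sha256Hex(serializedObject);
        Integer existing = numberByDigest.get(key);
        if (existing != null) {
            return existing; // identical bytes already written: reuse that reference
        }
        written.add(serializedObject.clone());
        int number = written.size(); // object numbers start at 1, as in PDF
        numberByDigest.put(key, number);
        return number;
    }

    public int writtenBytes() {
        int total = 0;
        for (byte[] b : written) {
            total += b.length;
        }
        return total;
    }

    public int objectCount() {
        return written.size();
    }

    private static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        ContentDedup out = new ContentDedup();
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        int first = out.add(content);
        int second = out.add(content.clone()); // same bytes coming from the second copy
        System.out.println(first + " " + second + " objects=" + out.objectCount());
    }
}
```

A real writer would also have to rewrite references inside objects before hashing them, so that two objects that differ only in which (identical) objects they point to are recognized as equal; the sketch skips that.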

Both snippets are wrapped in this:

public class Duplicate {

    /** The original PDF file */
    private static final String sourceFileName = "PDF_CI_US2CA.pdf";

    /** The resulting PDF file. */
    private static final String destinationFileName = "itext_output.pdf";
    private static final int COPIES = 2;

    public static void main(String[] args) {
        ...
    }
}

score 9 · Accepted Answer

With the following solution I was able to create PDF files containing many duplicate pages with minimal impact on file size.

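PDDocument samplePdf = null;
try {
    samplePdf = PDDocument.load(PDF_PATH);
    // Take the first page of the loaded document
    PDPage page = (PDPage) samplePdf.getDocumentCatalog().getAllPages().get(0);

    // Append COPIES more instances of that page (COPIES + 1 in total)
    for (int i = 0; i < COPIES; i++) {
        samplePdf.importPage(page);
    }

    samplePdf.save(SAVE_PATH); //$NON-NLS-1$

} catch (IOException e) {
    e.printStackTrace();
} catch (COSVisitorException e) {
    e.printStackTrace();
} finally {
    // Release the document even if loading or saving failed
    if (samplePdf != null) {
        try {
            samplePdf.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}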

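In my first attempt I used samplePdf.addPage(page), but it did not work as expected, so there is clearly a difference between the add and import functions. I will have to check the source code or the documentation to find out why. In any case, this should help you design a PDFBox solution for your needs.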
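The answer leaves open why addPage behaved differently. One plausible explanation, which is an assumption and not verified against the PDFBox 1.x source: adding the same PDPage object again places a second reference to one page object in the page tree, whereas a PDF page tree expects each page to be a distinct object with a single parent. Importing instead creates a new page dictionary whose entries can still point at the original resources. The toy model (hypothetical names PageTreeModel, addSame, importCopy) shows that difference in terms of distinct page objects and shared resources.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PageTreeModel {

    /** Toy page node: a dictionary whose values may be shared objects such as resources. */
    static final class Page {
        final Map<String, Object> entries;
        Page(Map<String, Object> entries) {
            this.entries = entries;
        }
    }

    final List<Page> kids = new ArrayList<>();

    /** Like adding an existing page object: the same node is referenced again. */
    void addSame(Page page) {
        kids.add(page);
    }

    /** Like importing: a new node holding a shallow copy of the entries, so values are shared. */
    void importCopy(Page page) {
        kids.add(new Page(new HashMap<>(page.entries)));
    }

    /** Number of distinct page objects (by identity) referenced from the kids list. */
    int distinctPageObjects() {
        Set<Page> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.addAll(kids);
        return seen.size();
    }

    public static void main(String[] args) {
        Object resources = new Object(); // stands in for fonts, images, etc.
        Map<String, Object> entries = new HashMap<>();
        entries.put("Resources", resources);
        Page original = new Page(entries);

        PageTreeModel added = new PageTreeModel();
        added.addSame(original);
        added.addSame(original);

        PageTreeModel imported = new PageTreeModel();
        imported.addSame(original);
        imported.importCopy(original);

        System.out.println("added:    " + added.kids.size() + " kids, " + added.distinctPageObjects() + " distinct");
        System.out.println("imported: " + imported.kids.size() + " kids, " + imported.distinctPageObjects() + " distinct");
        System.out.println("resources shared: " + (imported.kids.get(1).entries.get("Resources") == resources));
    }
}
```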
