java - PDFRenderer - Exporting to images, export is inaccurate

Question

I wrote a program that exports a PDF file to a series of images, as shown below:

    import java.awt.Graphics;
    import java.awt.Image;
    import java.awt.Rectangle;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import javax.imageio.ImageIO;
    import com.sun.pdfview.PDFFile;
    import com.sun.pdfview.PDFPage;

    // Load the PDF from a file path
    File file = new File("C:\\TEMP\\office\\a.pdf");
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    byte[] b = new byte[(int) raf.length()];
    raf.readFully(b);
    ByteBuffer buf = ByteBuffer.wrap(b);
    PDFFile pdffile = new PDFFile(buf);

    // Get the number of pages
    int numOfPages = pdffile.getNumPages();
    System.out.println(numOfPages);

    // Iterate over the pages
    for (int i = 1; i <= numOfPages; i++) {
        PDFPage page = pdffile.getPage(i);
        // Create the image; the clip rectangle starts at (0, 0) and takes its size from the page's bounding box
        Rectangle rect = new Rectangle(0, 0, (int) page.getBBox().getWidth(), (int) page.getBBox().getHeight());
        Image img = page.getImage(rect.width, rect.height, rect, null, true, true);
        // Copy the rendered image into an RGB BufferedImage so it can be written as a JPEG
        BufferedImage bufferedImage = new BufferedImage(rect.width, rect.height, BufferedImage.TYPE_INT_RGB);
        Graphics g = bufferedImage.createGraphics();
        g.drawImage(img, 0, 0, null);
        g.dispose();

        // Replace any existing output file for this page
        File asd = new File("C:\\TEMP\\office\\img\\Testingx" + i + ".jpg");
        if (asd.exists()) {
            asd.delete();
        }
        // Export the image in JPEG format to C:\TEMP\office\img\<filename>
        ImageIO.write(bufferedImage, "jpg", asd);
    }
    // Closing the buffer and other resources is omitted; it does not affect the exported image

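This program works correctly for many PDF files. However, when I tested it with various PDFs found on the internet, one PDF was not exported to images as accurately as the others. The resources I used are listed below.

Original PDF link: 2007_OReilly_EssentialActionScript3.0.pdf

I will use page 7 of the PDF given above.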

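Expected image: click here to see the expected result image

After the program finished, the resulting image was completely different.

Click here to see the result image

As you can see, the resulting image is shifted upward and some content has disappeared. The page's layout is also lost: it is not centered but indented to the right.

PDFRenderer itself is not at fault. If we run PDFRenderer's .jar file, the top of the page and the layout match the original PDF file.

The PDF opened with PDFRenderer, at page 7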

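Known possible issue: ImageIO does not support the CMYK format, so page 1 and other pages that use CMYK will not be exported correctly. I am not sure whether I am right about this.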
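The CMYK suspicion above is not confirmed in this thread. One thing that can be checked independently of PDFRenderer is whether ImageIO accepted the image at all: `ImageIO.write` returns `false` when no registered writer can handle the image, and the original program ignores that return value. The following is a minimal sketch of such a check; the class name `JpegWriteCheck` and the helper `toJpeg` are hypothetical names chosen for illustration, and the sketch says nothing about how CMYK content inside a PDF is rendered.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

public class JpegWriteCheck {
    /** Encodes an image as JPEG in memory and fails loudly if no writer accepts it. */
    static byte[] toJpeg(BufferedImage image) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // ImageIO.write returns false when no registered writer can handle the image.
            if (!ImageIO.write(image, "jpg", out)) {
                throw new IllegalStateException("No JPEG writer accepted this image type");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Same image type as in the question: a plain RGB buffer.
        BufferedImage rgb = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        byte[] jpeg = toJpeg(rgb);
        System.out.println("Encoded " + jpeg.length + " bytes");
    }
}
```

If the check passes for every page, the JPEG encoding step is probably not what drops content, and the rendering step is the next place to look.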

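Another issue: PDFRenderer seems unable to read page 1, possibly because of something used in the PDF's format that I do not know much about.

Library used: PDFRenderer

You can download the PDF from the link above and use the program I provided to reproduce the problem.

My question: how can I fix this? Is there something wrong with my program?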

score 1 · Accepted Answer

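I found the problem myself and was able to fix it.

The explanation follows.

My Java program did not honor the X and Y coordinates in the PDF file. Put simply, it hard-coded them to 0. For most PDFs that works, as in the picture below:

Most PDFs: http://img266.imageshack.us/img266/7618/4cl5.png

However, that is not the case for the PDF I provided: the X coordinate of its top-left corner is not 0, and neither is the Y coordinate. That is why the image was cut off.

In short, my program captures a rectangular region of the rendered PDF, but because it did not use the top-left coordinates of the PDF I provided, it captured the region shown in the picture below. The Y coordinate is not marked in the picture; that is my mistake.

The exceptional PDF: http://img12.imageshack.us/img12/9672/plhb.png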
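The effect described above can be reproduced without PDFRenderer, using only `java.awt.geom`. The sketch below measures how much of a page box is covered by a clip rectangle; the class name `ClipOverlap`, the method `coveredFraction`, and the numbers (a 400 x 600 box with origin (50, 100)) are made up for illustration and are not taken from the PDF in the question.

```java
import java.awt.geom.Rectangle2D;

public class ClipOverlap {
    /** Fraction of the page box's area that the clip rectangle covers (0.0 to 1.0). */
    static double coveredFraction(Rectangle2D pageBox, Rectangle2D clip) {
        Rectangle2D overlap = pageBox.createIntersection(clip);
        if (overlap.isEmpty()) {
            return 0.0; // the rectangles do not overlap at all
        }
        return (overlap.getWidth() * overlap.getHeight())
                / (pageBox.getWidth() * pageBox.getHeight());
    }

    public static void main(String[] args) {
        // Hypothetical page box whose origin is (50, 100) instead of (0, 0).
        Rectangle2D pageBox = new Rectangle2D.Double(50, 100, 400, 600);

        // Clip with a hard-coded origin, as in the original program.
        Rectangle2D hardCoded = new Rectangle2D.Double(0, 0, 400, 600);

        // Clip whose origin is taken from the page box itself.
        Rectangle2D fromPage = new Rectangle2D.Double(
                pageBox.getX(), pageBox.getY(), pageBox.getWidth(), pageBox.getHeight());

        System.out.println("hard-coded origin covers: " + coveredFraction(pageBox, hardCoded));
        System.out.println("page-box origin covers:   " + coveredFraction(pageBox, fromPage));
    }
}
```

With these made-up numbers the hard-coded clip covers only part of the page box, while the clip that starts at the page box's own origin covers all of it. The rest of the capture falls outside the page, which matches the shifted and truncated image.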

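With the following modification, the program works as it does for most PDFs, and handles this one as well:

    Rectangle rect = new Rectangle(
            (int) page.getPageBox().getX(), (int) page.getPageBox().getY(),
            (int) page.getBBox().getWidth(), (int) page.getBBox().getHeight());

This lets the program start "capturing" from the top-left corner of the whole page as rendered by PDFRenderer, just like in the first picture I provided. It also works with different page sizes from A4 to A7. I have not tested further, but it works.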
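The modified rectangle can be factored into a small helper so the origin is never hard-coded again. This is a sketch under assumptions: `PageClip` and `clipFor` are hypothetical names, and the helper takes plain `Rectangle2D` arguments so that it runs without PDFRenderer on the classpath. In the program above it would be called with the values that `page.getPageBox()` and `page.getBBox()` return.

```java
import java.awt.Rectangle;
import java.awt.geom.Rectangle2D;

public class PageClip {
    /**
     * Builds the clip rectangle used for rendering: origin from the page box,
     * size from the bounding box, truncated to whole pixels as in the answer.
     */
    static Rectangle clipFor(Rectangle2D pageBox, Rectangle2D bbox) {
        return new Rectangle(
                (int) pageBox.getX(), (int) pageBox.getY(),
                (int) bbox.getWidth(), (int) bbox.getHeight());
    }

    public static void main(String[] args) {
        // Made-up boxes standing in for page.getPageBox() and page.getBBox().
        Rectangle2D pageBox = new Rectangle2D.Double(36.0, 72.0, 540.5, 720.9);
        Rectangle2D bbox = new Rectangle2D.Double(36.0, 72.0, 540.5, 720.9);

        Rectangle rect = clipFor(pageBox, bbox);
        System.out.println(rect);
    }
}
```

The `(int)` casts truncate fractional sizes, so up to one pixel of width or height can be lost. Whether that matters depends on the page, and the answer above does not address it.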
