python - what causes "insufficient data for image" in a pdf

Question

I have a program in Python (using pyPDF) that merges a bunch of different PDF documents. Sometimes the resulting PDF is fine, except for some blank pages in the middle. When I view these documents with Acrobat Reader, I get an error message saying "insufficient data for image". When I view the documents with Foxit Reader, I get some blank pages and a munged image.

The only odd thing about the source PDF that produces the blank pages is that it seems to be PDF version 1.4, while pyPDF seems to create files with PDF version 1.3.

1) Does the version thing sound like the root cause of my problem?

2) Is there a way to get PyPdf to handle this correctly?

score 2 · Accepted Answer

I had this problem, and was able to figure it out by looking at the original PDF side by side with the PyPDF one in a hex editor.

The problem seems to be that PyPDF actually drops a byte: it looks like the first byte of each image stream is missing. When I added the missing bytes back into the PyPDF file, the PDF opened fine without the error.
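If you don't want to eyeball it in a hex editor, here is a rough sketch of the same comparison in Python. It is only a sketch under some assumptions: the helper names are made up, it finds stream bodies with a naive regex rather than a real PDF parser, and it assumes the streams appear in the same order in both files (which will not hold for every merge).

```python
import re

# Naive match for the raw bytes between the "stream" and "endstream" keywords.
# Assumption: good enough for a quick look, not a real PDF parser.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def stream_lengths(pdf_bytes):
    """Return the byte length of every stream body found in the file."""
    return [len(m.group(1)) for m in STREAM_RE.finditer(pdf_bytes)]


def shortened_streams(original, merged):
    """Pair streams by position and report those that lost bytes.

    Returns (index, original_length, merged_length) tuples.
    Assumption: both files list their streams in the same order.
    """
    pairs = zip(stream_lengths(original), stream_lengths(merged))
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if b < a]
```

You would read both files with `open(path, "rb").read()` and pass the bytes in. Any stream that comes back one byte shorter in the merged file is a candidate for the missing-byte problem described above.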

score 2 · Accepted Answer

This may be related to Windows not actually treating the file as a .pdf file.

http://support.microsoft.com/kb/2506795

Good luck!

score 1 · Accepted Answer

I suspect the image XObject stream is malformed. Without access to the PDF in question, guessing is about all most people can do.

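For example, if the PDF says the image is 10 pixels wide, 10 pixels high, and 8 bits per pixel, then the stream should decompress to 100 bytes. If it decompresses to fewer than that, I would expect the error you're seeing.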
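To make the arithmetic concrete, here is a minimal sketch of that check. The function names are made up for illustration, and `is_truncated` assumes the stream uses plain FlateDecode (zlib) with no predictor; other filters would need different decoding.

```python
import zlib


def expected_image_bytes(width, height, bits_per_component, components=1):
    """Bytes an uncompressed image stream should hold.

    Each row is rounded up to a whole number of bytes.
    """
    bits_per_row = width * components * bits_per_component
    bytes_per_row = (bits_per_row + 7) // 8
    return bytes_per_row * height


def is_truncated(raw_stream, width, height, bits_per_component, components=1):
    """True if the decompressed stream is shorter than the image needs.

    Assumption: raw_stream is FlateDecode data without a predictor.
    """
    decoded = zlib.decompress(raw_stream)
    needed = expected_image_bytes(width, height, bits_per_component, components)
    return len(decoded) < needed
```

For the 10 x 10, 8-bit example, `expected_image_bytes(10, 10, 8)` gives 100, so a stream that decompresses to 99 bytes would be flagged.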

This could be a bug in pypdf with whatever image format you happen to be using.

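IIRC, there's no scanline padding in PDF and none of that word-boundary business, though the last bits of each row are padded out to a whole byte if need be. Confusion there could easily lead to too many bytes, which isn't the problem here.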

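It could also be a bad color space. If you have an indexed-color image (GIF) and it was only half-converted to an RGB image, keeping the original indexed color bytes, you would end up with a stream that is expected to have n*3 bits per pixel but only has n bits per pixel.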
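One way to test the color space theory is to see which color space the actual byte count would fit. This is just a sketch: the function name is made up, and the table only covers a few common color spaces, with Indexed counted as one component per pixel.

```python
# Color components per pixel for a few common PDF color spaces.
# Assumption: this short table is enough for a quick diagnosis.
COMPONENTS = {"Indexed": 1, "DeviceGray": 1, "DeviceRGB": 3, "DeviceCMYK": 4}


def matching_color_spaces(actual_len, width, height, bits_per_component=8):
    """List the color spaces whose expected size equals actual_len."""
    matches = []
    for name, n in COMPONENTS.items():
        # Rows are rounded up to a whole number of bytes.
        row_bytes = (width * n * bits_per_component + 7) // 8
        if row_bytes * height == actual_len:
            matches.append(name)
    return matches
```

If the image dictionary claims DeviceRGB but the decompressed length only matches Indexed or DeviceGray, that points to the half-converted case described above.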

This may be an old bug that has since been fixed in pypdf. Are you using the current version?
