我正在尝试编写或多或少地解释 PDF 软证明的内容。
我想提取一些信息,但不知道如何提取。
我需要提取的内容:
Bleed: I got this somewhat working with pyPdf, given
that the document uses 72 dpi, which sadly isn't
always the case. I need to be able to calculate
the bleed in millimeters.
Print resolution (dpi): If I read the PDF spec[1] correctly this ought to
always be 72 dpi, unless a page has UserUnit set,
which was only introduced in PDF-1.6, but shouldn't
print documents always be at least 300 dpi? I'm
afraid that I misunderstood something…
I'd also need the print resolution for images, if
they can differ from the default page resolution,
that is.
Text color: I don't have the slightest clue on how to extract
this, the string 'text colour' only shows up once
in the whole spec, without any explanation how it
is set.
Image colormodel: If I understand it correctly I can read this out
in pyPdf with page['/Group']['/CS'] which can be:
- /DeviceRGB
- /DeviceCMY
- /DeviceCMYK
- /DeviceGray
- /DeviceRGBK
- /DeviceN
Font 'embeddedness': I read in another post on stackoverflow that I
can just iterate over the font resources and if a
resource has a '/FontFile'-key that means that
the font is embedded. Is this correct?
如果 pyPdf 以外的其他库能够更好地提取此信息(或它们的组合),那么它们将受到欢迎。到目前为止,我摸索着使用 pyPdf、pdfrw 和 pdfminer。所有这些都没有最广泛的文档。
[1] http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf