python - 尝试遍历多个 PDF 文件并将这些 PDF 的各个页面保存为图像

Question

我正在开发一个 python 项目，该项目需要我一个接一个地遍历存储在当前目录的一个名为 sample/ 的文件夹中的多个 pdf，并将这些 pdf 的各个页面作为图像保存在另一个名为 convert_images/ 的目录中。有人能帮我吗？所有 pdf 文件都是随机命名的，但具有“.pdf”扩展名。

score -1 · Accepted Answer

你可以用pdf2image

pip install pdf2image

    from pdf2image import convert_from_path
    pages = convert_from_path('pdf_file', 500)
    for page in pages:
        page.save('out.jpg', 'JPEG')

或者：

import pypdfium2 as pdfium

pdffile = 'path/to/your_doc.pdf'

# render multiple pages concurrently (in this case: all)
for image, suffix in pdfium.render_pdf(pdffile):
    image.save(f'output_{suffix}.jpg')

# render a single page (in this case: the first one)
with pdfium.PdfContext(pdffile) as pdf:
    image = pdfium.render_page(pdf, 0)
    image.save('output.jpg')

python - 尝试遍历多个 PDF 文件并将这些 PDF 的各个页面保存为图像

1 回答 1

Related

Reference