I'll try to describe the process.

  1. I fill the field "Textové pole60" in an interactive PDF with the value "123456789" and save the file, using the following code:
from PyPDF4 import PdfFileWriter, PdfFileReader
from PyPDF4.generic import BooleanObject, NameObject, IndirectObject

def set_need_appearances_writer(writer: PdfFileWriter):
    try:
        catalog = writer._root_object
        # get the AcroForm tree
        if "/AcroForm" not in catalog:
            writer._root_object.update({
                NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})

        need_appearances = NameObject("/NeedAppearances")
        writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer

infile = "DOTAZNIK_ADULT.pdf"
outfile = "DOTAZNIK_ADULT_VYPLNENY.pdf"

inputStream = open(infile, "rb")
pdf = PdfFileReader(inputStream, strict=False)
if "/AcroForm" in pdf.trailer["/Root"]:
    pdf.trailer["/Root"]["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

pdf2 = PdfFileWriter()
set_need_appearances_writer(pdf2)
if "/AcroForm" in pdf2._root_object:
    pdf2._root_object["/AcroForm"].update(
        {NameObject("/NeedAppearances"): BooleanObject(True)})

field_dictionary = {"Textové pole60": "123456789"}

pdf2.addPage(pdf.getPage(0))
pdf2.updatePageFormFieldValues(pdf2.getPage(0), field_dictionary)

# write the filled form, then release the input file handle
with open(outfile, "wb") as outputStream:
    pdf2.write(outputStream)
inputStream.close()
  2. When I then open the PDF in Adobe Reader, the value is filled in there (screenshot: Filled field).

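  3. Next I want to convert the page from the PDF to an image, but the filled-in value is missing from the image when I run the script below and display pil_im in Spyder: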

import pdf2image
import pytesseract
from pytesseract import Output

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

pdf_path = "DOTAZNIK_ADULT_VYPLNENY.pdf"

images = pdf2image.convert_from_path(pdf_path, poppler_path = 'C:\\Program Files\\Poppler\\bin')

pil_im = images[0]
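
To narrow this down, here is a minimal sketch of a check I could run on the saved file. It rests on an assumption I have not confirmed: that the renderer may draw a field from its stored appearance stream (/AP) rather than from its value (/V), so a field with a value but no appearance could come out blank. `describe_widget` and `inspect_fields` are hypothetical helper names, and the check assumes /T and /V sit directly on the widget annotation rather than on a parent field.

```python
def describe_widget(annotation):
    """Summarise one form-field annotation (any dict-like object)."""
    return {
        # /T is the field name, /V the stored value
        "name": annotation["/T"] if "/T" in annotation else None,
        "value": annotation["/V"] if "/V" in annotation else None,
        # /AP is the appearance dictionary, if one was written
        "has_appearance": "/AP" in annotation,
    }


def inspect_fields(pdf_path, page_index=0):
    """List name, value and appearance status for each annotation on a page."""
    # Imported here so describe_widget can be used without PyPDF4 installed.
    from PyPDF4 import PdfFileReader

    with open(pdf_path, "rb") as stream:
        reader = PdfFileReader(stream, strict=False)
        page = reader.getPage(page_index)
        if "/Annots" not in page:
            return []
        # Resolve each indirect reference while the file is still open.
        return [describe_widget(a.getObject()) for a in page["/Annots"]]


if __name__ == "__main__":
    for info in inspect_fields("DOTAZNIK_ADULT_VYPLNENY.pdf"):
        print(info)
```

If "Textové pole60" shows the value "123456789" but `has_appearance` is False, that would be consistent with the assumption above; if it is True, the cause is probably somewhere else.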

Please help me! :) Thanks
