python - How to fill in a PDF form with Python

Question

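I have a PDF form created with Adobe LiveCycle Designer ES 10.4. I need to fill it in with Python so that we can reduce manual work. I searched the web and read several articles, most of which revolve around the pdfrw library. I tried it and extracted some information from the PDF form, as shown below.

Code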

from pdfrw import PdfReader
pdf = PdfReader('sample.pdf')
print(pdf.keys())
print(pdf.Info)
print(pdf.Root.keys())
print('PDF has {} pages'.format(len(pdf.pages)))

Output

['/Root', '/Info', '/ID', '/Size']
{'/CreationDate': "(D:20180822164509+05'30')", '/Creator': '(Adobe LiveCycle Designer ES 10.4)', '/ModDate': "(D:20180822165611+05'30')", '/Producer': '(Adobe XML Form Module Library)'}
['/AcroForm', '/MarkInfo', '/Metadata', '/Names', '/NeedsRendering', '/Pages', '/Perms', '/StructTreeRoot', '/Type']
PDF has 1 pages

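I am not sure how far I can get with pdfrw in accessing the fillable fields of the PDF form, or whether they can be filled in with Python. Any advice would be helpful.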

score 5 · Accepted Answer

You can find the form fields here:

pdf.Root.AcroForm.Fields

or here:

pdf.Root.Pages.Kids[page_index].Annots

This is a PdfArray object, which is basically a list. The name of a field can be found here:

pdf.Root.AcroForm.Fields[field_index].T

Other keys include the value, .V, and there is a lot of display information, such as fonts, under .AP.N.Resources.
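To make the .T / .V layout concrete, here is a minimal sketch that walks the field list and collects names and values. The helper names (pdf_text, iter_fields, list_form_fields) are made up for this illustration, and the sketch assumes that nested fields hang off a parent's .Kids array and that each child carries its own .T. It only relies on attribute access, so it works on pdfrw dictionaries as well as on simple stand-in objects.

```python
def pdf_text(obj):
    """Return a PDF string object as plain text (None stays None)."""
    if obj is None:
        return None
    # pdfrw string objects may offer to_unicode(); use it when available.
    to_unicode = getattr(obj, "to_unicode", None)
    if callable(to_unicode):
        return to_unicode()
    text = str(obj)
    # Literal PDF strings are shown wrapped in parentheses, e.g. "(Text1)".
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    return text


def iter_fields(fields, prefix=""):
    """Yield (full_name, value) pairs, descending into named /Kids."""
    for field in fields or []:
        name = pdf_text(getattr(field, "T", None))
        full_name = ".".join(part for part in (prefix, name) if part)
        kids = getattr(field, "Kids", None) or []
        named_kids = [k for k in kids if getattr(k, "T", None) is not None]
        if named_kids:
            yield from iter_fields(named_kids, full_name)
        else:
            yield full_name, pdf_text(getattr(field, "V", None))


def list_form_fields(path):
    """Read a PDF with pdfrw and list its AcroForm fields (hypothetical helper)."""
    from pdfrw import PdfReader  # third-party: pip install pdfrw

    acroform = PdfReader(path).Root.AcroForm
    fields = acroform.Fields if acroform is not None else None
    return list(iter_fields(fields))
```

Calling list_form_fields('sample.pdf') should give a quick overview of which names can be targeted. Forms that store their fields differently may need adjustments.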

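However, if you update a field's value and write out the PDF file, the value may only be displayed while the field has focus (i.e. when it is clicked).

I have not yet figured out how to fix this.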

score 1 · Accepted Answer

If the fields are indexed, use this to fill in each field.

from pdfrw import PdfReader, PdfWriter, PdfDict

template = PdfReader('template.pdf')
# Loop through the pages, then through each page's annotations (fields)
for page_c, page in enumerate(template.Root.Pages.Kids):
    # A page without form fields has no /Annots entry, so fall back to an empty list
    for annot_c, annot in enumerate(page.Annots or []):
        # Label every field with "<annotation index>-<page index>"
        annot.update(PdfDict(V=str(annot_c) + '-' + str(page_c)))
PdfWriter().write('output.pdf', template)

score 1 · Accepted Answer

I wrote a library called fillpdf, built on 'pdfrw', 'pdf2image', 'Pillow' and 'PyPDF2' (pip install fillpdf; the poppler dependency can be installed with conda install -c conda-forge poppler).

Basic usage:

from fillpdf import fillpdfs

fillpdfs.get_form_fields("blank.pdf")

# returns a dictionary of fields
# Set the values in the returned dictionary and save it to a variable
# For radio boxes ('Off' = not filled, 'Yes' = filled)

data_dict = {
'Text2': 'Name',
'Text4': 'LastName',
'box': 'Yes',
}

fillpdfs.write_fillable_pdf('blank.pdf', 'new.pdf', data_dict)

# If you want it flattened:
fillpdfs.flatten_pdf('new.pdf', 'newflat.pdf')

More information here: https://github.com/t-houssian/fillpdf
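Because a misspelled key in data_dict is easy to overlook, it can help to check the names against what get_form_fields returns before writing. The sketch below is one possible way to do that. validate_values and fill_checked are hypothetical helpers, not part of fillpdf, and only the two fillpdfs calls shown in the basic usage above are used.

```python
def validate_values(form_fields, values):
    """Return a copy of values after checking every key is a known field name."""
    unknown = sorted(set(values) - set(form_fields))
    if unknown:
        raise KeyError("No such form field(s): " + ", ".join(unknown))
    return dict(values)


def fill_checked(input_pdf, output_pdf, values):
    """Fill input_pdf with values, refusing names the form does not contain."""
    from fillpdf import fillpdfs  # third-party: pip install fillpdf

    form_fields = fillpdfs.get_form_fields(input_pdf)
    data_dict = validate_values(form_fields, values)
    fillpdfs.write_fillable_pdf(input_pdf, output_pdf, data_dict)
```

For example, fill_checked('blank.pdf', 'new.pdf', {'Text2': 'Name'}) would write the file only if 'Text2' exists in the form.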

If some fields are not filled in, you can use fitz (pip install PyMuPDF) and PyPDF2 (pip install PyPDF2) as shown below, changing the points as needed:

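# Note: the camelCase names used here (PdfFileReader, getPage, insertImage, insertText)
# come from older PyPDF2 / PyMuPDF releases; newer releases renamed them.
import fitz
from PyPDF2 import PdfFileReader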

file_handle = fitz.open('blank.pdf')
pdf = PdfFileReader(open('blank.pdf','rb'))
box = pdf.getPage(0).mediaBox
w = box.getWidth()
h = box.getHeight()

# For images
image_rectangle = fitz.Rect((w/2)-200,h-255,(w/2)-100,h-118)
pages = pdf.getNumPages() - 1
last_page = file_handle[pages]
last_page._wrapContents()
last_page.insertImage(image_rectangle, filename='image.png')

# For text
last_page.insertText(fitz.Point((w/2)-247 , h-478), 'John Smith', fontsize=14, fontname="times-bold")
file_handle.save('newpdf.pdf')

score 0 · Accepted Answer

For AcroForm-based forms, using the PDFix SDK:

def SetFormFieldValue(email, key, open_path, save_path):
    pdfix  = GetPdfix()
    if pdfix is None:
        raise Exception('Pdfix Initialization fail')
    if not pdfix.Authorize(email, key):
        raise Exception('Authorization fail : ' + pdfix.GetError())
    doc = pdfix.OpenDoc(open_path, "")
    if doc is None:
        raise Exception('Unable to open pdf : ' + pdfix.GetError())
    field = doc.GetFormFieldByName("Text1")
    if field is not None:
        value = field.GetValue()
        value = "New Value"
        field.SetValue(value)
    if not doc.Save(save_path, kSaveFull):
        raise Exception(pdfix.GetError())
    doc.Close()
    pdfix.Destroy()

score 0 · Accepted Answer

A complete solution is provided here: How to edit editable pdf using the pdfrw library?

The key part is:

template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))
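For context, here is a rough sketch of how that line might sit inside a complete fill routine. It assumes the form is a plain AcroForm whose field names are stored in .T on the page annotations, and that all values are text. match_annotations and fill_pdf are names invented for this example, and the linked answer may be structured differently.

```python
def match_annotations(annotations, data):
    """Yield (annotation, new_value) for each annotation whose /T is a key in data."""
    for annot in annotations or []:
        raw_name = getattr(annot, "T", None)
        if raw_name is None:
            continue  # no name on this annotation (it may live on a parent field)
        name = str(raw_name)
        # pdfrw shows literal strings with their parentheses, e.g. "(Text1)"
        if name.startswith("(") and name.endswith(")"):
            name = name[1:-1]
        if name in data:
            yield annot, data[name]


def fill_pdf(template_path, output_path, data):
    """Fill text fields by name and ask the viewer to rebuild their appearances."""
    from pdfrw import PdfDict, PdfObject, PdfReader, PdfString, PdfWriter

    template = PdfReader(template_path)
    for page in template.pages:
        for annot, value in match_annotations(page.Annots, data):
            annot.update(PdfDict(V=PdfString.encode(str(value))))
    # Without this flag a viewer may show the new value only when the field has focus
    template.Root.AcroForm.update(PdfDict(NeedAppearances=PdfObject("true")))
    PdfWriter().write(output_path, template)
```

A call such as fill_pdf('template.pdf', 'output.pdf', {'Text1': 'New Value'}) would then write a filled copy. Checkboxes and radio buttons take name values rather than text, so they are outside the scope of this sketch.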
