python - Extract all tables from a PDF in Python

Question

I have a PDF and want to extract all of the tables from it. When I run the code below, I get an empty list.

import pdftables

filepath = 'File_Set_-2_feasibility_Study/140u-td005_-en-p.pdf'
with open(filepath, 'rb') as fh:
    table = pdftables.get_tables(fh)
print(table)

score 2 · Accepted Answer

I assume the PDF has more than one page? This should work:

from pdftables.pdf_document import PDFDocument
from pdftables.pdftables import page_to_tables

filepath = ...
page_number = ...  # the page to extract tables from
with open(filepath, 'rb') as file_object:
    pdf_doc = PDFDocument.from_fileobj(file_object)
    pdf_page = pdf_doc.get_page(page_number)
    tables = page_to_tables(pdf_page)
    print(tables)

You can also iterate over multiple pages:

for page_number, page in enumerate(pdf_doc.get_pages()):
    tables = page_to_tables(page)
    print(tables)
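
If you want to keep the results instead of printing them, here is a minimal sketch of the same loop wrapped in a helper. The names `collect_tables` and `tables_from_pdf` are hypothetical; the pdftables calls are only the ones already used in this answer, and I am assuming they behave as shown there. The pages are processed while the file is still open, since the document may read from the file object lazily.

```python
def collect_tables(pages, extract):
    """Run `extract` on every page and return a list of (page_index, tables) pairs."""
    results = []
    for page_index, page in enumerate(pages):
        results.append((page_index, extract(page)))
    return results


def tables_from_pdf(filepath):
    """Hypothetical wrapper around the pdftables calls used in this answer."""
    # Imported here so that collect_tables stays usable without pdftables installed.
    from pdftables.pdf_document import PDFDocument
    from pdftables.pdftables import page_to_tables

    with open(filepath, 'rb') as file_object:
        pdf_doc = PDFDocument.from_fileobj(file_object)
        # Extract inside the `with` block, while the file is still open.
        return collect_tables(pdf_doc.get_pages(), page_to_tables)
```

Calling `tables_from_pdf(filepath)` with your own path should then give one entry per page, so you can see which pages produced tables and which did not.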

score 0 · Accepted Answer

# Install the library below to use pdftables; it worked for me

> pip install pdftables.six

Answered 2021-05-18T09:31:21.263
