I am trying to read the second or third page of a bank statement using tabula. This is my current code:
import tabula
import pandas as pd
from PyPDF2 import PdfFileReader
import re
pdf_path3 = "test.pdf"
dfs = tabula.read_pdf(pdf_path3, pages = "all")
tabula.convert_into("test.pdf", "output.csv", output_format="csv", pages='3')
pdf_path=r"test.pdf"
with open(pdf_path, 'rb') as f:
    pdf = PdfFileReader(f)
    information = pdf.getDocumentInfo()
    number_of_pages = pdf.getNumPages()
    print(information, number_of_pages)
df = pd.read_csv("output.csv", thousands=",", sep=' ')
df
But the result is:
I need the row containing "Date" and "Transaction" to be read as the column headers.
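One possible way to get that row as the header, offered only as a rough sketch: scan the CSV written by `tabula.convert_into` for the first row that mentions both words, then tell pandas to start reading from that line. The helper names `find_header_row` and `load_statement` are hypothetical, the keywords "date" and "transaction" are assumptions about how the header cells are actually spelled in the statement, and the sketch assumes the file is comma-separated (so pandas' default separator is kept rather than `sep=' '`).

```python
import csv
import io


def find_header_row(csv_text, required=("date", "transaction")):
    """Return the 0-based index of the first row whose cells contain every
    keyword in `required` (case-insensitive), or None if no row matches.

    This is a substring heuristic, so a keyword like "date" would also
    match a cell such as "Update".
    """
    for index, row in enumerate(csv.reader(io.StringIO(csv_text))):
        cells = " ".join(cell.strip().lower() for cell in row)
        if all(keyword in cells for keyword in required):
            return index
    return None


def load_statement(csv_path, required=("date", "transaction")):
    """Read the CSV and use the detected row as the column headers."""
    import pandas as pd  # imported lazily so find_header_row works without pandas

    with open(csv_path, newline="", encoding="utf-8") as f:
        text = f.read()
    header_index = find_header_row(text, required)
    if header_index is None:
        raise ValueError("no row contains all of: " + ", ".join(required))
    # Skip every line above the detected row; the first remaining line
    # then becomes the header. Assumes no multi-line quoted cells above it.
    return pd.read_csv(io.StringIO(text), skiprows=header_index, thousands=",")
```

With the file from the code above, this would be called as `df = load_statement("output.csv")`. If the header words differ in the real statement, the `required` tuple would need to be adjusted.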
This is the format of the PDF file: