python - How to read data from a bank statement PDF in Python?

Translated from: https://stackoverflow.com/questions/62633426 2020-06-29T07:44:20.107

Viewed 1269 times

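I have to read data from a bank statement PDF that contains both text and tables.

I tried several solutions posted on Stack Overflow, but most of them raised errors.

Of the many snippets I tried, the following one runs, but it does not give the expected result.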

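from tika import parser

# Parse the PDF with Apache Tika; the result is a dict that includes 'metadata' and 'content'
rawText = parser.from_file('icici.pdf')

# Split the extracted text into a list of lines
rawList = rawText['content'].splitlines()

print(rawList)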

The output is:

2020-06-29 13:05:31,177 [MainThread  ] [WARNI]  Failed to see startup log message; retrying...
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'Statement_MAY2020_013625568.pdf', '', '', '346001506028??PAVA0101 444501', '', '', '', '']

But I want the data from inside the PDF file, not data about the PDF file.

Can someone provide a solution for reading data from a bank statement PDF?
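For reference, most entries in the printed list are empty strings. Here is a minimal sketch of dropping them so the remaining lines are easier to inspect. `non_empty_lines` is a hypothetical helper, not part of tika, and the sample string is a shortened stand-in for `rawText['content']`. The `None` check is an assumption about the case where no text is extracted at all.

```python
from typing import List, Optional


def non_empty_lines(content: Optional[str]) -> List[str]:
    """Return the stripped, non-blank lines of extracted text.

    Hypothetical helper: accepts None (assumed to mean no text was
    extracted) and returns an empty list in that case.
    """
    if content is None:
        return []
    return [line.strip() for line in content.splitlines() if line.strip()]


# Shortened stand-in for rawText['content'] from the snippet above.
sample = (
    "\n\n\nStatement_MAY2020_013625568.pdf"
    "\n\n\n346001506028??PAVA0101 444501\n\n"
)

print(non_empty_lines(sample))
# ['Statement_MAY2020_013625568.pdf', '346001506028??PAVA0101 444501']
```

This only tidies what was returned; it does not recover statement text or tables that were never extracted.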
