
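I'm using camelot to extract a table from a PDF document. The table has Date, Description, Debit, Credit, and Balance fields. The Description text is sometimes long and wraps onto the next line. When I use camelot, it prints the rows like this: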

 Transaction Date                                 Description       Debit        Credit     Balance
2         01/11/2020                                     BAL B/F                38,485.30  38,485.30
3         02/11/2020                                               20,000.00                18,485.30
4                                            MB X WITHDRAWAL
5                     Ref. MP:V TO X NO:5MP:V TO
6                                               X NO:9
7         04/11/2020                         MB X WITHDRAWAL   20,000.00                98,485.30
8                     Ref. MP:V TO X NO:40MP:V TO
9                                               X NO:47
10        05/11/2020                         MB X WITHDRAWAL   80,000.00                18,485.30

I want the table to come out so that Description text that wraps onto the following lines is combined into a single row, for example:

Transaction Date                                 Description           Debit        Credit     Balance
2  01/11/2020                                     BAL B/F                           38,485.30  38,485.30
3  02/11/2020    MB X WITHDRAWAL Ref. MP:V TO X NO:5MP:V TO X NO:9    20,000.00                18,485.30
                                            

Here is my code:

import camelot

tables = camelot.read_pdf('D:\\test.pdf', flavor='stream', edge_tol=500)
print(tables[0].df)

How can I achieve this?
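One possible approach is to post-process the extracted rows instead of tuning the PDF parsing. Treat every row with a non-empty date cell as the start of a transaction, and append the Description text of the dateless rows that follow it. Below is a minimal sketch in plain Python. It assumes the rows are lists of strings with empty cells as "" (camelot's `Table.df` cells are normally strings), that the date is in column 0 and the description in column 1, and that header rows have already been removed. `merge_wrapped_rows`, `DATE_COL`, `DESC_COL` and `SAMPLE_ROWS` are hypothetical names for illustration; the sample rows are reconstructed from the output above and are not part of camelot.

```python
# Hypothetical column positions; adjust them to match your tables[0].df.
DATE_COL = 0
DESC_COL = 1


def merge_wrapped_rows(rows, date_col=DATE_COL, desc_col=DESC_COL):
    """Fold rows with an empty date cell into the previous dated row.

    Only the description text of a continuation row is kept (joined with a
    single space); any other cells in continuation rows are discarded.
    Continuation rows that appear before the first dated row are dropped.
    The input rows are not modified.
    """
    merged = []
    for row in rows:
        if row[date_col].strip():
            # A non-empty date starts a new transaction (copy the row).
            merged.append(list(row))
        elif merged:
            extra = row[desc_col].strip()
            if extra:
                prev = merged[-1][desc_col].strip()
                merged[-1][desc_col] = f"{prev} {extra}".strip()
    return merged


# Sample rows reconstructed from the camelot output in the question
# (columns: Date, Description, Debit, Credit, Balance; header removed).
SAMPLE_ROWS = [
    ["01/11/2020", "BAL B/F", "", "38,485.30", "38,485.30"],
    ["02/11/2020", "", "20,000.00", "", "18,485.30"],
    ["", "MB X WITHDRAWAL", "", "", ""],
    ["", "Ref. MP:V TO X NO:5MP:V TO", "", "", ""],
    ["", "X NO:9", "", "", ""],
    ["04/11/2020", "MB X WITHDRAWAL", "20,000.00", "", "98,485.30"],
    ["", "Ref. MP:V TO X NO:40MP:V TO", "", "", ""],
    ["", "X NO:47", "", "", ""],
    ["05/11/2020", "MB X WITHDRAWAL", "80,000.00", "", "18,485.30"],
]

if __name__ == "__main__":
    for r in merge_wrapped_rows(SAMPLE_ROWS):
        print(r)
```

With camelot, `rows = tables[0].df.values.tolist()` should give lists of cell strings; drop any header rows from that list, then rebuild a table with `pandas.DataFrame(merge_wrapped_rows(rows))`. The rule assumes every real transaction has a date, so a wrapped line that camelot places above its dated row would be attached to the wrong transaction. Camelot's `stream` flavor also has a `row_tol` parameter that controls how text is combined vertically into rows; raising it may merge wrapped lines during extraction, but it can also merge neighbouring transactions, so the post-processing step may be easier to control.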
