I am using tabula-py to extract tables from a PDF document, like this:
import tabula
rows = tabula.read_pdf('bank_statement.pdf', pandas_options={"header":[0, 1, 2, 3, 4, 5]}, pages='all', stream=True, lattice=True)
rows
This gives output like the following:
[ 0
0 Customer Statement\rxxxxxxx\rP...
1 Print Date: April 12, 2020Address: 41 BAALE ST...
2 Period: January 1, 2020 April 12, 2020Openin...,
0
0 Customer Statement\xxxxxxxx\rP...
1 Print Date: April 12, 2020Address: 41 gg ST...,
0 1 2 3 4 5 \
0 03Jan2020 0 03Jan2020 NaN 50,000.00 52,064.00
1 10Jan2020 0 10Jan2020 25,000.00 NaN 27,064.00
2 10Jan2020 0 10Jan2020 25.00 NaN 27,039.00
3 10Jan2020 0 10Jan2020 1.25 NaN 27,037.75
4 20Jan2020 999921... 20Jan2020 10,000.00 NaN 17,037.75
5 23Jan2020 999984... 23Jan2020 4,050.00 NaN 12,987.75
6 23Jan2020 0 23Jan2020 1,000.00 NaN 11,987.75
7 24Jan2020 0 24Jan2020 2,000.00 NaN 9,987.75
8 24Jan2020 0 24Jan2020 NaN 30,000.00 39,987.75
6
0 TRANSFER BETWEEN\rCUSTOMERS Via GG from\r...
1 NS Instant Payment Outward\r000013200110121...
2 COMMISSION\r0000132001101218050000326...\rNIP ...
3 VALUE ADDED TAX VAT ON NIP\rTRANSFER FOR 00001
4 CASH WITHDRAWAL FROM\rOTHER ATM 210674 4420...
5 POS/WEB PURCHASE\rTRANSACTION 845061\r80405...
6 Airtime Purchase MBANKING\r101CT0000000001551...
7 Airtime Purchase MBANKING\r101CT0000000001552...
8 TRANSFER BETWEEN\rCUSTOMERS\r00001520012412113... ,
The data I want from this PDF starts at index 2 of that list, so I run
rows[2]
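and I get a single dataframe (the transaction table that appears third in the output above).
Now I want everything from index 2 through the last index, so I tried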
rows[2:]
but what I get back is a list, not the dataframe I expected:
[ 0 1 2 3 4 5 \
0 03Jan2020 0 03Jan2020 NaN 50,000.00 52,064.00
1 10Jan2020 0 10Jan2020 25,000.00 NaN 27,064.00
2 10Jan2020 0 10Jan2020 25.00 NaN 27,039.00
3 10Jan2020 0 10Jan2020 1.25 NaN 27,037.75
4 20Jan2020 999921... 20Jan2020 10,000.00 NaN 17,037.75
5 23Jan2020 999984... 23Jan2020 4,050.00 NaN 12,987.75
6 23Jan2020 0 23Jan2020 1,000.00 NaN 11,987.75
7 24Jan2020 0 24Jan2020 2,000.00 NaN 9,987.75
8 24Jan2020 0 24Jan2020 NaN 30,000.00 39,987.75
6
0 TRANSFER BETWEEN\rCUSTOMERS Via gg from\r...
1 bi Instant Payment Outward\r000013200110121...
2 COMMISSION\r0000132001101218050000326...\rNIP ...
3 VALUE ADDED TAX VAT ON NIP\rTRANSFER FOR 00001
4 CASH WITHDRAWAL FROM\rOTHER ATM 210674 4420...
5 POS/WEB PURCHASE\rTRANSACTION 845061\r80405...
How can I solve this? I need a single dataframe containing everything from index 2 onward.
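
To make the behaviour concrete without the PDF, here is a minimal sketch that reproduces it with stand-in data. The small dataframes below are hypothetical placeholders for what tabula.read_pdf returns, not the real statement. It shows that indexing the list yields one dataframe while slicing yields a list, and it tries pandas.concat as one possible way to combine the slice, assuming the tables from index 2 onward share the same column layout.

```python
import pandas as pd

# Hypothetical stand-in for the list returned by tabula.read_pdf:
# two header-like tables followed by two transaction tables.
rows = [
    pd.DataFrame({0: ["Customer Statement"]}),
    pd.DataFrame({0: ["Customer Statement"]}),
    pd.DataFrame({0: ["03Jan2020", "10Jan2020"], 1: ["50,000.00", "25,000.00"]}),
    pd.DataFrame({0: ["24Jan2020"], 1: ["30,000.00"]}),
]

# Indexing a list returns the element itself (one DataFrame).
print(type(rows[2]))

# Slicing a list always returns another list, here a list of DataFrames.
print(type(rows[2:]))

# One possible way to get a single DataFrame from the slice:
# stack the tables vertically and renumber the row index.
combined = pd.concat(rows[2:], ignore_index=True)
print(combined)
```

If the tables in the slice do not share the same columns, concat would fill the gaps with NaN, so the result should be checked against the original statement.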