
I can't fully extract MBLHA10B\rGHH4258\r3. As you can see, I only get BLHA10B\rGHH4258\r3: the extraction skips the leading character "M".

Please refer to this screenshot of the table: https://i.stack.imgur.com/vqL91.png

from tabula import read_pdf

# Path to the input PDF
path_s = r'try.pdf'

# lattice=True uses the table's ruling lines to locate cell borders;
# silent=True suppresses tabula-java's log output
json_da = read_pdf(path_s, pages=1, output_format='json', silent=True, lattice=True)

# Flatten every cell of the first table into one list of cell texts
Vehicle_Details1 = []
Vehicle_jsondata2 = json_da[0].get('data')
print('============================================================================================')
for row in Vehicle_jsondata2:
    for cell in row:
        Vehicle_Details1.append(cell.get('text'))
print(len(Vehicle_Details1))
print(Vehicle_Details1)
print('============================================================================================')
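The nested loop above only flattens tabula's JSON output: a list of tables, each with a 'data' list of rows, each row a list of cell dicts carrying a 'text' key. Below is a minimal sketch of the same flattening as a reusable function. It runs on a hand-written sample shaped like that output, which is an assumption made so the snippet works without the PDF. Real cells also carry position keys that are omitted here. The function name flatten_cells is hypothetical.

```python
from typing import Any


def flatten_cells(tables: list[dict[str, Any]], table_index: int = 0) -> list[str]:
    """Return the 'text' of every cell in one tabula JSON table, row by row."""
    rows = tables[table_index].get("data", [])
    return [cell.get("text", "") for row in rows for cell in row]


# Hand-written sample shaped like read_pdf(..., output_format='json') output.
# Assumption: only the 'data' and 'text' keys are shown; real cells have more.
sample = [
    {
        "data": [
            [{"text": "Registration\rNo."}, {"text": "Make"}],
            [{"text": "JH01CE1936"}, {"text": "HERO MO-\rTOCORP"}],
        ]
    }
]

cells = flatten_cells(sample)
print(len(cells))
print(cells)
```

With the real PDF you would pass json_da instead of sample.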

Output:
     ['Registration\rNo.', 'Make', 'SubType', 'Model', 'CC/KW', 'Mfg year', 'Seat Cap', 'Vehicle/\rTrailer\rChassis\rNo', 'Engine Number', 'JH01CE1936', 'HERO MO-\rTOCORP', 'CAST KICK\rDRUM', 'PASSION PRO', '100', '2016', '2', 'BLHA10B\rGHH4258\r3', 'HA10EVGHH4653\r8']

Expected output:
   ['Registration\rNo.', 'Make', 'SubType', 'Model', 'CC/KW', 'Mfg year', 'Seat Cap', 'Vehicle/\rTrailer\rChassis\rNo', 'Engine Number', 'JH01CE1936', 'HERO MO-\rTOCORP', 'CAST KICK\rDRUM', 'PASSION PRO', '100', '2016', '2', 'MBLHA10B\rGHH4258\r3', 'HA10EVGHH4653\r8']
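The two lists differ in a single cell. A small comparison helper makes the dropped prefix explicit. It is a hypothetical helper, not part of tabula. Here it runs on the last three cells of each list above, trimmed by hand.

```python
def find_mismatches(actual: list[str], expected: list[str]) -> list[tuple[int, str, str]]:
    """Return (index, actual, expected) for every cell whose text differs.

    zip() stops at the shorter list, so a missing or extra cell at the end
    is not reported; compare len() separately if that matters.
    """
    return [
        (i, got, want)
        for i, (got, want) in enumerate(zip(actual, expected))
        if got != want
    ]


actual = ["2", "BLHA10B\rGHH4258\r3", "HA10EVGHH4653\r8"]
expected = ["2", "MBLHA10B\rGHH4258\r3", "HA10EVGHH4653\r8"]

for index, got, want in find_mismatches(actual, expected):
    # If the extracted text is a suffix of the expected text, report what was cut off
    missing_prefix = want[: len(want) - len(got)] if want.endswith(got) else None
    print(index, repr(got), repr(want), "missing prefix:", missing_prefix)
```

For this sample it reports only index 1, with "M" as the missing prefix.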
