When trying to extract numbers from a PDF with PyPDF2, I get:

KeyError: '/Contents'. Here is the code:

import PyPDF2 as pdf
fhand = open('filepdf.pdf', 'rb')
reader = pdf.PdfFileReader(fhand)
if reader.isEncrypted:
    pass
else:
    for i in range(reader.getNumPages()):
        for word in reader.getPage(i).extractText().split():
            if word.isdigit():
                print(word)

The code works for other PDF files. Here is the traceback:

Traceback (most recent call last):
  File "C:\Users\Root\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Root\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "c:\Users\Root\.vscode\extensions\ms-python.python-2020.4.76186\pythonFiles\lib\python\debugpy\no_wheels\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\Root\.vscode\extensions\ms-python.python-2020.4.76186\pythonFiles\lib\python\debugpy\no_wheels\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\Root\.vscode\extensions\ms-python.python-2020.4.76186\pythonFiles\lib\python\debugpy\no_wheels\debugpy/..\debugpy\server\cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "C:\Users\Root\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 263, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\Root\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\Root\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "c:\Users\Root\Desktop\test\test.py", line 9, in <module>
    for word in reader.getPage(i).extractText().split():
  File "C:\Users\Root\AppData\Local\Programs\Python\Python38-32\lib\site-packages\PyPDF2\pdf.py", line 2593, in extractText
    content = self["/Contents"].getObject()
  File "C:\Users\Root\AppData\Local\Programs\Python\Python38-32\lib\site-packages\PyPDF2\generic.py", line 516, in __getitem__
    return dict.__getitem__(self, key).getObject()
KeyError: '/Contents'

1 Answer

pdfminer worked for me, whereas PyPDF2 gave an error at first:

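import re
from io import StringIO

from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser

file = 'filepdf.pdf'  # path to the PDF; the file name from the question is used here
output_string = StringIO()
with open(file, 'rb') as in_file:
    parser = PDFParser(in_file)
    doc = PDFDocument(parser)
    rsrcmgr = PDFResourceManager()
    device = TextConverter(rsrcmgr, output_string, laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    # Render every page into the shared output buffer.
    for page in PDFPage.create_pages(doc):
        interpreter.process_page(page)

# Collect the text once, after all pages have been processed.
string = output_string.getvalue()
string = re.sub('\n', '', string)    # drop line breaks
string = re.sub('  +', ' ', string)  # collapse runs of spaces
print(string)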
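
The snippet prints all of the text, while the question asks only for the numbers. A minimal sketch of that last step could look like the following. Both helper names (`digit_words`, `safe_extract_text`) are hypothetical. `safe_extract_text` assumes the KeyError comes from a page whose dictionary has no '/Contents' entry (for example a blank page), which is what the last traceback frame suggests but which I have not verified against the file in question.

```python
def digit_words(text):
    """Return the whitespace-separated tokens of `text` that consist only of digits."""
    return [word for word in text.split() if word.isdigit()]


def safe_extract_text(page):
    """Return page.extractText(), or '' when the page has no '/Contents' entry.

    `page` is assumed to be a PyPDF2 page object, as returned by
    reader.getPage(i) in the question. Hypothetical helper, not part of PyPDF2.
    """
    try:
        return page.extractText()
    except KeyError as exc:
        # Swallow only the specific error from the traceback; re-raise anything else.
        if exc.args and exc.args[0] == '/Contents':
            return ''
        raise


# With pdfminer, applied to the raw buffer contents:
#     print(digit_words(output_string.getvalue()))
# With PyPDF2, inside the page loop from the question:
#     for word in digit_words(safe_extract_text(reader.getPage(i))):
#         print(word)
```

Stripping the newlines glues the last word of one line to the first word of the next, so it is probably safer to pass `output_string.getvalue()` to `digit_words` directly rather than the cleaned-up `string`.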
answered 2020-07-02T10:05:37.520