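I am using Flask, and I am uploading a PDF file so that I can convert it to images and run OCR on it with pytesseract.
However, pdf2image cannot read the uploaded file. I searched the internet but could not find anything.
I tried passing the FileStorage object directly, but got an error. My code looks like this: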
log_file = request.files.get('pdf')
images = convert_from_path(log_file)
text = ""
for img in images:
    im = img
    ocr_dict = pytesseract.image_to_data(im, lang='eng', output_type=Output.DICT)
    text += " ".join(ocr_dict['text'])
cleaned_text = clean_text(txt=text)
This gives the following error:
**TypeError: expected str, bytes or os.PathLike object, not FileStorage**
I also tried passing the file's contents instead:
log_file = request.files.get('pdf')
images = convert_from_path(log_file.read())
text = ""
for img in images:
    im = img
    ocr_dict = pytesseract.image_to_data(im, lang='eng', output_type=Output.DICT)
    text += " ".join(ocr_dict['text'])
cleaned_text = clean_text(txt=text)
This gives the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pdf2image/pdf2image.py", line 458, in pdfinfo_from_path
    proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
  File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1639, in _execute_child
    self.pid = _posixsubprocess.fork_exec(
ValueError: embedded null byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/usr/local/lib/python3.8/dist-packages/flask_restful/__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/flask/views.py", line 84, in view
    return current_app.ensure_sync(self.dispatch_request)(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/flask_restful/__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/flask_httpauth.py", line 172, in decorated
    return self.ensure_sync(f)(*args, **kwargs)
  File "/home/ubuntu/Credit_Scoring/API_Script/temp2.py", line 38, in post
    json_text = coi_ocr.get_coi_ocr_text()
  File "/home/ubuntu/Credit_Scoring/API_Script/ocr_script/certificate_of_incorporation/coi_ocr_script_pdf.py", line 51, in get_coi_ocr_text
    text1 = self.extract_text_from_COI()
  File "/home/ubuntu/Credit_Scoring/API_Script/ocr_script/certificate_of_incorporation/coi_ocr_script_pdf.py", line 16, in extract_text_from_COI
    images = convert_from_path(self.fl)
  File "/usr/local/lib/python3.8/dist-packages/pdf2image/pdf2image.py", line 98, in convert_from_path
    page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
  File "/usr/local/lib/python3.8/dist-packages/pdf2image/pdf2image.py", line 489, in pdfinfo_from_path
    "Unable to get page count.\n%s" % err.decode("utf8", "ignore")
UnboundLocalError: local variable 'err' referenced before assignment
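
For anyone who lands here with the same problem, here is a possible direction, offered as an untested sketch rather than a confirmed fix. Both tracebacks suggest that convert_from_path treats its argument as a file path: the first attempt hands it a FileStorage object, and the second hands it the raw PDF bytes, which end up on the pdfinfo command line and trigger "embedded null byte". pdf2image also provides convert_from_bytes, which takes the PDF content itself. The function name ocr_uploaded_pdf and the convert / ocr parameters below are hypothetical and exist only so the sketch can be tried without Poppler or Tesseract installed. Unlike the loops above, it puts a space between the text of consecutive pages.

```python
import io


def ocr_uploaded_pdf(file_storage, convert=None, ocr=None):
    """Return the OCR text of an uploaded PDF (illustrative sketch).

    file_storage: any object with a .read() method that returns the PDF
        bytes, e.g. the FileStorage from request.files.get('pdf').
    convert, ocr: hypothetical hooks, not part of pdf2image or pytesseract;
        when omitted, the real libraries are imported and used.
    """
    if convert is None:
        # Bytes-based counterpart of convert_from_path in pdf2image.
        from pdf2image import convert_from_bytes as convert
    if ocr is None:
        import pytesseract
        from pytesseract import Output

        def ocr(image):
            # Same call as in the question, applied to one page image.
            data = pytesseract.image_to_data(
                image, lang="eng", output_type=Output.DICT
            )
            return " ".join(data["text"])

    pdf_bytes = file_storage.read()  # the PDF content, not a path
    images = convert(pdf_bytes)      # one image per page
    # Assumption: pages are joined with a single space.
    return " ".join(ocr(image) for image in images)
```

Inside the view this would be called as text = ocr_uploaded_pdf(request.files.get('pdf')), followed by the existing clean_text(txt=text) step. Another option, if convert_from_path has to stay, would be to save the upload to a temporary file first and pass that file's path.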