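I am using a blob-triggered Python Azure Function app to extract data from PDFs, and I get the error below when using tabula-py. It runs locally without any problems, but once I deploy the function I receive the following error: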
Result: Failure
Exception: JavaNotFoundError: `java` command is not found from this Python process.Please ensure Java is installed and PATH is set for `java`
Stack: File "/azure-functions-host/workers/python/3.6/LINUX/X64/azure_functions_worker/dispatcher.py", line 315, in _handle__invocation_request
    self.__run_sync_func, invocation_id, fi.func, args)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/azure-functions-host/workers/python/3.6/LINUX/X64/azure_functions_worker/dispatcher.py", line 434, in __run_sync_func
    return func(**params)
  File "/home/site/wwwroot/Assessment/__init__.py", line 21, in main
    pdfTable = tabula.read_pdf(blob_to_read,pages='all',multiple_tables=True)
  File "/home/site/wwwroot/.python_packages/lib/python3.6/site-packages/tabula/io.py", line 322, in read_pdf
    output = _run(java_options, kwargs, path, encoding)
  File "/home/site/wwwroot/.python_packages/lib/python3.6/site-packages/tabula/io.py", line 91, in _run
    raise JavaNotFoundError(JAVA_NOT_FOUND_ERROR)
Here is my code:
import logging
import azure.functions as func
import io
import re
import os
import tabula

def main(myblob: func.InputStream, blobout: func.Out[str], context: func.Context):
    logging.info(f"--- Python blob trigger function processed blob \n"
                 f"----- Name: {myblob.name}\n"
                 f"----- Blob Size: {myblob.length} bytes")
    inputblob = myblob.read()
    blob_to_read = io.BytesIO(inputblob)
    pdfTable = tabula.read_pdf(blob_to_read,pages='all',multiple_tables=True)
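Since the error says `java` is not found from the Python process, a small check like the one below could confirm what the deployed worker actually sees. This is only a sketch using the standard library; `java_diagnostics` is a hypothetical helper name, not part of tabula-py or the Azure Functions SDK, and it only reports the environment, it does not install or locate Java.

```python
import os
import shutil


def java_diagnostics() -> dict:
    """Report whether a `java` executable is visible to this Python process."""
    return {
        # Path to the java executable as resolved via PATH, or None if absent.
        "java_path": shutil.which("java"),
        # JAVA_HOME is often set by JDK/JRE installs; None if it is unset.
        "java_home": os.environ.get("JAVA_HOME"),
        # The directories this process searches for executables.
        "path_entries": os.environ.get("PATH", "").split(os.pathsep),
    }


if __name__ == "__main__":
    print(java_diagnostics())
```

Logging the result of `java_diagnostics()` at the start of `main`, before the `tabula.read_pdf` call, should show whether `java_path` is `None` in the deployed environment while being set locally.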
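I also tried camelot, but ran into complications related to installing ghostscript.
I am on the Consumption plan. Any help on how to resolve this would be greatly appreciated.
Thank you.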