python - Trying Tesseract on Windows CMD

Question

I am having trouble using Tesseract-OCR with the pytesseract Python wrapper. I suspect the problem comes from Tesseract itself rather than from the wrapper, so I tried running Tesseract from CMD:

C:\Users\Thomas\Desktop>tesseract.exe 'blabla.jpg' 'out.txt'

It returned the following lines:

Tesseract Open Source OCR Engine v3.05.01 with Leptonica
Error in fopenReadStream: file not found
Error in findFileFormat: image file not found
Error during processing.

I did the following to install Tesseract:

Installed it from https://github.com/UB-Mannheim/tesseract/wiki
Added the path to tesseract.exe to the PATH environment variable

By the way, the problem I get when running this Python code:

from PIL import Image
import pytesseract
text = pytesseract.image_to_string(Image.open('blabla.jpg'))
print(text)

is:

Traceback (most recent call last):

  File "<ipython-input-1-01e77f902509>", line 1, in <module>
runfile('D:/anaconda/projects/OCR/ocr.py', wdir='D:/anaconda/projects/OCR')

  File "D:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)

  File "D:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/anaconda/projects/OCR/ocr.py", line 48, in <module>
text = pytesseract.image_to_string(a)

  File "D:\anaconda\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
config=config)

  File "D:\anaconda\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
proc = subprocess.Popen(command, stderr=subprocess.PIPE)

  File "D:\anaconda\lib\subprocess.py", line 707, in __init__
restore_signals, start_new_session)

  File "D:\anaconda\lib\subprocess.py", line 990, in _execute_child
startupinfo)

PermissionError: [WinError 5] Access refused

Running the code as administrator does not solve the problem.

Many thanks for your help!

score 0 · Accepted Answer

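First, to verify that tesseract works from the Windows command prompt, use double quotes " " instead of single quotes ' ' if the image and/or output file name contains a space. CMD does not treat single quotes as quoting characters, so they are passed to Tesseract as part of the file name, which is why the file is reported as not found. If the names contain no spaces, no quotes are needed: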

C:\Users\Thomas\Desktop>tesseract.exe blabla.jpg out.txt
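
The quoting rule above can be sidestepped from Python by passing the arguments as a list, so no shell quoting is involved. The following is only a minimal sketch: `build_tesseract_command` is a hypothetical helper, the file names are placeholders, and it assumes `tesseract` is reachable through PATH. `subprocess.list2cmdline` is used just to display how the list would be quoted on a Windows command line.

```python
import subprocess


def build_tesseract_command(image_path, output_base, exe="tesseract"):
    """Return the argument list for a Tesseract CLI call (hypothetical helper)."""
    # Each argument is its own list item, so spaces need no manual quoting.
    return [exe, image_path, output_base]


cmd = build_tesseract_command("my image.jpg", "out")

# Show the equivalent Windows command line: double quotes, never single quotes.
print(subprocess.list2cmdline(cmd))  # tesseract "my image.jpg" out

# To actually run it (needs tesseract on PATH and an existing image):
# result = subprocess.run(cmd, capture_output=True, text=True)
# print(result.stderr)
```

Only the argument containing a space ends up quoted, which matches the advice that quotes are unnecessary for names without spaces.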

Second, use the full file path to specify the image file. For example:

pytesseract.pytesseract.tesseract_cmd = 'C:/path/to/tesseract.exe'
text = pytesseract.image_to_string(Image.open('D:/path/to/blabla.jpg'))

Note that forward slashes / are used in the file paths instead of backslashes \; alternatively, use doubled backslashes \\, for example 'D:\\path\\to\\blabla.jpg'.
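
The path spellings mentioned above can be compared with a short sketch. It uses the placeholder paths from this answer; `same_windows_path` is a hypothetical helper, and `PureWindowsPath` is used so the comparison does not touch the file system and behaves the same on any platform.

```python
from pathlib import PureWindowsPath


def same_windows_path(a, b):
    """Return True if two strings name the same Windows path (hypothetical helper)."""
    return PureWindowsPath(a) == PureWindowsPath(b)


forward = "D:/path/to/blabla.jpg"      # forward slashes
doubled = "D:\\path\\to\\blabla.jpg"   # doubled backslashes
raw = r"D:\path\to\blabla.jpg"         # raw string, single backslashes

print(doubled == raw)                    # True: both are the same string
print(same_windows_path(forward, raw))   # True: same path, different separators

# Pitfall: in a normal string, a single backslash can start an escape sequence.
broken = "D:\to"     # "\t" becomes a tab character
print(len(broken))   # 4, not 5
```

A raw string (the `r` prefix) is a third option besides the two named in the answer; it keeps single backslashes literal.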

Hope this helps.
