python - How to resolve the error pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

Question

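I am getting the error pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it is not in your path. I tested my program a few minutes before this appeared and it ran fine. Then I tested it again and now it shows this error every time. I don't know what to do. Here is my code: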

from PIL import ImageGrab
import cv2
import pytesseract
import numpy as np
from tkinter import Tk
from tkinter.filedialog import askopenfilename
ask = input("Do you want to ocr in realtime or choose a picture (r/p)?")
if ask == 'r':
    while True:
        screen = np.array(ImageGrab.grab(bbox=(700, 300, 1600, 1000)))
        # print('Frame took {} seconds'.format(time.time()-last_time))
        cv2.imshow('window', screen)
        if cv2.waitKey(25) & 0xFF == ord('q'):
            cv2.destroyAllWindows()
            break
        print(pytesseract.image_to_string(screen, lang='eng', config='--psm 6'))
if ask == 'p':
    Tk().withdraw()  # we don't want a full GUI, so keep the root window from appearing
    filename = askopenfilename()  # show an "Open" dialog box and return the path to the selected file
    print(pytesseract.image_to_string(filename, lang='eng', config='--psm 6'))

score 1 · Accepted Answer

This error can have several causes.

Check whether tesseract.exe is installed. If it is not, download the installer from the link below and install it. Note the installation path for later.

https://github.com/UB-Mannheim/tesseract/wiki

If Tesseract is already installed but pytesseract cannot find it from Python, you can set the path within the script like this:

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
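
If you would rather not hard-code a single path, the lookup can be done at startup. This is only a minimal sketch: the helper names find_tesseract and configure_pytesseract are made up for illustration, and the candidate paths are assumed default install locations that may differ on your machine.

```python
import os
import shutil

# Assumed default Windows install locations; adjust these to your machine.
CANDIDATE_PATHS = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATE_PATHS):
    """Return a path to the tesseract executable, or None if none is found."""
    # First ask the OS whether "tesseract" is reachable through PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the explicit candidate locations.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the executable found by find_tesseract()."""
    import pytesseract  # imported here so find_tesseract() works on its own

    path = find_tesseract()
    if path is None:
        raise RuntimeError("tesseract not found; install it or add it to PATH")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling configure_pytesseract() once, before the first image_to_string call, should be enough.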

score 1 · Accepted Answer

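I ran into the same problem in the past, and I think you have to make sure that you:

Install Tesseract from here
Run pip install pytesseract
Add a new variable named "tesseract" to the environment variables, with the value

C:\Program Files (x86)\Tesseract-OCR\tesseract.exe
If you run tesseract on the command line, it should print its usage information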
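
The command-line check in the last step can also be run from Python. A minimal sketch, assuming the executable should be reachable through PATH (by default pytesseract invokes the command "tesseract"); the helper name tesseract_version is hypothetical:

```python
import shutil
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line of `<cmd> --version`, or None if cmd is not on PATH."""
    exe = shutil.which(cmd)
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some Tesseract builds print the version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


version = tesseract_version()
if version is None:
    print("tesseract is not reachable through PATH")
else:
    print("Found:", version)
```

If this reports that tesseract is not reachable, pytesseract will most likely raise TesseractNotFoundError as well, unless tesseract_cmd is set to a full path.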

That's it :)

score 0 · Accepted Answer

You need to tell pytesseract where the tesseract binary is located:

import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'<full_path_to_your_tesseract_executable>'

Doing this should solve your problem.

score 0 · Accepted Answer

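The installation process and the trained data files matter most. For example, Arabic requires the ara.traineddata file. I recommend using the correct language model and the latest version:

For Windows 10:

tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe (64-bit)

To verify the installation in a PowerShell or cmd terminal, run: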

tesseract -v

It will output something like: tesseract v5.0.0-alpha.20200328

For macOS:

brew install tesseract

To verify the installation in a terminal, run:

tesseract -v

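It will output something like: tesseract 4.1.1, followed by the installed image libraries: leptonica-1.80.0 libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1 Found AVX2 Found AVX Found FMA Found SSE

If you are unsure of the path, simply copy the ara.traineddata file into the same folder as your Python .py file: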

import pytesseract
from PIL import Image
import os
os.environ["TESSDATA_PREFIX"] = "" # Leaving it empty because file is already copy pasted in the current directory
print(os.getenv("TESSDATA_PREFIX"))
# Copy paste the ara.traineddata file in the same directory as this python code
print(pytesseract.image_to_string(Image.open('cropped.png'), lang="ara"))

For Linux/Ubuntu:

sudo apt-get install tesseract-ocr

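Verifying the installation and running the code work the same way as on macOS.

Also make sure the path is correct.

If the ara.traineddata file was downloaded successfully, this code works: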

import pytesseract
from PIL import Image
print(pytesseract.image_to_string(Image.open('cropped.png'), lang="ara"))
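
Before running OCR, you can check whether Tesseract actually sees the language model. A rough sketch that parses the output of tesseract --list-langs; the helper name installed_languages is made up here, and the parsing assumes the first output line is a header, which may not hold if Tesseract prints an error instead.

```python
import shutil
import subprocess


def installed_languages(cmd="tesseract"):
    """Return the language codes reported by `<cmd> --list-langs`, or None if cmd is not on PATH."""
    exe = shutil.which(cmd)
    if exe is None:
        return None
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some Tesseract builds write the list to stderr instead of stdout.
    lines = (result.stdout or result.stderr).splitlines()
    # Assumption: the first line is a header such as "List of available languages (3):".
    return [line.strip() for line in lines[1:] if line.strip()]


langs = installed_languages()
if langs is None:
    print("tesseract is not reachable through PATH")
elif "ara" in langs:
    print("The Arabic model (ara) is available")
else:
    print("ara is missing; installed languages:", langs)
```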

You can follow this tutorial for details. This is the tutorial's demo output, which uses all available languages.
