python - How to extract letters from an image using Python

Question

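I want to extract letters (characters and digits) from an image using Python. Please help me understand how to do this, ideally with some sample code.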

score 0 · Accepted Answer

I use tesseract for this. There is also a Python library: https://code.google.com/p/python-tesseract/

Example from the home page:

import tesseract
api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_AUTO)

mImgFile = "eurotext.jpg"
mBuffer=open(mImgFile,"rb").read()
result = tesseract.ProcessPagesBuffer(mBuffer,len(mBuffer),api)
print "result(ProcessPagesBuffer)=",result

Here is my Python 3 code, which does not use the tesseract library but calls the .exe file instead:

import os
import subprocess
import tempfile

def tesser_exe():
    path = os.path.join(os.environ['Programfiles'], 'Tesseract-OCR', 'tesseract.exe')
    if not os.path.exists(path):
        raise NotImplementedError('You must first install tesseract from https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-setup-3.02.02.exe&can=2&q=')
    return path

def text_from_image_file(image_name):
    assert image_name.lower().endswith('.bmp')
    output_name = tempfile.mktemp()
    exe_file = tesser_exe()  # path to the tesseract.exe file
    # '-psm 7' tells tesseract to treat the image as a single line of text
    return_code = subprocess.call([exe_file, image_name, output_name, '-psm', '7'])
    if return_code != 0:
        raise NotImplementedError('error handling not implemented')
    # tesseract appends '.txt' to the output base name it is given
    with open(output_name + '.txt', encoding='utf8') as f:
        return f.read()
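
The function above is tied to Windows and a fixed install path. A minimal sketch of the same subprocess approach that looks tesseract up on PATH and cleans up its temporary files could look like this. The names `build_command` and `text_from_image` and the `psm_flag` parameter are made up for illustration and are not part of Tesseract. The sketch assumes a tesseract binary is installed; the 3.x series used above takes `-psm`, while 4.x and later document `--psm`.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_command(exe, image_path, output_base, psm=7, psm_flag="--psm"):
    """Build the argument list for one tesseract run.

    psm_flag is an assumption about the installed version:
    "--psm" for Tesseract 4 and later, "-psm" for the 3.x series.
    """
    return [str(exe), str(image_path), str(output_base), psm_flag, str(psm)]


def text_from_image(image_path, psm=7, psm_flag="--psm"):
    """Run tesseract on image_path and return the recognized text."""
    exe = shutil.which("tesseract")  # search PATH instead of a fixed folder
    if exe is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    # The temporary directory and everything in it is removed on exit.
    with tempfile.TemporaryDirectory() as tmp_dir:
        output_base = Path(tmp_dir) / "ocr_result"
        command = build_command(exe, image_path, output_base, psm, psm_flag)
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
        # tesseract appends ".txt" to the output base name
        return (Path(tmp_dir) / "ocr_result.txt").read_text(encoding="utf8")
```

Calling `text_from_image("eurotext.jpg")` would then return the recognized text as a string, provided the installed tesseract build can read that image format.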

score 0 · Accepted Answer

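I hope this helps if your image is clean (little noise). In that case, use Google's "PyTesser" project.
PyTesser is an optical character recognition module for Python. It takes an image or image file as input and outputs a string. You can get PyTesser from this link. Here is an example: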

>>> from pytesser import *
>>> image = Image.open('fnord.tif')  # Open image object using PIL
>>> print image_to_string(image)     # Run tesseract.exe on image
fnord
>>> print image_file_to_string('fnord.tif')
fnord
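
The PyTesser session above is written for Python 2. On Python 3, a comparable call is available in the separate `pytesseract` package, which this answer does not mention. The sketch below assumes `pytesseract`, Pillow, and the tesseract binary are all installed. The helper names `build_config` and `image_file_to_text` are hypothetical, and the `--psm` option assumes Tesseract 4 or later.

```python
def build_config(psm=7, whitelist=None):
    """Assemble the extra command-line options handed to tesseract."""
    parts = ["--psm", str(psm)]
    if whitelist:
        # Restrict recognition to the given characters.
        parts += ["-c", "tessedit_char_whitelist=" + whitelist]
    return " ".join(parts)


def image_file_to_text(path, lang="eng", psm=7, whitelist=None):
    """Return the text tesseract recognizes in the image at path."""
    # Imported lazily so the module loads even without the OCR packages.
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        return pytesseract.image_to_string(
            image, lang=lang, config=build_config(psm, whitelist)
        )
```

With those packages in place, `print(image_file_to_text('fnord.tif'))` plays the same role as `image_file_to_string('fnord.tif')` in the PyTesser example.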
