python - How to get the coordinates of letters recognized by Tesseract OCR

Question

I'm trying to use Tesseract from Python for a simple job:
- open an image
- run OCR
- get the string
- get the character coordinates

The last one is giving me trouble!

Here is my first attempt:

import tesseract
import glob
import cv2

api = tesseract.TessBaseAPI()
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZéèô%")
api.SetPageSegMode(tesseract.PSM_AUTO)

imagepath = "C:\\Project\\Bob\\"
imagePathList = glob.glob(imagepath + "*.jpg")

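for image in imagePathList:
    # Read the current image (not always imagePathList[10]) as raw bytes for Tesseract
    with open(image, "rb") as f:
        mBuffer = f.read()
    result = tesseract.ProcessPagesBuffer(mBuffer, len(mBuffer), api)
    img = cv2.imread(image)
    # Draw the recognized text near the top-left corner and display the image
    cv2.putText(img, result, (20, 20), cv2.FONT_HERSHEY_PLAIN, 1.0, (0, 255, 0))
    cv2.imshow("Original", img)
    cv2.waitKey()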

Since my images have different layouts, with different words in different positions, I'd like to get a box for each character.

I've seen mentions of:
- api.getBoxText
- hOCR

but I haven't found a way to do either of these in Python.

score 3 · Accepted Answer

tesserocr gives access to almost all of Tesseract's API functionality. Here is an example that may be what you're looking for:

from PIL import Image
from tesserocr import PyTessBaseAPI, RIL

image = Image.open('/usr/src/tesseract/testing/phototest.tif')
with PyTessBaseAPI() as api:
    api.SetImage(image)
    # RIL.TEXTLINE gives one box per text line; RIL.WORD or RIL.SYMBOL give word- or character-level boxes
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print(u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
              "confidence: {1}, text: {2}".format(i, conf, ocrResult, **box))

You can also access other API methods, such as GetHOCRText and GetBoxText.
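
GetBoxText returns text in Tesseract's box-file format: one line per character, `<char> <left> <bottom> <right> <top> <page>`, with y measured from the bottom edge of the image. Below is a minimal sketch of converting that string into top-left-origin boxes (the convention OpenCV uses). It assumes you supply the image height yourself; `CharBox` and `parse_box_text` are hypothetical names.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    char: str
    left: int
    top: int
    right: int
    bottom: int
    page: int


def parse_box_text(box_text: str, image_height: int) -> List[CharBox]:
    """Parse Tesseract box-file text into character boxes with a top-left origin.

    Each input line is "<char> <left> <bottom> <right> <top> <page>",
    where y is measured from the bottom edge of the image.
    """
    boxes = []
    for line in box_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so the character field stays intact.
        char, left, bottom, right, top, page = line.rsplit(None, 5)
        boxes.append(CharBox(
            char=char,
            left=int(left),
            top=image_height - int(top),        # flip y to a top-left origin
            right=int(right),
            bottom=image_height - int(bottom),
            page=int(page),
        ))
    return boxes
```

With tesserocr, something like `parse_box_text(api.GetBoxText(0), image.height)` after `api.SetImage(image)` should give per-character coordinates; flipping y up front means the boxes can go straight into `cv2.rectangle`.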

However, it currently only supports *nix systems, although a user has successfully compiled it on Windows and provided binaries, if you want to try them.

Disclaimer: tesserocr author here.

score 0 · Accepted Answer


You may want to call the GetHOCRText method, if the Python wrapper supports it.
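
Where GetHOCRText is available, it returns HTML in which each recognized element carries a `title` attribute containing `bbox x0 y0 x1 y1` (top-left origin). A minimal sketch that extracts word boxes (elements with class `ocrx_word`) using only the standard library's `html.parser`; `HocrWordParser` and `parse_hocr_words` are hypothetical names. Character-level boxes in hOCR depend on the Tesseract version and settings, so this sketch stops at words.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")
Box = Tuple[int, int, int, int]


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for each ocrx_word element."""

    def __init__(self) -> None:
        super().__init__()
        self.words: List[Tuple[str, Box]] = []
        self._bbox: Optional[Box] = None
        self._depth = 0  # nesting depth of tags inside the current word
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            self._depth += 1  # nested tag inside a word, e.g. <strong>
            return
        attr = dict(attrs)
        if "ocrx_word" in (attr.get("class") or "").split():
            match = BBOX_RE.search(attr.get("title") or "")
            if match:
                x0, y0, x1, y1 = (int(v) for v in match.groups())
                self._bbox = (x0, y0, x1, y1)
                self._depth = 0
                self._text = []

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        if self._depth > 0:
            self._depth -= 1  # closing a tag nested inside the word
            return
        self.words.append(("".join(self._text).strip(), self._bbox))
        self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def parse_hocr_words(hocr: str) -> List[Tuple[str, Box]]:
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words
```

With tesserocr this might be used as `parse_hocr_words(api.GetHOCRText(0))`; each box can then be drawn with `cv2.rectangle(img, (x0, y0), (x1, y1), (0, 255, 0))`.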

answered 2013-09-27T13:20:21.697
