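I am trying to write Python code that reproduces my manual image preprocessing and recognition workflow with Tesseract-OCR.
Manual process:
To recognize the text in a single image by hand, I preprocess the image in Gimp and save it as a TIF image. I then feed that file to Tesseract-OCR, which recognizes it correctly.
To preprocess the image in Gimp, I do the following: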
- Change the mode to RGB / Grayscale
  Menu -- Image -- Mode -- RGB
- Thresholding
  Menu -- Tools -- Color Tools -- Threshold -- Auto
- Change the mode to Indexed
  Menu -- Image -- Mode -- Indexed
- Resize / scale to a width > 300px
  Menu -- Image -- Scale Image -- Width=300
- Save as TIF
Then I feed it to tesseract:
$ tesseract captcha.tif output -psm 6
I always get an accurate result.
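For anyone who wants to run that exact command from a script, here is a minimal sketch using only the standard library. The helper names `build_tesseract_command` and `run_tesseract` are hypothetical, and the sketch assumes the `tesseract` binary is on the PATH and writes its result to `<output_base>.txt`. It keeps the `-psm` spelling from the command above; newer Tesseract releases spell the flag `--psm`, so adjust it for your version.

```python
import subprocess


def build_tesseract_command(image_path, output_base, psm=6):
    """Build the argument list for the tesseract CLI call shown above.

    Hypothetical helper: mirrors `tesseract captcha.tif output -psm 6`.
    """
    return ["tesseract", image_path, output_base, "-psm", str(psm)]


def run_tesseract(image_path, output_base, psm=6):
    """Run tesseract and return the recognized text.

    Assumes the `tesseract` binary is installed and on the PATH.
    """
    subprocess.run(build_tesseract_command(image_path, output_base, psm), check=True)
    # tesseract appends ".txt" to the output base name
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read().strip()
```

Calling `run_tesseract("captcha.tif", "output")` would then give the same text as the manual run, since it is the same command.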
Python code:
I tried to replicate the above procedure using OpenCV and Tesseract:
import cv2
from PIL import Image
# Assumed import: the config= keyword matches pytesseract's image_to_string
from pytesseract import image_to_string


def binarize_image_using_opencv(captcha_path, binary_image_path='input-black-n-white.jpg'):
    # cv2.IMREAD_GRAYSCALE is the current name of cv2.CV_LOAD_IMAGE_GRAYSCALE (OpenCV 2.x)
    im_gray = cv2.imread(captcha_path, cv2.IMREAD_GRAYSCALE)
    # Let Otsu's method pick the threshold; the 128 is ignored when THRESH_OTSU is set
    (thresh, im_bw) = cv2.threshold(im_gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # thresh from Otsu is reused here, but a hand-picked value could be substituted
    im_bw = cv2.threshold(im_gray, thresh, 255, cv2.THRESH_BINARY)[1]
    cv2.imwrite(binary_image_path, im_bw)
    return binary_image_path


def preprocess_image_using_opencv(captcha_path):
    bin_image_path = binarize_image_using_opencv(captcha_path)
    im_bin = Image.open(bin_image_path)
    basewidth = 300  # in pixels
    # Scale the height by the same factor as the width to keep the aspect ratio
    wpercent = (basewidth / float(im_bin.size[0]))
    hsize = int((float(im_bin.size[1]) * float(wpercent)))
    big = im_bin.resize((basewidth, hsize), Image.NEAREST)
    # tesseract-ocr only works with TIF so save the bigger image in that format
    tif_file = "input-NEAREST.tif"
    big.save(tif_file)
    return tif_file


def get_captcha_text_from_captcha_image(captcha_path):
    # Preprocess the image before OCR
    tif_file = preprocess_image_using_opencv(captcha_path)
    # Perform OCR using the tesseract-ocr library
    # OCR: Optical Character Recognition
    image = Image.open(tif_file)
    ocr_text = image_to_string(image, config="-psm 6")
    alphanumeric_text = ''.join(e for e in ocr_text)
    return alphanumeric_text
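For reference, here is a minimal pure-Python (Python 3) sketch of what the `cv2.THRESH_OTSU` flag in the code above asks for: it picks the gray level that maximizes the between-class variance of the two pixel groups. The function names `otsu_threshold` and `binarize` are hypothetical, and this is only an illustration of the technique; OpenCV's own implementation may differ in details such as tie-breaking, and Gimp's "Auto" threshold is not necessarily the same algorithm.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for an iterable of 8-bit gray values (0-255)."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = sum(histogram)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold = 0
    best_variance = -1.0
    background_count = 0
    background_sum = 0
    for level in range(256):
        # Pixels with value <= level form the "background" class
        background_count += histogram[level]
        background_sum += level * histogram[level]
        if background_count == 0:
            continue
        foreground_count = total - background_count
        if foreground_count == 0:
            break
        mean_bg = background_sum / background_count
        mean_fg = (total_sum - background_sum) / foreground_count
        # Between-class variance, up to a constant factor of 1 / total**2
        variance = background_count * foreground_count * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = level
    return best_threshold


def binarize(pixels, threshold):
    """Map values above the threshold to 255 and the rest to 0 (THRESH_BINARY-style)."""
    return [255 if value > threshold else 0 for value in pixels]
```

Running this on a flattened list of gray values and comparing the threshold with the `thresh` returned by `cv2.threshold` is one way to check what the binarization step is actually doing.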
But I am not getting the same accuracy. What am I missing?
Update 1:
Update 2:
This code is available at https://github.com/hussaintamboli/python-image-to-text