python-imaging-library - Python OCR: extracting text from an image

Translated from: https://stackoverflow.com/questions/35393589 2016-02-14T15:39:04.597

Viewed 3580 times

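I want to extract data from a scanned passport image.
I'm using PIL for the image processing and pytesseract to convert the image to text.
My problem is that the output isn't what I need: I get 5 instead of S, and similar misreadings.
I don't think the problem is pytesseract. I think it's PIL, because I'm not filtering the image well enough.
Can someone help me extract only the black pixels from the image?
Alternatively, could someone advise which filters I should use to get the best results? Thanks! This is what I'm trying: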

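#!/usr/bin/env python3
from io import BytesIO

import pytesseract
import requests
from PIL import Image, ImageFilter


def process_image(url):
    image = _get_image(url)
    image = image.filter(ImageFilter.SHARPEN)
    # image = image.convert('1')  # attempt at a pure black/white image
    print(pytesseract.image_to_string(image))


def _get_image(url):
    # Download the image and open it from an in-memory binary buffer
    # (Python 3 needs io.BytesIO here; StringIO only holds text)
    response = requests.get(url)
    response.raise_for_status()
    # Normalize the mode: Pillow cannot filter palette ("P") images
    return Image.open(BytesIO(response.content)).convert("RGB")


process_image('https://upload.wikimedia.org/wikipedia/commons/3/3f/Polish_passport_biodata_page.png')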
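The "black pixels only" approach can be sketched with Pillow alone. The idea is to convert the image to grayscale and apply a threshold, so that pixels darker than a cutoff become pure black and everything else becomes white. This is a minimal sketch under some assumptions. The cutoff of 100, the 2x upscale and the 3x3 median filter are starting values that you would need to tune on your own scans. The names `keep_dark_pixels` and `preprocess_for_ocr` are hypothetical helpers; they are not part of PIL or pytesseract.

```python
from PIL import Image, ImageFilter


def keep_dark_pixels(image, threshold=100):
    """Return a grayscale copy in which pixels darker than `threshold`
    (0-255) become black (0) and all other pixels become white (255).
    The default threshold is an assumption; tune it per scan."""
    gray = image.convert("L")  # 8-bit grayscale
    # point() applies the mapping to every gray level via a lookup table
    return gray.point(lambda v: 0 if v < threshold else 255)


def preprocess_for_ocr(image, threshold=100, scale=2):
    """Hypothetical preprocessing chain: upscale, denoise, then binarize."""
    gray = image.convert("L")
    # Upscaling small print often helps Tesseract; LANCZOS keeps edges sharp
    gray = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    # A 3x3 median filter removes isolated specks before thresholding
    gray = gray.filter(ImageFilter.MedianFilter(3))
    return keep_dark_pixels(gray, threshold)
```

You would pass the result to `pytesseract.image_to_string()` in place of the sharpened image. On a passport page the light background pattern may fall above the cutoff and drop out, which sharpening alone does not do. If one fixed cutoff fails on unevenly lit scans, try lowering or raising `threshold` for that image.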
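Confusions such as 5/S can also be reduced after recognition, provided you know what a field should contain. The sketch below assumes you have already split the OCR output into fields, such as the surname and the date of birth. The mapping tables and the helpers `fix_alpha_field` / `fix_numeric_field` are hypothetical and cover only a few common look-alike characters.

```python
# Hypothetical tables of common OCR look-alikes; extend them as needed.
DIGIT_TO_LETTER = str.maketrans({"0": "O", "1": "I", "2": "Z", "5": "S", "6": "G", "8": "B"})
LETTER_TO_DIGIT = str.maketrans({"O": "0", "I": "1", "Z": "2", "S": "5", "G": "6", "B": "8"})


def fix_alpha_field(text):
    """For a field known to contain only letters (e.g. a surname),
    replace look-alike digits with the corresponding letters."""
    return text.translate(DIGIT_TO_LETTER)


def fix_numeric_field(text):
    """For a field known to contain only digits (e.g. a date of birth),
    replace look-alike letters with the corresponding digits."""
    return text.translate(LETTER_TO_DIGIT)


print(fix_alpha_field("K0WAL5KA"))   # KOWALSKA
print(fix_numeric_field("85O1I2"))   # 850112
```

Apply these corrections one field at a time. Running `fix_alpha_field` over a whole page would also corrupt genuine digits, such as the document number.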
