python - Improving image preprocessing for tesseract (video game screenshots)

Question

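I am trying to read price text from a video game, and I am having trouble preprocessing the images.

The rest of my code is "done": after extracting the text, I format it and write it to a CSV for later use.

This is what I have come up with so far for the images below. I would welcome suggestions for other thresholding or preprocessing techniques that would make the OCR more accurate.

Original screenshot

After gamma: denoised on the left, binary threshold on the right

Detected text

As you can see, it is very close but not perfect. I would like it to be more accurate, since I will eventually be processing a lot of frames.

Here is my current code: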

import cv2
import pytesseract
import pandas as pd
import numpy as np

# Tells pytesseract where the tesseract environment is installed on local computer
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

img = cv2.imread("./image_frames/frame0.png")

# gamma to darken text to be same opacity?
def adjust_gamma(crop_img, gamma=1.0):
    # build a lookup table mapping the pixel values [0, 255] to
    # their adjusted gamma values
    invGamma = 1.0 / gamma
    table = np.array([((i / 255.0) ** invGamma) * 255
        for i in np.arange(0, 256)]).astype("uint8")
    # apply gamma correction using the lookup table
    return cv2.LUT(crop_img, table)

adjusted = adjust_gamma(img, gamma=0.15)

# grayscale the image
gray = cv2.cvtColor(adjusted, cv2.COLOR_BGR2GRAY)
# denoising image
dst = cv2.fastNlMeansDenoising(gray, None, 10, 10, 10)


# inverted binary threshold (note: applied to gray, so the denoised dst is only displayed)
thresh = cv2.threshold(gray, 35, 255, cv2.THRESH_BINARY_INV)[1]


# OCR configurations (3 is default)
config = "--psm 3"

# Show the intermediate images (each window needs its own name)
cv2.imshow("gray", gray)
cv2.imshow("denoised", dst)
cv2.imshow("thresh", thresh)
cv2.waitKey(0)

# Reads text from the image and prints to console
text = pytesseract.image_to_string(thresh, config=config)
# remove double lines
text = text.replace('\n\n','\n')
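# remove the form feed character that Tesseract appends to its output
# (the literal character was lost from this post; '\x0c' is assumed)
text = text.replace('\x0c', '')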
print(text)

Any help is appreciated, as I am very new to this!

score 2 · Accepted Answer

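Step #1: Upscale the image

Step #2: Apply adaptive-threshold

Step #3: Set the page-segmentation-mode (psm) to 6 (assume a single uniform block of text).

1. Upscale the image:

The original image is very small, so enlarging it makes the text easier to see.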

img = cv2.imread("udQw1.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)

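2. Apply adaptive-threshold

Usually a plain threshold is applied, but on your image a plain threshold had no effect on the result.
Different images may need different values for C and the block size.
For example, for the first image: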

gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 15, 22)

Result:
For example, for the second image:

gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 51, 4)

Result:
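
To make the roles of the block size and C more concrete, here is a minimal pure-Python sketch of what a mean adaptive threshold with THRESH_BINARY_INV computes: each pixel is compared with the mean of its block-sized neighbourhood minus C. The function name adaptive_mean_threshold_inv and the toy image are illustrative assumptions. Border handling (clamping to the edge) and rounding are simplified, so the output is not guaranteed to match cv2.adaptiveThreshold bit for bit.

```python
def adaptive_mean_threshold_inv(gray, block_size, c, max_value=255):
    """Illustrative mean adaptive threshold (inverted binary).

    gray is a list of rows of 0..255 intensities. A pixel becomes max_value
    when it is at or below (local mean - c), otherwise 0.
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    height, width = len(gray), len(gray[0])
    radius = block_size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for dy in range(-radius, radius + 1):
                # Clamp coordinates so border pixels reuse the nearest edge value.
                yy = min(max(y + dy, 0), height - 1)
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), width - 1)
                    total += gray[yy][xx]
            local_mean = total / (block_size * block_size)
            if gray[y][x] <= local_mean - c:
                out[y][x] = max_value
    return out


# Toy image (hypothetical): light background with one dark pixel in the centre.
toy = [[200] * 5 for _ in range(5)]
toy[2][2] = 50
for row in adaptive_mean_threshold_inv(toy, block_size=3, c=10):
    print(row)
```

In this formulation a larger C lowers the cut-off, so fewer pixels end up white, while a larger block size averages over a wider area around each pixel.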

3. Set psm to 6, which treats the image as a single uniform block of text.

txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)

Result for the first image:

Dragon Claymore
1,388,888,888 mesos.
Maple Pyrope Spear
288,888,888 mesos.
Element Pierce
488,888,888 mesos.
Purple Adventurer Cape
97,777,777 mesos.

Result for the second image:

Ring of Alchemist
749,999,995 mesos.
Dragon Slash Claw
499,999,995 mesos.
"Stormcaster Gloves
149,999,995 mesos.
Elemental Wand 6
749,999,995 mesos.

Big Money Chalr

1 tor 249,999,985 mesos.|
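
Because the recognized text alternates between an item name and a price line containing "mesos", it can be turned into rows for a CSV with a small parser. The following is a sketch under that assumption. parse_listings, listings_to_csv, and the regular expression are hypothetical helpers, not part of pytesseract, and the sketch uses the standard csv module rather than pandas. Misread characters inside names (such as "Chalr") are passed through unchanged.

```python
import csv
import io
import re

# A price is digits grouped by commas, followed by the word "mesos".
PRICE_RE = re.compile(r"(\d{1,3}(?:,\d{3})*)\s*mesos")


def parse_listings(text):
    """Pair each price line with the most recent non-price line before it."""
    listings = []
    pending_name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = PRICE_RE.search(line)
        if match:
            price = int(match.group(1).replace(",", ""))
            listings.append((pending_name, price))
            pending_name = None
        else:
            # Drop stray quote and pipe characters at the ends of a name.
            pending_name = line.strip(" \"'|")
    return listings


def listings_to_csv(listings):
    """Return the listings as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["item", "price_mesos"])
    writer.writerows(listings)
    return buffer.getvalue()


sample = "Dragon Claymore\n1,388,888,888 mesos.\nMaple Pyrope Spear\n288,888,888 mesos.\n"
print(listings_to_csv(parse_listings(sample)))
```

Searching for the price anywhere in the line, instead of requiring the whole line to match, lets a noisy line such as "1 tor 249,999,985 mesos.|" still yield its price.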

Code for the first image:

import pytesseract
import cv2

img = cv2.imread("udQw1.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 15, 22)
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)

Code for the second image:

import pytesseract
import cv2

img = cv2.imread("7Y2yx.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 51, 4)
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)
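
The two scripts differ only in the file name and in the block size and C passed to cv2.adaptiveThreshold. When many frames are involved, one possible way to choose those two values is to try a few combinations and keep the one whose OCR output contains the most cleanly formatted price lines. This is only a heuristic sketch, not something the answer above prescribes: pick_threshold_params, count_clean_price_lines, and the run_ocr callback are hypothetical names, and the scoring rule assumes prices always look like "1,388,888,888 mesos.".

```python
import itertools
import re

# A whole line that is exactly a comma-grouped number followed by " mesos."
PRICE_LINE = re.compile(r"^\d{1,3}(?:,\d{3})* mesos\.$")


def count_clean_price_lines(text):
    """Count lines that look like a well-formed price."""
    return sum(1 for line in text.splitlines() if PRICE_LINE.match(line.strip()))


def pick_threshold_params(run_ocr, block_sizes, c_values):
    """Try every (block_size, c) pair and return (score, block_size, c).

    run_ocr is a caller-supplied function taking (block_size, c) and
    returning the recognized text. The first best-scoring pair wins ties.
    """
    best = None
    for block_size, c in itertools.product(block_sizes, c_values):
        score = count_clean_price_lines(run_ocr(block_size, c))
        if best is None or score > best[0]:
            best = (score, block_size, c)
    return best


# Assumed wiring with the code above (requires cv2, pytesseract, and gry):
#
# def run_ocr(block_size, c):
#     thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
#                                 cv2.THRESH_BINARY_INV, block_size, c)
#     return pytesseract.image_to_string(thr, config="--psm 6")
#
# print(pick_threshold_params(run_ocr, [15, 31, 51], [4, 10, 22]))
```

The score is only a proxy for accuracy, since a line can be well formed and still contain a misread digit, so spot-checking a few frames by eye remains worthwhile.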
