image-processing - Image processing for OCR using leptonica (inverted text)

Question

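I am trying to use leptonica to process the following image so that tesseract can extract the text.

Original image: [original image]

Running Tesseract on the original image produces the following result: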

i s l
D2J1FiiE-l191x1iitmwii9 uhiaiislz-2 Q ~37
Bottom linez
With a little time!
you can learn social media technology
using free online resources-
And if you donity
youlll be at a significant disadvantage
to
other HOn-pFOiiTS-

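Not great, especially the top part with the background. So, using leptonica, I applied a background-removal algorithm (blur, difference, threshold, invert) to get the following image: [processed image]

But tesseract does not handle that well either: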

@@r-mair lkrm@W lh@w ilr@ mJs@ iklh@ ii@c2lhm1@ll
mm Mime
VWU1 a Mitt-Jle time-
@1m ll@@Wn Om @@@lh1
using free onhne resources-
Andifyoudoni
9110 ate a $0 D
to other non-profrts
I

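The main problem seems to be that all of the text is now outlined rather than solid. How can I adjust my algorithm, or what can I add to it, to make the text solid?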
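
To make the symptom concrete, here is a minimal sketch of the four steps (blur, difference, threshold, invert) applied to a single row of pixels. It is plain Python, not leptonica code, and several things are assumptions made for illustration: the box blur, the signed difference (pixel minus local mean, which suits light text on a dark background), the threshold of 60, and the function names `box_blur_row` and `remove_background_row`. Under those assumptions, a blur window narrower than the stroke leaves the interior of the stroke equal to its local mean, so only the edges survive; this may be one reason the processed text comes out hollow.

```python
def box_blur_row(row, radius):
    """Mean filter over a 1-D row of gray levels, clamping at the borders."""
    n = len(row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = row[lo:hi]
        out.append(sum(window) / len(window))
    return out


def remove_background_row(row, radius, thresh):
    """Blur, difference, threshold, invert. Result: 0 = ink, 255 = paper."""
    blurred = box_blur_row(row, radius)
    # Signed difference: positive where a pixel is brighter than its surroundings
    # (assumption: light text on a dark background).
    diff = [p - b for p, b in zip(row, blurred)]
    mask = [255 if d > thresh else 0 for d in diff]
    # Invert so that detected pixels become black on a white page.
    return [255 - m for m in mask]


# A light stroke 9 pixels wide (indices 8..16) on a dark background.
row = [20] * 8 + [230] * 9 + [20] * 8

# Blur window (5 px) narrower than the stroke: only the two edge pixels are kept.
narrow = remove_background_row(row, radius=2, thresh=60)

# Blur window (25 px) wider than the stroke: the whole stroke is kept.
wide = remove_background_row(row, radius=12, thresh=60)

print([i for i, v in enumerate(narrow) if v == 0])  # [8, 16]
print([i for i, v in enumerate(wide) if v == 0])    # [8, 9, ..., 16]
```

In this toy case, widening the blur so that it exceeds the stroke width is enough to get solid text. Whether the same holds for the real image depends on its stroke widths and on how the actual blur and difference were computed.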

score 11 · Accepted Answer

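It seems that this paper proposes a binarization method that addresses your problem:

T. Kasar, J. Kumar, and A. G. Ramakrishnan. Font and Background Color Independent Text Binarization. (2007)

[Figure: performance of the Kasar et al. method]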
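
As a rough illustration of what "font and background color independent" binarization can mean, here is a small sketch in plain Python. It is not the algorithm from the paper, only a simplified stand-in: it assumes a single text box has already been located and that the positions of some character edge pixels are known (for example from an edge detector). The function name `binarize_box`, its inputs, and the midpoint threshold are all hypothetical choices. The idea shown is to estimate a foreground level and a background level for the box, compare them to decide the polarity, and always output black text on white.

```python
from statistics import mean, median


def binarize_box(gray, edge_pixels):
    """Binarize one text box so that text is always 0 (black) on 255 (white).

    gray        -- 2-D list of gray levels for the box (hypothetical input)
    edge_pixels -- list of (row, col) positions on character edges (assumed given)
    """
    h, w = len(gray), len(gray[0])
    # Foreground estimate: mean gray level at the edge pixels.
    fg = mean(gray[r][c] for r, c in edge_pixels)
    # Background estimate: median of the four corner pixels of the box.
    bg = median([gray[0][0], gray[0][w - 1], gray[h - 1][0], gray[h - 1][w - 1]])
    # Midpoint threshold (an illustrative choice, not taken from the paper).
    t = (fg + bg) / 2
    text_is_dark = fg < bg
    out = []
    for row in gray:
        if text_is_dark:
            out.append([0 if p < t else 255 for p in row])
        else:
            out.append([0 if p > t else 255 for p in row])
    return out


# The same tiny "glyph" in both polarities.
light_on_dark = [
    [20, 20, 20, 20, 20],
    [20, 230, 230, 230, 20],
    [20, 20, 20, 20, 20],
]
dark_on_light = [[255 - p for p in row] for row in light_on_dark]
edges = [(1, 1), (1, 3)]

a = binarize_box(light_on_dark, edges)
b = binarize_box(dark_on_light, edges)
print(a == b)  # True: both polarities give the same black-on-white result
```

Because the polarity is decided per box, inverted text such as the banner at the top of the image and normal text further down would both come out as solid dark glyphs, which is the form tesseract generally handles best.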
