8

I was hoping someone could tell me why my Tesseract setup has trouble recognizing some images of digits, and whether there is something I can do about it. Everything works in my tests, and since I only need digits, I thought I could manage with the English language data, until I had to start handling seven-segment displays as well.

I am having a lot of trouble with the attached images, so I'd like to know whether I should start working on my own recognition algorithms, or whether training Tesseract on my own datasets would make it work. Does anyone know where Tesseract's limitations lie?

Things tried: I set the page segmentation mode (psm) to one_line, one_word, and one_char (chopping up the picture for the last one). With one_line and one_word there was no significant change. With one_char recognition was a bit better, but sometimes, because of the large spacing, it attached an extra digit, which ruined the result: the attached image zero.jpg came out as 04. I have also tried doing the binarization myself, which gave poorer recognition and was very resource-consuming. I have tried inverting the pictures; this makes no difference at all for Tesseract.

I have attached the pictures that I need processed, among others.

Explanation of the images:

decodethisimage_seven is an image that Tesseract has no trouble recognizing, though it was made in Word for the convenience of building an app around a working image.

decodethisimage_eight is a real-life image matching image_seven, but Tesseract cannot recognize it.

decodethisimage_four2 is another image I'd like it to recognize. Yes, I know it can't be skewed, and I did deskew (straighten) it when testing.

4

3 Answers

2

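Tesseract will not do the segmentation for you. It thresholds the image before running the actual recognition algorithm. After thresholding, some edges and artifacts may remain in the image.

Try manually converting your image to black and white and see what Tesseract returns as output.

Try thresholding your image (automatically) and see what Tesseract returns as output. The thresholded result may be so poor that it causes Tesseract to give the wrong output.

Your fourth image probably fails because of thresholding (you have 3 colors: a black background, a gray background, and white letters), and the threshold may fall between the black background and the gray background.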
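To illustrate the three-color problem, here is a minimal sketch of Otsu's method, one common automatic thresholding technique. It is not confirmed to be what Tesseract does internally; the pixel counts and gray levels below are made up for the demonstration.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (m0 - m1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Hypothetical image: mostly black, a gray panel, and a few white digit pixels.
pixels = [0] * 600 + [128] * 300 + [255] * 100
t = otsu_threshold(pixels)

# Binarize: 1 = above threshold, 0 = at or below it.
gray_bit = 1 if 128 > t else 0
white_bit = 1 if 255 > t else 0
```

With these made-up counts the threshold lands below the gray level, so the gray panel and the white digits end up in the same class and the digits disappear into their background. Thresholding only the cropped gray region would avoid this.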

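In general, Tesseract wants clean black-and-white images. You may need to preprocess your images to get better results.

For your first image (the one that gives "04"), try looking at the box output (each character plus the coordinates of the box containing the recognized character). The "0" may be a small artifact, such as a 4 x 4 block of pixels.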
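As a rough sketch of how the box output could be used, the snippet below drops recognized characters whose boxes are too small to be real digits. It assumes the box text uses the box-file layout `symbol left bottom right top page`; the helper names, the size limits, and the sample coordinates are all hypothetical.

```python
def parse_box_line(line):
    """Split one box line into (symbol, left, bottom, right, top)."""
    parts = line.split()
    symbol = parts[0]
    left, bottom, right, top = (int(v) for v in parts[1:5])
    return symbol, left, bottom, right, top


def drop_small_boxes(box_text, min_width=8, min_height=8):
    """Return the recognized text without characters in tiny boxes."""
    kept = []
    for line in box_text.splitlines():
        if not line.strip():
            continue
        symbol, left, bottom, right, top = parse_box_line(line)
        if (right - left) >= min_width and (top - bottom) >= min_height:
            kept.append(symbol)
    return "".join(kept)


# Made-up box output: a 4 x 4 speck read as "0", then a real "4".
sample = "0 3 40 7 44 0\n4 60 10 110 90 0"
result = drop_small_boxes(sample)
```

Here the speck is filtered out and only the "4" is kept. Suitable size limits depend on the resolution of the real images.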

Answered 2012-05-14T14:56:09.130
2

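I know of a few options that may help you:

  1. Add extra space between the image border and the text. Tesseract does badly when the text sits at the edge of the image.
  2. Duplicate your image. For example, if you are running OCR on the word "foobar", clone the image and send "foobar foobar foobar foobar foobar" to Tesseract; the results will be better.
  3. Google for font training and image binarization for Tesseract.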
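The first two options can be sketched on a plain grayscale image stored as a list of rows. This is only an illustration: the function names are hypothetical, and a real app would do the same thing with its image library before handing the result to Tesseract.

```python
def pad_image(img, margin, background=255):
    """Surround the image with `margin` pixels of background on every side."""
    width = len(img[0])
    blank = [background] * (width + 2 * margin)
    padded = [list(blank) for _ in range(margin)]
    for row in img:
        padded.append([background] * margin + list(row) + [background] * margin)
    padded.extend(list(blank) for _ in range(margin))
    return padded


def tile_horizontally(img, copies, gap, background=255):
    """Repeat the image `copies` times side by side, separated by `gap` pixels."""
    tiled = []
    for row in img:
        new_row = []
        for i in range(copies):
            if i:
                new_row.extend([background] * gap)
            new_row.extend(row)
        tiled.append(new_row)
    return tiled


# Tiny made-up "glyph": 0 is ink, 255 is background.
digit = [[0, 255],
         [0, 0]]
padded = pad_image(digit, margin=2)
tiled = tile_horizontally(digit, copies=3, gap=1)
```

When the image is tiled, the OCR result contains the text several times, so the caller has to pick one copy (for example the most frequent one).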

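Keep in mind that the built-in cameras in mobile devices mostly produce low-quality images (blurred, noisy, skewed, and so on). OCR is itself a resource-consuming process, and if you add proper image preprocessing on top of it, low-end and mid-range mobile devices (presumably Android) may face unexpected performance degradation or even run out of resources. That is acceptable for a free or research project, but if you are planning a commercial app, consider using a better SDK.

For details, take a look at this question: OCR for android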

Answered 2012-04-17T09:36:22.433
1

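You can try javaocr (http://sourceforge.net/projects/javaocr/, and yes, I am the developer).

There is no official release yet, so you will have to look at the sources (the good news: there are working Android samples, including sampler, offline trainer, and recognizer applications).

If you have only one font, you can get very good results (I reached a recognition rate of 99.96 on digits in the same font).

PS: it is pure Java and uses invariant moments to perform the matching (so scaling and rotation are not a problem). It also has very efficient binarization.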
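To give an idea of what invariant moments are, here is a small sketch that computes the first two Hu moment invariants of a binary glyph. This is the generic textbook construction, not javaocr's actual code, which is not shown here; the glyph and function names are made up, and the demonstration only covers a 90-degree rotation.

```python
def hu_first_two(img):
    """Return the first two Hu invariants of a binary image (rows of 0/1)."""
    points = [(x, y) for y, row in enumerate(img)
              for x, v in enumerate(row) if v]
    m00 = len(points)
    cx = sum(x for x, _ in points) / m00
    cy = sum(y for _, y in points) / m00

    # Central moments of order 2 (translation invariant).
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)

    # Normalizing by m00 ** 2 makes order-2 moments scale invariant.
    n20, n02, n11 = mu20 / m00 ** 2, mu02 / m00 ** 2, mu11 / m00 ** 2

    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return h1, h2


def rotate90(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


# Made-up L-shaped glyph.
glyph = [[1, 0, 0],
         [1, 0, 0],
         [1, 1, 1]]
original = hu_first_two(glyph)
rotated = hu_first_two(rotate90(glyph))
```

The two tuples agree up to floating-point rounding, which is why a matcher based on such features can tolerate rotated input. On a pixel grid, invariance under scaling and arbitrary angles holds only approximately.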

See it in action:

https://play.google.com/store/apps/details?id=de.pribluda.android.ocrcall&feature=search_result#?t=W251bGwsMSwxLDEsImRlLnByaWJsdWRhLmFuZHJvaWQub2NyY2FsbCJd

Answered 2012-04-17T09:55:25.557