python - Why can't tesseract-ocr detect text inside a box?

Translated from: https://stackoverflow.com/questions/40084148 2016-10-17T10:35:20.917

Viewed 1217 times

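Consider this experiment:

I have two images: one contains free-standing text, and the other contains the same kind of text enclosed in a box (surrounded by a border).

If I run tesseract-ocr on both images, the free-text image outputs "text", while the boxed image outputs nothing (an empty string '').

Why is that?

As a workaround I can use some image processing to crop away the border, but I would like to know what causes the problem.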

[Images: free-text image, boxed image]

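So far I have cropped away the image's border using the logic below (the input should be the image already cropped to the contour of the outer border), and after that the text is detected. But I still don't understand why tesseract does not detect the boxed text. Feel free to try the attached images.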

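    # (x, y) is the top-left corner of the outer border's bounding box,
    # and (w, h) is its width and height.
    # Shrink the box by 2.5% on every side so that the new, smaller box
    # lies inside the original one and no longer contains the border.
    y = y + int(0.025*h)
    x = x + int(0.025*w)
    h = h - int(0.05*h)
    w = w - int(0.05*w)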
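
For anyone who wants to reproduce the workaround, here is a minimal, self-contained sketch of the same shrinking step wrapped in a function. The name `shrink_box` and the `margin` parameter are my own, hypothetical choices and are not part of the original snippet; the arithmetic is the same as above.

```python
def shrink_box(x, y, w, h, margin=0.025):
    """Return a box (x, y, w, h) shrunk by `margin` on every side.

    `shrink_box` and `margin` are illustrative names (assumptions),
    not part of the original snippet. With the default margin the
    arithmetic matches the snippet above: the corner moves inward by
    2.5% of the width/height, and the size drops by 5% in total.
    """
    new_x = x + int(margin * w)
    new_y = y + int(margin * h)
    new_w = w - int(2 * margin * w)
    new_h = h - int(2 * margin * h)
    return new_x, new_y, new_w, new_h


# Example: a 200x100 bounding box whose top-left corner is at (0, 0).
print(shrink_box(0, 0, 200, 100))
```

If the image is a NumPy array (as OpenCV returns it), the inner region can then be taken with plain slicing, `image[y:y+h, x:x+w]`, and that crop is what gets passed to tesseract.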
