
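The Tesseract FAQ says:

How do I get coordinates and confidences of each character?

There are two options. If you would rather not get into programming, you can use Tesseract's hocr output format (read the Tesseract manual page for details).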

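But when I create a sample hOCR output (an .html file), bounding boxes and confidences are only available at the word level.

Am I missing something here?

I have added a sample input and output for illustration (the input has been resized).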


Here is the input image:

[input image]


Here is Tesseract's hOCR output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract'/>
</head>
<body>
<div class='ocr_page' id='page_1' title='image "in2.tif"; bbox 0 0 1882 354'>
<div class='ocr_carea' id='block_1_1' title="bbox 78 59 457 100">
<p class='ocr_par'>
<span class='ocr_line' id='line_1_1' title="bbox 78 61 456 97"><span class='ocr_word' id='word_1_1' title="bbox 78 62 175 97"><span class='ocrx_word' id='xword_1_1' title="x_wconf -2">Dear</span></span> <span class='ocr_word' id='word_1_2' title="bbox 205 62 271 96"><span class='ocrx_word' id='xword_1_2' title="x_wconf -14">Mr:</span></span> <span class='ocr_word' id='word_1_3' title="bbox 303 61 456 97"><span class='ocrx_word' id='xword_1_3' title="x_wconf -2">Grover:</span></span></span>
</p>
</div>
<div class='ocr_carea' id='block_1_2' title="bbox 75 154 1842 317">
<p class='ocr_par'>
<span class='ocr_line' id='line_1_2' title="bbox 78 161 1787 210"><span class='ocr_word' id='word_1_4' title="bbox 78 161 111 196"><span class='ocrx_word' id='xword_1_4' title="x_wconf -2">If</span></span> <span class='ocr_word' id='word_1_5' title="bbox 137 161 270 205"><span class='ocrx_word' id='xword_1_5' title="x_wconf -2">you&#39;ve</span></span> <span class='ocr_word' id='word_1_6' title="bbox 298 162 393 197"><span class='ocrx_word' id='xword_1_6' title="x_wconf -1">been</span></span> <span class='ocr_word' id='word_1_7' title="bbox 422 161 571 206"><span class='ocrx_word' id='xword_1_7' title="x_wconf -3">looking</span></span> <span class='ocr_word' id='word_1_8' title="bbox 598 162 657 197"><span class='ocrx_word' id='xword_1_8' title="x_wconf -2">for</span></span> <span class='ocr_word' id='word_1_9' title="bbox 685 174 707 198"><span class='ocrx_word' id='xword_1_9' title="x_wconf -1">a</span></span> <span class='ocr_word' id='word_1_10' title="bbox 734 162 929 207"><span class='ocrx_word' id='xword_1_10' title="x_wconf -4">reporting</span></span> <span class='ocr_word' id='word_1_11' title="bbox 956 163 1031 198"><span class='ocrx_word' id='xword_1_11' title="x_wconf -1">tool</span></span> <span class='ocr_word' id='word_1_12' title="bbox 1059 162 1140 199"><span class='ocrx_word' id='xword_1_12' title="x_wconf -3">that</span></span> <span class='ocr_word' id='word_1_13' title="bbox 1168 164 1294 199"><span class='ocrx_word' id='xword_1_13' title="x_wconf -4">allows</span></span> <span class='ocr_word' id='word_1_14' title="bbox 1321 175 1428 200"><span class='ocrx_word' id='xword_1_14' title="x_wconf -1">users</span></span> <span class='ocr_word' id='word_1_15' title="bbox 1456 169 1494 200"><span class='ocrx_word' id='xword_1_15' title="x_wconf -3">to</span></span> <span class='ocr_word' id='word_1_16' title="bbox 1523 169 1649 200"><span class='ocrx_word' id='xword_1_16' title="x_wconf -2">create</span></span> <span class='ocr_word' id='word_1_17' 
title="bbox 1677 170 1787 210"><span class='ocrx_word' id='xword_1_17' title="x_wconf -3">great</span></span></span>
<span class='ocr_line' id='line_1_3' title="bbox 77 210 1841 260"><span class='ocr_word' id='word_1_18' title="bbox 77 210 226 256"><span class='ocrx_word' id='xword_1_18' title="x_wconf -3">looking</span></span> <span class='ocr_word' id='word_1_19' title="bbox 253 216 399 256"><span class='ocrx_word' id='xword_1_19' title="x_wconf -4">reports</span></span> <span class='ocr_word' id='word_1_20' title="bbox 427 211 581 256"><span class='ocrx_word' id='xword_1_20' title="x_wconf -3">quickly,</span></span> <span class='ocr_word' id='word_1_21' title="bbox 613 224 654 248"><span class='ocrx_word' id='xword_1_21' title="x_wconf -2">as</span></span> <span class='ocr_word' id='word_1_22' title="bbox 682 213 763 248"><span class='ocrx_word' id='xword_1_22' title="x_wconf -1">well</span></span> <span class='ocr_word' id='word_1_23' title="bbox 792 224 832 248"><span class='ocrx_word' id='xword_1_23' title="x_wconf -1">as</span></span> <span class='ocr_word' id='word_1_24' title="bbox 859 212 1056 258"><span class='ocrx_word' id='xword_1_24' title="x_wconf -4">providing</span></span> <span class='ocr_word' id='word_1_25' title="bbox 1083 212 1144 249"><span class='ocrx_word' id='xword_1_25' title="x_wconf -2">the</span></span> <span class='ocr_word' id='word_1_26' title="bbox 1173 214 1315 249"><span class='ocrx_word' id='xword_1_26' title="x_wconf -2">control</span></span> <span class='ocr_word' id='word_1_27' title="bbox 1344 215 1417 249"><span class='ocrx_word' id='xword_1_27' title="x_wconf -2">and</span></span> <span class='ocr_word' id='word_1_28' title="bbox 1445 214 1639 250"><span class='ocrx_word' id='xword_1_28' title="x_wconf -2">industrial</span></span> <span class='ocr_word' id='word_1_29' title="bbox 1667 215 1841 260"><span class='ocrx_word' id='xword_1_29' title="x_wconf -3">strength</span></span></span>
<span class='ocr_line' id='line_1_4' title="bbox 76 260 1370 306"><span class='ocr_word' id='word_1_30' title="bbox 76 261 243 296"><span class='ocrx_word' id='xword_1_30' title="x_wconf -2">features</span></span> <span class='ocr_word' id='word_1_31' title="bbox 272 260 353 297"><span class='ocrx_word' id='xword_1_31' title="x_wconf -2">that</span></span> <span class='ocr_word' id='word_1_32' title="bbox 381 273 427 297"><span class='ocrx_word' id='xword_1_32' title="x_wconf -1">an</span></span> <span class='ocr_word' id='word_1_33' title="bbox 458 261 499 297"><span class='ocrx_word' id='xword_1_33' title="x_wconf -2">IS</span></span> <span class='ocr_word' id='word_1_34' title="bbox 527 262 776 306"><span class='ocrx_word' id='xword_1_34' title="x_wconf -2">professional</span></span> <span class='ocr_word' id='word_1_35' title="bbox 804 263 1110 299"><span class='ocrx_word' id='xword_1_35' title="x_wconf -2">demands...look</span></span> <span class='ocr_word' id='word_1_36' title="bbox 1139 275 1184 299"><span class='ocrx_word' id='xword_1_36' title="x_wconf -1">no</span></span> <span class='ocr_word' id='word_1_37' title="bbox 1212 263 1370 299"><span class='ocrx_word' id='xword_1_37' title="x_wconf -3">further!</span></span></span>
</p>
</div>
</div>
</body>
</html>

2 Answers


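You've seen it right: it's not there.

So you can either modify the Tesseract source code to emit hOCR with the x_confs attribute you want, or use its ResultIterator API class to get confidence at the character (symbol) level. Make sure to call SetVariable("save_blob_choices", "T") after the Init method.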
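For the API route, here is a minimal sketch in Python rather than C++. It assumes the third-party tesserocr bindings, whose PyTessBaseAPI constructor runs Init for you, so the SetVariable call lands after Init as described above. The function name symbol_confidences is made up for illustration, and whether save_blob_choices is still needed depends on your Tesseract version.

```python
def symbol_confidences(image_path, lang="eng"):
    """Return (symbol, confidence, bounding_box) tuples for one image.

    Sketch only: requires the third-party `tesserocr` package and a
    working Tesseract install with data for `lang`.
    """
    # Imported lazily so this module loads even without tesserocr installed.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    results = []
    # The constructor initializes the engine (the equivalent of Init).
    with PyTessBaseAPI(lang=lang) as api:
        # Set after Init, as the answer above recommends.
        api.SetVariable("save_blob_choices", "T")
        api.SetImageFile(image_path)
        api.Recognize()

        level = RIL.SYMBOL  # iterate one character (symbol) at a time
        for r in iterate_level(api.GetIterator(), level):
            symbol = r.GetUTF8Text(level)
            if symbol:
                results.append(
                    (symbol, r.Confidence(level), r.BoundingBox(level))
                )
    return results
```

Called as symbol_confidences("in2.tif"), this should give one tuple per recognized character, with the box as (left, top, right, bottom) in image pixels.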

Answered 2013-04-09T05:03:08.080

This now appears to be available in Tesseract 4.x.

See my answer:

https://stackoverflow.com/a/57766860/1021819

Set hocr_char_boxes to 1 in a config file. Alternatively, on the command line, your updated command would be:

tesseract [image name] outputbase --oem 1 -l eng --psm 8 -c hocr_char_boxes=1 hocr

Note the hocr output option, and look for e.g. ..._wconf in that file.
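Once the file is written, the per-character entries can be pulled out with the Python standard library. The sketch below assumes that character boxes appear as spans with class ocrx_cinfo carrying x_bboxes and x_conf in the title attribute, which is what I understand Tesseract 4.x emits, so check your own output first. The SAMPLE string and its numbers are invented for illustration, not real Tesseract output.

```python
from html.parser import HTMLParser

# Hypothetical fragment in the shape assumed above; values are made up.
SAMPLE = (
    "<span class='ocrx_word' id='word_1_1' title='bbox 78 62 175 97; x_wconf 96'>"
    "<span class='ocrx_cinfo' title='x_bboxes 78 62 101 97; x_conf 99.1'>D</span>"
    "<span class='ocrx_cinfo' title='x_bboxes 104 70 125 97; x_conf 98.4'>e</span>"
    "</span>"
)


def parse_title(title):
    """Split an hOCR title like 'a 1 2; b 3' into {'a': ['1','2'], 'b': ['3']}."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


class CharBoxParser(HTMLParser):
    """Collect (text, bbox, confidence) for every ocrx_cinfo span."""

    def __init__(self):
        super().__init__()
        self.chars = []
        self._props = None  # title properties of the span being read
        self._text = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "ocrx_cinfo":
            self._props = parse_title(attrs.get("title") or "")
            self._text = ""

    def handle_data(self, data):
        if self._props is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "span" and self._props is not None:
            bbox = tuple(int(v) for v in self._props.get("x_bboxes", []))
            conf = self._props.get("x_conf")
            self.chars.append(
                (self._text, bbox, float(conf[0]) if conf else None)
            )
            self._props = None


def extract_chars(hocr_text):
    parser = CharBoxParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.chars


if __name__ == "__main__":
    for text, bbox, conf in extract_chars(SAMPLE):
        print(text, bbox, conf)
```

For a real run, read the generated .hocr file (named after your outputbase) and pass its contents to extract_chars instead of SAMPLE.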

Let me know if this works for you; otherwise I will delete the answer.

Source: https://github.com/tesseract-ocr/tesseract/issues/1465#issuecomment-513139976

Answered 2019-09-04T07:01:42.570