windows - mftraining gives warning: No protos/configs for F in CreateIntTemplates()

Question

Edit: mftraining gives the warning in the title for every character in the unicharset (not just F, but also a, b, c, d, etc.). How do I create these protos/configs?

I am following this tutorial.

Previous problem, now solved:
Error: Assert failed: in file ....\classify\trainingsampleset.cpp, line 622 / Segmentation fault
This is the whole command and its output:

C:\training>mftraining -F font_properties -U unicharset -O eng.unicharset eng.impact.box.tr
Warning: No shape table file present: shapetable
Reading eng.impact.box.tr ...
Font id = -1/0, class id = 1/103 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file ....\classify\trainingsampleset.cpp, line 622

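I have looked through everything I could find about this error, and I cannot figure out what the problem is or what would make it work.

I also tried the shapeclustering command, but it gave me the same error. Also, when I run these commands under Cygwin, I get a segmentation fault instead of the assertion failure.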

score 3 · Accepted Answer

I was having the same problem, and it was indeed a problem with font_properties. However, in my case, it was solved by making sure that the font in font_properties matched exactly the font name in the .tr file. In my case, that was [fontname].exp0.

score 2 · Accepted Answer

I found two possible causes for this problem.

Possible cause 1: incorrect font_properties

The font_properties file should contain what is described here:

https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.00%E2%80%933.02#font_properties-new-in-301

The file encoding should meet the requirements described here:

https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.00%E2%80%933.02#requirements-for-text-input-files

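This is the most common answer found on the internet.

(Also make sure that you specify the font, not the language, in font_properties.)

Possible cause 2: wrong training file name

However, trying to fix font_properties did not work for me, and I found another cause that produced the same error in my case.

The .tr file must be named in the following format: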

<language>.<fontname>.exp<num>.tr

and not:

<language>.<fontname>.exp<num>.box.tr

(as seen in some tutorials)

So in my case, this did not work:

tesseract eng.unknown.exp1.png eng.unknown.exp1.box nobatch box.train
unicharset_extractor eng.unknown.exp1.box
mftraining -F font_properties -U unicharset -O eng.unicharset eng.unknown.exp1.box.tr

while this small change did work:

tesseract eng.unknown.exp1.png eng.unknown.exp1 nobatch box.train
unicharset_extractor eng.unknown.exp1.box
mftraining -F font_properties -U unicharset -O eng.unicharset eng.unknown.exp1.tr
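The naming rule above can be checked before running mftraining. The following is only a minimal sketch: `check_tr_name` is a hypothetical helper, not part of Tesseract, and it assumes that neither the language code nor the font name contains a dot.

```python
import re

# Pattern for <language>.<fontname>.exp<num>.tr
# Assumption: the language and font name parts contain no dots.
TR_NAME = re.compile(r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.tr$")


def check_tr_name(filename):
    """Return (lang, font, num) if the name fits the pattern, else None."""
    match = TR_NAME.match(filename)
    if match is None:
        return None
    return match.group("lang"), match.group("font"), int(match.group("num"))


if __name__ == "__main__":
    # The second name has the extra ".box" part that this answer warns about.
    for name in ("eng.unknown.exp1.tr", "eng.unknown.exp1.box.tr"):
        print(name, "->", check_tr_name(name))
```

A name that comes back as None here is worth renaming (or regenerating with the output base shown in the working command) before looking for other causes.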

score 2 · Accepted Answer

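I had the same problem as you. It was caused by an incorrectly formatted font_properties file.

Each line of the font_properties file has the following format: fontname italic bold fixed serif fraktur

Only the bare font name is needed here. When I changed the line from lang.fontname.exp0 0 0 0 0 0 to fontname 0 0 0 0 0, my problem was solved.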
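The line format described above can be checked mechanically. This is a rough sketch, not Tesseract code: `parse_font_properties` is a hypothetical helper, and it assumes the five properties are written as 0 or 1 and that font names contain no spaces.

```python
def parse_font_properties(text):
    """Parse font_properties text into {fontname: (italic, bold, fixed, serif, fraktur)}.

    Raises ValueError for a line that is not a name followed by five 0/1 flags.
    """
    fonts = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        flags = fields[1:]
        if len(fields) != 6 or any(flag not in ("0", "1") for flag in flags):
            raise ValueError(
                f"line {lineno}: expected "
                f"'fontname italic bold fixed serif fraktur', got {line!r}"
            )
        fonts[fields[0]] = tuple(int(flag) for flag in flags)
    return fonts


if __name__ == "__main__":
    print(parse_font_properties("impact 0 1 0 0 0\n"))
```

Passing the contents of the file through this function catches lines with a missing flag or a font name split by a space. It does not tell you whether the name itself is the one mftraining expects.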

score 1 · Accepted Answer

I had the same problem, and changing

 fontname 0 0 0 0 0

to

 fontname.exp0 0 0 0 0 0

so that it matches the font name in the .tr file fixed it.
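Since the fix depends on which font name the .tr file actually carries, it can help to read that name rather than guess it. The sketch below assumes that the first whitespace-separated token of the first non-blank line of a .tr file is the font name; that is an assumption about the file layout, so confirm it by opening the file in a text editor. All function names are hypothetical.

```python
def tr_font_name(path):
    """Return the first token of the first non-blank line of a .tr file, or None."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip():
                return line.split()[0]
    return None


def font_properties_names(path):
    """Return the set of font names (first field of each non-blank line)."""
    with open(path, encoding="utf-8") as handle:
        return {line.split()[0] for line in handle if line.strip()}


def font_is_declared(tr_path, props_path):
    """True if the font name found in the .tr file appears in font_properties."""
    name = tr_font_name(tr_path)
    return name is not None and name in font_properties_names(props_path)
```

For example, `font_is_declared("eng.unknown.exp1.tr", "font_properties")` returning False would suggest editing font_properties to use whatever `tr_font_name` reports.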

score 1 · Accepted Answer

You missed the shapeclustering step, which is new in Tesseract 3.02 training.

Answered 2012-12-25T05:49:03.010

score 0 · Accepted Answer

I had the same problem, and changing font_properties as follows fixed it:

From: batangche 1 0 0 0 0

To: batangche.exp0 1 0 0 0 0

score 0 · Accepted Answer

In my case, the font name in the font_properties file was uppercase, while the font name in the .tr file was lowercase. Changing them to the same case solved the problem.
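A case-only difference like this is easy to overlook by eye. The following minimal sketch tells an exact match apart from a match that differs only in letter case; `diagnose_font_name` is a hypothetical helper, and the caller is assumed to supply the font name from the .tr file and the names from font_properties.

```python
def diagnose_font_name(tr_font, declared_fonts):
    """Classify how the .tr font name relates to the font_properties names."""
    if tr_font in declared_fonts:
        return "match"
    # Map lowercased names back to their original spelling.
    lowered = {name.lower(): name for name in declared_fonts}
    if tr_font.lower() in lowered:
        return f"case mismatch with {lowered[tr_font.lower()]!r}"
    return "missing"


if __name__ == "__main__":
    print(diagnose_font_name("impact", ["Impact"]))
    print(diagnose_font_name("impact", ["impact"]))
    print(diagnose_font_name("impact", ["arial"]))
```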
