
I am running Tesseract on Windows 11 from the command prompt.

The text file is my training data: the words that I want to turn into images. The output is the input to the next step in the Tesseract process for training my font. I am passing --find_fonts, but I only have one font in the folder.

text2image --text="C:\PythonProjects\DiabloTesseractTrainFont\text.txt" --outputbase="C:\PythonProjects\DiabloTesseractTrainFont\Output\Dia.font.exp0" --fontconfig_tmpdir="C:\PythonProjects\DiabloTesseractTrainFont" --find_fonts --fonts_dir="C:\PythonProjects\DiabloTesseractTrainFont\Diablo Fonts"

The result:

Total chars = 223645
Font Exocet Light failed with 223518 hits = 99.94%

I am not sure why it fails. I have built something similar to this before. I also tried a font file that I know has worked, and it does exactly the same thing.

Any help would be appreciated.


1 Answer


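I figured it out. Some characters in the text file had been changed when I read them into Python. I believe they were once bullet points, but I read the file in Python with ASCII encoding and errors ignored. I thought these characters would be dropped. I was wrong. The bullets were replaced with text that showed up as PAD. I found it in Notepad++, highlighted one of them, and replaced them all with a space. Note that when I did the replace, the Find field in Notepad++ looked empty, but it still replaced all of them. Now it compiles just fine. I was stuck for hours, so I hope this helps someone.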
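The same cleanup can be scripted instead of done by hand in Notepad++. The following is only a minimal sketch, not the method used above: it assumes the training text is UTF-8 and that the PAD marker is an invisible control character (Notepad++ appears to label U+0080 that way, which would also explain the empty-looking Find field). The function names clean_training_text, find_suspect_chars, and clean_file are hypothetical helpers made up for this example. For reference, the numbers in the question leave 223645 - 223518 = 127 characters that were not counted as hits, which is consistent with a small number of stray characters like these.

```python
import unicodedata
from collections import Counter

# Whitespace control characters that legitimately belong in a text file.
KEEP = {"\n", "\r", "\t"}


def is_suspect(ch: str) -> bool:
    """True for control (Cc) or format (Cf) characters other than normal whitespace."""
    return ch not in KEEP and unicodedata.category(ch) in ("Cc", "Cf")


def find_suspect_chars(text: str) -> dict:
    """Count each suspect character so it can be inspected before removal."""
    return dict(Counter(ch for ch in text if is_suspect(ch)))


def clean_training_text(text: str) -> str:
    """Replace every suspect character with a space, as was done manually in the answer."""
    return "".join(" " if is_suspect(ch) else ch for ch in text)


def clean_file(src_path: str, dst_path: str) -> dict:
    """Read src_path (assumed UTF-8), write a cleaned copy, and return what was replaced."""
    with open(src_path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    found = find_suspect_chars(text)
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(clean_training_text(text))
    return found
```

To use it, call clean_file with the path to text.txt and a new output path, then check the returned dictionary to see which characters were replaced before passing the cleaned file to text2image. Writing to a separate file keeps the original training text untouched in case the replacement removes something that was wanted.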

Answered 2022-02-12T23:07:24.477