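I am trying to generate the scorer for the DeepSpeech-Polyglot project. I have followed every step of the documentation, but when I run the command below, it fails: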
python3 /DeepSpeech/data/lm/generate_lm.py --input_txt /DeepSpeech/data_prepared/texts/${LANGUAGE}/clean_vocab.txt --output_dir /DeepSpeech/data_prepared/texts/${LANGUAGE}/ --top_k 500000 --kenlm_bins /DeepSpeech/native_client/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie --discount_fallback
Saving top 500000 words ...
Calculating word statistics ...
Your text file has 202185630 words in total
It has 2106729 unique words
Your top-500000 words are 98.7433 percent of all words
Your most common word "die" occurred 7853080 times
The least common word in your top-k is "adamantium" with 5 times
The first word with 6 occurrences is "begibst" at place 448270
Creating ARPA file ...
=== 1/5 Counting and sorting n-grams ===
Reading /DeepSpeech/data_prepared/texts/de/lower.txt.gz
Traceback (most recent call last):
  File "/DeepSpeech/data/lm/generate_lm.py", line 210, in <module>
  File "/DeepSpeech/data/lm/generate_lm.py", line 201, in main
    build_lm(args, data_lower, vocab_str)
  File "/DeepSpeech/data/lm/generate_lm.py", line 97, in build_lm
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/DeepSpeech/native_client/kenlm/build/bin/lmplz', '--order', '5', '--temp_prefix', '/DeepSpeech/data_prepared/texts/de/', '--memory', '85%', '--text', '/DeepSpeech/data_prepared/texts/de/lower.txt.gz', '--arpa', '/DeepSpeech/data_prepared/texts/de/lm.arpa', '--prune', '0', '0', '1', '--discount_fallback']' died with <Signals.SIGSEGV: 11>.
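As far as I understand the last line of the traceback, `subprocess.check_call` raises `CalledProcessError` with a negative return code when the child process is killed by a signal, so the segfault seems to happen inside `lmplz` itself rather than in `generate_lm.py`. Below is a minimal sketch of how that return code can be decoded. The helper names `describe_returncode` and `run_and_report` are hypothetical and are not part of `generate_lm.py`.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a readable description.

    On POSIX, a negative return code -N means the child was
    terminated by signal N (for example -11 for SIGSEGV).
    """
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            # Signal number not known on this platform.
            return f"killed by unknown signal {-returncode}"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run cmd (a list of arguments) and describe how it ended."""
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as err:
        return describe_returncode(err.returncode)
    return describe_returncode(0)


if __name__ == "__main__":
    # Stand-in command; the real lmplz argument list from the
    # traceback could be passed here instead.
    print(run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

Passing the `lmplz` argument list from the error message to `run_and_report` would run that step on its own, without the rest of the script.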
I am using this documentation: https://gitlab.com/Jaco-Assistant/deepspeech-polyglot