我已在我的WSL上成功设置并运行 Kaldi Aspire 配方。现在我正在研究一个 POC,我想通过制作一个新的语料库、字典、语言模型并将其与原始 HCLG.fst 合并来扩展 ASPIRE 配方。我关注了这篇博文。我已经能够成功创建新字典、语言模型并合并输入文件。但是,当我尝试使用新的词典和语法重新编译 HCLG.fst 时出现以下错误。
Checking update-model/local/dict/silence_phones.txt ...
--> reading update-model/local/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/local/dict/silence_phones.txt is OK
Checking update-model/local/dict/optional_silence.txt ...
--> reading update-model/local/dict/optional_silence.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/local/dict/optional_silence.txt is OK
Checking update-model/local/dict/nonsilence_phones.txt ...
--> reading update-model/local/dict/nonsilence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/local/dict/nonsilence_phones.txt is OK
Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.
Checking update-model/local/dict/lexicon.txt
--> reading update-model/local/dict/lexicon.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/local/dict/lexicon.txt is OK
Checking update-model/local/dict/lexiconp.txt
--> reading update-model/local/dict/lexiconp.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/local/dict/lexiconp.txt is OK
Checking lexicon pair update-model/local/dict/lexicon.txt and update-model/local/dict/lexiconp.txt
--> lexicon pair update-model/local/dict/lexicon.txt and update-model/local/dict/lexiconp.txt match
Checking update-model/local/dict/extra_questions.txt ...
--> reading update-model/local/dict/extra_questions.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/local/dict/extra_questions.txt is OK
--> SUCCESS [validating dictionary directory update-model/local/dict]
fstaddselfloops update-model/dict/phones/wdisambig_phones.int update-
model/dict/phones/wdisambig_words.int
prepare_lang.sh: validating output directory
utils/validate_lang.pl update-model/dict
Checking existence of separator file
separator file update-model/dict/subword_separator.txt is empty or does not exist, deal in word case.
Checking update-model/dict/phones.txt ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/dict/phones.txt is OK
Checking words.txt: #0 ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/dict/words.txt is OK
Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK
Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> found no unexplainable phones in phones.txt
Checking update-model/dict/phones/context_indep.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 20 entry/entries in update-model/dict/phones/context_indep.txt
--> update-model/dict/phones/context_indep.int corresponds to update-model/dict/phones/context_indep.txt
--> update-model/dict/phones/context_indep.csl corresponds to update-model/dict/phones/context_indep.txt
--> update-model/dict/phones/context_indep.{txt, int, csl} are OK
Checking update-model/dict/phones/nonsilence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 156 entry/entries in update-model/dict/phones/nonsilence.txt
--> update-model/dict/phones/nonsilence.int corresponds to update-model/dict/phones/nonsilence.txt
--> update-model/dict/phones/nonsilence.csl corresponds to update-model/dict/phones/nonsilence.txt
--> update-model/dict/phones/nonsilence.{txt, int, csl} are OK
Checking update-model/dict/phones/silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 20 entry/entries in update-model/dict/phones/silence.txt
--> update-model/dict/phones/silence.int corresponds to update-model/dict/phones/silence.txt
--> update-model/dict/phones/silence.csl corresponds to update-model/dict/phones/silence.txt
--> update-model/dict/phones/silence.{txt, int, csl} are OK
Checking update-model/dict/phones/optional_silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in update-model/dict/phones/optional_silence.txt
--> update-model/dict/phones/optional_silence.int corresponds to update-model/dict/phones/optional_silence.txt
--> update-model/dict/phones/optional_silence.csl corresponds to update-model/dict/phones/optional_silence.txt
--> update-model/dict/phones/optional_silence.{txt, int, csl} are OK
Checking update-model/dict/phones/disambig.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 10 entry/entries in update-model/dict/phones/disambig.txt
--> update-model/dict/phones/disambig.int corresponds to update-model/dict/phones/disambig.txt
--> update-model/dict/phones/disambig.csl corresponds to update-model/dict/phones/disambig.txt
--> update-model/dict/phones/disambig.{txt, int, csl} are OK
Checking update-model/dict/phones/roots.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 43 entry/entries in update-model/dict/phones/roots.txt
--> update-model/dict/phones/roots.int corresponds to update-model/dict/phones/roots.txt
--> update-model/dict/phones/roots.{txt, int} are OK
Checking update-model/dict/phones/sets.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 43 entry/entries in update-model/dict/phones/sets.txt
--> update-model/dict/phones/sets.int corresponds to update-model/dict/phones/sets.txt
--> update-model/dict/phones/sets.{txt, int} are OK
Checking update-model/dict/phones/extra_questions.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 10 entry/entries in update-model/dict/phones/extra_questions.txt
--> update-model/dict/phones/extra_questions.int corresponds to update-model/dict/phones/extra_questions.txt
--> update-model/dict/phones/extra_questions.{txt, int} are OK
Checking update-model/dict/phones/word_boundary.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 176 entry/entries in update-model/dict/phones/word_boundary.txt
--> update-model/dict/phones/word_boundary.int corresponds to update-model/dict/phones/word_boundary.txt
--> update-model/dict/phones/word_boundary.{txt, int} are OK
Checking optional_silence.txt ...
--> reading update-model/dict/phones/optional_silence.txt
--> update-model/dict/phones/optional_silence.txt is OK
Checking disambiguation symbols: #0 and #1
--> update-model/dict/phones/disambig.txt has "#0" and "#1"
--> update-model/dict/phones/disambig.txt is OK
Checking topo ...
Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
--> update-model/dict/phones/word_boundary.txt doesn't include disambiguation symbols
--> update-model/dict/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
--> update-model/dict/phones/word_boundary.txt is OK
Checking word-level disambiguation symbols...
--> update-model/dict/phones/wdisambig.txt exists (newer prepare_lang.sh)
Checking word_boundary.int and disambig.int
sh: 2: export: (x86)/Intel/Intel(R): bad variable name
--> generating a 88 word/subword sequence
sh: 2: export: (x86)/Intel/Intel(R): bad variable name
--> ERROR: number of reconstructed words 0 does not match real number of words 88; indicates problem in L.fst or word_boundary.int. phoneseq = , wordseq = finches pei reservations rambo mommy courtship dawdling divas vox reorient boomtown whore protectorate hurt rayner topeka adamant mugs fouls birth a._k. stand discontents amazed laurels buttering sidetrack boundary lamport occasional suspicion shortcut melons until threats droppings tourette's greece boo competence fire's throat reimburse buffington waged griffith's meshes twiddling forecasting peters catastrophe tiptoe psychoanalysis statewide polar diluting bandit acronyms alvarez snatching nolte dreary fonder snacked navigate foolish severe barbara influenza shelled manuel adulterous antisocial army palace dollars whiff chalice paws injuries pop legume hyped invalids chide goodridge crappie raving
--> generating a 48 word/subword sequence
sh: 2: export: (x86)/Intel/Intel(R): bad variable name
Checking update-model/dict/oov.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in update-model/dict/oov.txt
--> update-model/dict/oov.int corresponds to update-model/dict/oov.txt
--> update-model/dict/oov.{txt, int} are OK
sh: 2: export: (x86)/Intel/Intel(R): bad variable name
--> ERROR: update-model/dict/L.fst is not olabel sorted
sh: 2: export: (x86)/Intel/Intel(R): bad variable name
--> ERROR: update-model/dict/L_disambig.fst is not olabel sorted
--> ERROR (see error messages above)
prepare_lang.sh: error validating output
我也在Kaldi 帮助小组上问过这个问题。Dan Povey 建议这可能是一个本地问题,其中可能会产生一个引发此错误的子外壳。
我的密码输出如下:-
/home/nitin/kaldi/egs/aspire/s5
我的path.sh如下:
export KALDI_ROOT=`pwd`/../../..
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard
file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit
1
. $KALDI_ROOT/tools/config/common_path.sh
export PATH=$KALDI_ROOT/tools/sctk/bin:$PATH
export LC_ALL=C
source ../../../tools/env.sh
在运行后续命令之前,需要获取链接博客文章中提到的我的cmd.sh是:
# "queue.pl" uses qsub. The options to it are
# options to qsub. If you have GridEngine installed,
# change this to a queue you have access to.
# Otherwise, use "run.pl", which will run jobs locally
# (make sure your --num-jobs options are no more than
# the number of cpus on your machine.
#a) JHU cluster options
export train_cmd="queue.pl"
export decode_cmd="queue.pl --mem 4G"
export mkgraph_cmd="queue.pl --mem 8G"
#b) BUT cluster options
#export train_cmd="queue.pl -q all.q@@blade -l ram_free=1200M,mem_free=1200M"
#export decode_cmd="queue.pl -q all.q@@blade -l ram_free=1700M,mem_free=1700M"
#export decodebig_cmd="queue.pl -q all.q@@blade -l ram_free=4G,mem_free=4G"
#export cuda_cmd="queue.pl -q long.q@@pco203 -l gpu=1"
#export cuda_cmd="queue.pl -q long.q@pcspeech-gpu"
#export mkgraph_cmd="queue.pl -q all.q@@servers -l ram_free=4G,mem_free=4G"
#c) run it locally...
#export train_cmd=run.pl
#export decode_cmd=run.pl
#export cuda_cmd=run.pl
#export mkgraph_cmd=run.pl
这里有任何 Linux 队长可以帮助我吗?