linux - Extending Kaldi Aspire: variable error when recompiling HCLG.fst with a new lexicon and grammar files

Question

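I have successfully set up and run the Kaldi Aspire recipe on my WSL. I am now working on a POC in which I want to extend the ASPIRE recipe by creating a new corpus, dictionary, and language model and merging them with the original HCLG.fst. I followed this blog post. I was able to create the new dictionary and language model and to merge the input files. However, when I try to recompile HCLG.fst with the new lexicon and grammar, I get the following error.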

Checking update-model/local/dict/silence_phones.txt ...
--> reading update-model/local/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/local/dict/silence_phones.txt is OK

Checking update-model/local/dict/optional_silence.txt ...
--> reading update-model/local/dict/optional_silence.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/local/dict/optional_silence.txt is OK

Checking update-model/local/dict/nonsilence_phones.txt ...
--> reading update-model/local/dict/nonsilence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/local/dict/nonsilence_phones.txt is OK

Checking disjoint: silence_phones.txt, nonsilence_phones.txt
--> disjoint property is OK.

Checking update-model/local/dict/lexicon.txt
--> reading update-model/local/dict/lexicon.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/local/dict/lexicon.txt is OK

Checking update-model/local/dict/lexiconp.txt
--> reading update-model/local/dict/lexiconp.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/local/dict/lexiconp.txt is OK

Checking lexicon pair update-model/local/dict/lexicon.txt and update-model/local/dict/lexiconp.txt
--> lexicon pair update-model/local/dict/lexicon.txt and update-model/local/dict/lexiconp.txt match

Checking update-model/local/dict/extra_questions.txt ...
--> reading update-model/local/dict/extra_questions.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/local/dict/extra_questions.txt is OK
--> SUCCESS [validating dictionary directory update-model/local/dict]

fstaddselfloops update-model/dict/phones/wdisambig_phones.int update-model/dict/phones/wdisambig_words.int
prepare_lang.sh: validating output directory
utils/validate_lang.pl update-model/dict
Checking existence of separator file
separator file update-model/dict/subword_separator.txt is empty or does not exist, deal in word case.
Checking update-model/dict/phones.txt ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/dict/phones.txt is OK

Checking words.txt: #0 ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> update-model/dict/words.txt is OK

Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
--> silence.txt and nonsilence.txt are disjoint
--> silence.txt and disambig.txt are disjoint
--> disambig.txt and nonsilence.txt are disjoint
--> disjoint property is OK

Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
--> found no unexplainable phones in phones.txt

Checking update-model/dict/phones/context_indep.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 20 entry/entries in update-model/dict/phones/context_indep.txt
--> update-model/dict/phones/context_indep.int corresponds to update-model/dict/phones/context_indep.txt
--> update-model/dict/phones/context_indep.csl corresponds to update-model/dict/phones/context_indep.txt
--> update-model/dict/phones/context_indep.{txt, int, csl} are OK

Checking update-model/dict/phones/nonsilence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 156 entry/entries in update-model/dict/phones/nonsilence.txt
--> update-model/dict/phones/nonsilence.int corresponds to update-model/dict/phones/nonsilence.txt
--> update-model/dict/phones/nonsilence.csl corresponds to update-model/dict/phones/nonsilence.txt
--> update-model/dict/phones/nonsilence.{txt, int, csl} are OK

Checking update-model/dict/phones/silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 20 entry/entries in update-model/dict/phones/silence.txt
--> update-model/dict/phones/silence.int corresponds to update-model/dict/phones/silence.txt
--> update-model/dict/phones/silence.csl corresponds to update-model/dict/phones/silence.txt
--> update-model/dict/phones/silence.{txt, int, csl} are OK

Checking update-model/dict/phones/optional_silence.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in update-model/dict/phones/optional_silence.txt
--> update-model/dict/phones/optional_silence.int corresponds to update-model/dict/phones/optional_silence.txt
--> update-model/dict/phones/optional_silence.csl corresponds to update-model/dict/phones/optional_silence.txt
--> update-model/dict/phones/optional_silence.{txt, int, csl} are OK

Checking update-model/dict/phones/disambig.{txt, int, csl} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 10 entry/entries in update-model/dict/phones/disambig.txt
--> update-model/dict/phones/disambig.int corresponds to update-model/dict/phones/disambig.txt
--> update-model/dict/phones/disambig.csl corresponds to update-model/dict/phones/disambig.txt
--> update-model/dict/phones/disambig.{txt, int, csl} are OK

Checking update-model/dict/phones/roots.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 43 entry/entries in update-model/dict/phones/roots.txt
--> update-model/dict/phones/roots.int corresponds to update-model/dict/phones/roots.txt
--> update-model/dict/phones/roots.{txt, int} are OK

Checking update-model/dict/phones/sets.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 43 entry/entries in update-model/dict/phones/sets.txt
--> update-model/dict/phones/sets.int corresponds to update-model/dict/phones/sets.txt
--> update-model/dict/phones/sets.{txt, int} are OK

Checking update-model/dict/phones/extra_questions.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 10 entry/entries in update-model/dict/phones/extra_questions.txt
--> update-model/dict/phones/extra_questions.int corresponds to update-model/dict/phones/extra_questions.txt
--> update-model/dict/phones/extra_questions.{txt, int} are OK

Checking update-model/dict/phones/word_boundary.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 176 entry/entries in update-model/dict/phones/word_boundary.txt
--> update-model/dict/phones/word_boundary.int corresponds to update-model/dict/phones/word_boundary.txt
--> update-model/dict/phones/word_boundary.{txt, int} are OK

Checking optional_silence.txt ...
--> reading update-model/dict/phones/optional_silence.txt
--> update-model/dict/phones/optional_silence.txt is OK

Checking disambiguation symbols: #0 and #1
--> update-model/dict/phones/disambig.txt has "#0" and "#1"
--> update-model/dict/phones/disambig.txt is OK

Checking topo ...

Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...
--> update-model/dict/phones/word_boundary.txt doesn't include disambiguation symbols
--> update-model/dict/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt
--> update-model/dict/phones/word_boundary.txt is OK

Checking word-level disambiguation symbols...
--> update-model/dict/phones/wdisambig.txt exists (newer prepare_lang.sh)
Checking word_boundary.int and disambig.int
sh: 2: export: (x86)/Intel/Intel(R): bad variable name
--> generating a 88 word/subword sequence
sh: 2: export: (x86)/Intel/Intel(R): bad variable name
--> ERROR: number of reconstructed words 0 does not match real number of words 88; indicates problem in L.fst or word_boundary.int.  phoneseq = , wordseq = finches pei reservations rambo mommy courtship dawdling divas vox reorient boomtown whore protectorate hurt rayner topeka adamant mugs fouls birth a._k. stand discontents amazed laurels buttering sidetrack boundary lamport occasional suspicion shortcut melons until threats droppings tourette's greece boo competence fire's throat reimburse buffington waged griffith's meshes twiddling forecasting peters catastrophe tiptoe psychoanalysis statewide polar diluting bandit acronyms alvarez snatching nolte dreary fonder snacked navigate foolish severe barbara influenza shelled manuel adulterous antisocial army palace dollars whiff chalice paws injuries pop legume hyped invalids chide goodridge crappie raving
--> generating a 48 word/subword sequence
sh: 2: export: (x86)/Intel/Intel(R): bad variable name

Checking update-model/dict/oov.{txt, int} ...
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> 1 entry/entries in update-model/dict/oov.txt
--> update-model/dict/oov.int corresponds to update-model/dict/oov.txt
--> update-model/dict/oov.{txt, int} are OK

sh: 2: export: (x86)/Intel/Intel(R): bad variable name
--> ERROR: update-model/dict/L.fst is not olabel sorted
sh: 2: export: (x86)/Intel/Intel(R): bad variable name
--> ERROR: update-model/dict/L_disambig.fst is not olabel sorted
--> ERROR (see error messages above)
prepare_lang.sh: error validating output

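I have also asked this question in the Kaldi help group. Dan Povey suggested that it is probably a local issue, where a subshell is being spawned that raises this error.

The output of pwd is: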

/home/nitin/kaldi/egs/aspire/s5

My path.sh is as follows:

export KALDI_ROOT=`pwd`/../../..
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export PATH=$KALDI_ROOT/tools/sctk/bin:$PATH
export LC_ALL=C
source ../../../tools/env.sh

My cmd.sh, which the linked blog post says must be sourced before running the subsequent commands, is:

# "queue.pl" uses qsub.  The options to it are
# options to qsub.  If you have GridEngine installed,
# change this to a queue you have access to.
# Otherwise, use "run.pl", which will run jobs locally
# (make sure your --num-jobs options are no more than
# the number of cpus on your machine.

#a) JHU cluster options
export train_cmd="queue.pl"
export decode_cmd="queue.pl --mem 4G"
export mkgraph_cmd="queue.pl --mem 8G"


#b) BUT cluster options
#export train_cmd="queue.pl -q all.q@@blade -l ram_free=1200M,mem_free=1200M"
#export decode_cmd="queue.pl -q all.q@@blade -l ram_free=1700M,mem_free=1700M"
#export decodebig_cmd="queue.pl -q all.q@@blade -l ram_free=4G,mem_free=4G"

#export cuda_cmd="queue.pl -q long.q@@pco203 -l gpu=1"
#export cuda_cmd="queue.pl -q long.q@pcspeech-gpu"
#export mkgraph_cmd="queue.pl -q all.q@@servers -l ram_free=4G,mem_free=4G"

#c) run it locally...
#export train_cmd=run.pl
#export decode_cmd=run.pl
#export cuda_cmd=run.pl
#export mkgraph_cmd=run.pl

Can any Linux gurus here help me?

score 2 · Accepted Answer

The error message

sh: 2: export: (x86)/Intel/Intel(R): bad variable name

indicates a word-splitting problem caused by missing quotes.

The text (x86)/Intel/Intel(R) looks like part of a directory path that contains spaces, as is common on Windows. It is probably something like

C:/Program Files (x86)/Intel/Intel(R) something

You can probably find it in the value of your PATH variable.
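One way to check this is to print the PATH entries one per line and keep only those that contain a space. The following is a minimal sketch; the helper name path_entries_with_spaces is made up for illustration and is not part of Kaldi or WSL.

```shell
# Hypothetical helper: list the entries of a colon-separated path string
# that contain at least one space.
path_entries_with_spaces() {
    # $1: a PATH-style string, e.g. "$PATH"
    printf '%s\n' "$1" | tr ':' '\n' | grep ' '
}

# Inspect the current PATH; grep returns non-zero when nothing matches,
# so fall back to a short message in that case.
path_entries_with_spaces "$PATH" || echo "No PATH entries contain spaces."
```

On WSL the Windows directories usually appear under /mnt/c/..., so any "Program Files" entries should show up in this listing.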

According to the thread you referenced in the KALDI help group, the problem is probably in your path.sh file.

With your current working directory /home/nitin/kaldi/egs/aspire/s5, the problem does not occur in the line

export KALDI_ROOT=`pwd`/../../..

but to avoid possible problems it should be

export KALDI_ROOT="$(pwd)"/../../..

or

export KALDI_ROOT="$(pwd)/../../.."

The problem appears to occur in line 2 of the script (which matches the error message):

export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH

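I guess your PATH contains directories with spaces (including the part shown in the error message). In that case the shell splits the line at every space, and you get something like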

export PATH=something maybe_something_else (x86)/Intel/Intel(R) maybe_again_something

This will (try to) export the variables PATH, maybe_something_else, (x86)/Intel/Intel(R) and maybe_again_something, which is not what you want. You want all of this to be in PATH.

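You are lucky that the shell reports an error about the invalid variable name (x86)/Intel/Intel(R). If all parts were valid variable names, you would get a wrong PATH and some unwanted environment variables, but no error message.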
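The splitting can be observed in isolation by counting how many arguments a command receives from an unquoted versus a quoted expansion. This is only a sketch: the sample value, including the trailing "Tools", is invented for illustration. How export NAME=$value itself is treated varies between shells (bash does not split assignment arguments of export, while some /bin/sh implementations, such as older dash versions, do), so the sketch uses an ordinary function call, where splitting always applies.

```shell
# Print the number of arguments received.
count_args() {
    echo "$#"
}

# Invented sample value that resembles a Windows directory inside PATH.
sample='/usr/bin:/mnt/c/Program Files (x86)/Intel/Intel(R) Tools'

# Unquoted: the shell splits the expanded value at each space.
unquoted=$(count_args $sample)

# Quoted: the value is passed on as one single word.
quoted=$(count_args "$sample")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

With this sample the unquoted expansion yields four words, one of which is (x86)/Intel/Intel(R), the same fragment that appears in the error message.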

So you should quote this line as well and, in general, every expansion of a variable that might contain spaces.

I suggest changing path.sh to

export KALDI_ROOT="$(pwd)/../../.."
export PATH="$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH"
[ ! -f "$KALDI_ROOT/tools/config/common_path.sh" ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
. "$KALDI_ROOT/tools/config/common_path.sh"
export "PATH=$KALDI_ROOT/tools/sctk/bin:$PATH"
export LC_ALL=C
source ../../../tools/env.sh

I don't know Kaldi. Is this file generated, or did you create it manually?

The line

export KALDI_ROOT=`pwd`/../../..

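might be problematic because it depends on your current working directory at the time you run the script. I don't know whether there is a mechanism that ensures you only run it from the directory where the script is located. Otherwise this will result in a wrong value for KALDI_ROOT.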

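I don't know if there is a reason to do it this way, but it might make sense to use an absolute path instead of one that depends on your working directory.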

The current directory /home/nitin/kaldi/egs/aspire/s5 will result in

export KALDI_ROOT=/home/nitin/kaldi/egs/aspire/s5/../../..

I would replace the line in the script with

export KALDI_ROOT=/home/nitin/kaldi

You might ask about this suggestion in the KALDI help group.
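If a hard-coded path is too rigid, a possible middle ground is to keep the relative hop but store the result in normalized, absolute form, so that the value no longer contains ../../.. components. The sketch below assumes a POSIX shell; resolve_dir and KALDI_ROOT_CANDIDATE are names invented for this example, and the result still depends on the directory the script is run from.

```shell
# Hypothetical helper (not part of Kaldi): print the physical, absolute
# form of a directory, or nothing if the directory does not exist.
resolve_dir() {
    # The subshell keeps the cd from affecting the caller.
    ( cd "$1" 2>/dev/null && pwd -P )
}

# Same relative hop as in path.sh, but stored in normalized form.
KALDI_ROOT_CANDIDATE="$(resolve_dir "$(pwd)/../../..")"
echo "KALDI_ROOT would be: $KALDI_ROOT_CANDIDATE"
```

Run from /home/nitin/kaldi/egs/aspire/s5, this should report /home/nitin/kaldi, provided none of the path components is a symbolic link.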

Edit:

If adding quotes in path.sh is not sufficient, check $KALDI_ROOT/tools/config/common_path.sh, $KALDI_ROOT/tools/env.sh and any other scripts that might be involved.

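As a starting point you could search for files with lines that contain both export and $PATH. (Of course this might also happen with other variables.) Example: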

find /home/nitin/kaldi -type f -exec grep 'export.*\$PATH' {} /dev/null \;
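The find command above lists every line that mentions export and $PATH, whether quoted or not. A narrower search, sketched here as a rough heuristic, reports only export assignments in which a $ or a backtick appears before any quote character, for any variable name. The function name find_unquoted_exports is made up for this example, and it relies on grep -r, which GNU and BSD grep support.

```shell
# Hypothetical helper: report "export NAME=value" lines below a directory
# where an expansion ($... or `...`) starts before any quote character.
# This is a heuristic; it can miss cases and report false positives.
find_unquoted_exports() {
    # $1: directory to search recursively
    grep -rnE "export[[:space:]]+[A-Za-z_][A-Za-z0-9_]*=[^\"']*[\$\`]" "$1"
}

# Example call (directory taken from the question):
#   find_unquoted_exports /home/nitin/kaldi
```

Applied to the original path.sh, this would flag the KALDI_ROOT line and both PATH lines, while the fully quoted versions suggested above would not be reported.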

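I just noticed that path.sh is a bit inconsistent. It uses both command substitution of pwd and the variable $PWD, as well as both $KALDI_ROOT and a hard-coded ../../.., as if the lines had been written by different people.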
