deep-learning - How to effectively use DeepSpeech (v0.5.1) with a language model during training and inference?

Question

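I am trying to train and use a model with DeepSpeech v0.5.1 for English. My goal is to train two models, one with a language model and one without. I would like your help on several points. Sorry this is long, but I am trying to be as detailed as possible; also, being new to Linux and data science, I may be stating some very obvious things. Thanks in advance for your help. Since SO flagged the original post as spam, I am posting this question and answering it myself to provide the background information. Regards, Rohit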

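B) My questions:

B1) When training or running inference with a language model, do I have to specify both the lm_binary parameter and the corresponding trie file? Would it work with only the trie?

B2) Regardless of whether a language model (binary file and trie together) was used while training the model, can I choose whether or not to use a language model later, when the model is used for inference? Can a different language model be used later, or only the one used for training? Is there anything to watch out for when choosing an alternative model? For example, training with a 3-gram model but using a 4-gram model during inference? Is there anything else along these lines that you can think of?

B3) Suppose my model was built by training with a vocabulary file, arpa, trie and lm_binary that were all built from only 10k data points. Suppose I then create a new vocabulary, BigVocabulary.file, from a larger corpus than the one used for training, for example all 629731 data points in the validated.tsv file, and use this larger vocabulary to create the .arpa, lmBinary and trie files. I make sure the valid characters are exactly the same by comparing the alphabet files. Then, with the model trained on the smaller vocabulary, can I use BigVocabulary.binary.file and BigVocabulary.trie when running inference from the command line?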
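For the alphabet comparison mentioned in B3, here is a minimal sketch of one way to do it in Python. It assumes the usual DeepSpeech alphabet layout (one symbol per line, lines starting with `#` are comments, `\#` stands for a literal hash). The names `read_alphabet` and `compare_alphabets` are made up for illustration and are not part of DeepSpeech.

```python
from pathlib import Path


def read_alphabet(path):
    """Return the set of symbols listed in an alphabet file.

    Assumed layout: one symbol per line, '#' starts a comment line,
    and a line beginning with '\\#' denotes a literal '#'.
    """
    symbols = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.startswith("\\#"):
            symbols.add("#")      # escaped hash is a real symbol
        elif line.startswith("#") or line == "":
            continue              # comment or empty line
        else:
            symbols.add(line)     # keep as-is: a single space is a valid symbol
    return symbols


def compare_alphabets(path_a, path_b):
    """Return (symbols only in A, symbols only in B), each sorted."""
    a = read_alphabet(path_a)
    b = read_alphabet(path_b)
    return sorted(a - b), sorted(b - a)
```

If both returned lists are empty, the two files describe the same character set. This compares sets only, so a difference in line order between the two files would go unnoticed.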

I have already created a model from only the first 1000 files; the inference is poor, but it works. Command:

deepspeech \
  --model /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb \
  --alphabet /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/alphabetDir/alphabet-Set5First1050.txt \
  --lm /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/lm/lm4gram/vocabulary-Set5First1050_4gram.klm \
  --trie /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/trie/trie4gram/Set5First1050_4gram.trie \
  --audio /home/rohit/dpspTraining/data/wavFiles/wav33/test/File28.

Console output:

Transform model file into an mmapped graph to reduce heap usage.
2019-08-01 16:11:02.155443: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-01 16:11:02.179690: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2019-08-01 16:11:02.179740: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2019-08-01 16:11:02.179756: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-08-01 16:11:02.179891: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
Loaded model in 0.0283s.
Loading language model from files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/lm/lm4gram/vocabulary-Set5First1050_4gram.klm /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/trie/trie4gram/Set5First1050_4gram.trie
Loaded language model in 0.068s.
Running inference.
a in a is
Inference took 0.449s for 3.041s audio file.

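However, if I use the BigVocabulary.trie and lmBinary files, I get an error message saying that the trie file version does not match and that I should update the trie file.

Yet it still appears to load the language model. So did DeepSpeech really pick it up and apply it correctly? How do I fix this error?

Command: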

deepspeech \
  --model /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb \
  --alphabet /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/alphabetDir/alphabet-Set5First1050.txt \
  --lm /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/lm/lm4gram/vocabulary-allValidated_o4gram.klm \
  --trie /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/trie/trie4gram/allValidated_o4gram.trie \
  --audio /home/rohit/dpspTraining/data/wavFiles/wav33/test/File28.wav

Console output:

(dpsp5v051basic) rohit@DE-W-0246802:~/dpspCODE/v051/DeepSpeech$ deepspeech \
  --model /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb \
  --alphabet /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/alphabetDir/alphabet-Set5First1050.txt \
  --lm /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/lm/lm4gram/vocabulary-allValidated_o4gram.klm \
  --trie /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/trie/trie4gram/allValidated_o4gram.trie \
  --audio /home/rohit/dpspTraining/data/wavFiles/wav33/test/File28.wav
Loading model from file /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.1-0-g4b29b78
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2019-08-01 16:11:58.305524: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
Loaded model in 0.0199s.
Loading language model from files /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/lm/lm4gram/vocabulary-allValidated_o4gram.klm /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/trie/trie4gram/allValidated_o4gram.trie
Error: Trie file version mismatch (4 instead of expected 3). Update your trie file.
Loaded language model in 0.00368s.
Running inference.
a on o tn o as te tee
Inference took 1.893s for 3.041s audio file.
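To check from a script whether a run hit this problem, the captured console output can be scanned for the mismatch message. This is a minimal sketch, assuming the client prints the message in the English form `Trie file version mismatch (4 instead of expected 3)`. The helper `find_trie_version_mismatch` is hypothetical and not a DeepSpeech function.

```python
import re

# Pattern for the mismatch message; the exact wording is an assumption
# based on the console output of the deepspeech client.
TRIE_MISMATCH = re.compile(
    r"Trie file version mismatch \((\d+) instead of expected (\d+)\)"
)


def find_trie_version_mismatch(log_text):
    """Return (found_version, expected_version), or None if no mismatch was reported."""
    match = TRIE_MISMATCH.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

The two numbers only report what the client saw in the trie file versus what it was built to expect. The later "Loaded language model" line on its own does not show that the trie was accepted, since it is printed in this run even though the error was reported.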

Thank you for your time.

score 1 · Accepted Answer

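A) Background:

A1) Using Ubuntu 18.04 LTS, no GPU, 32 GB RAM.

Downloaded the Mozilla Common Voice Corpus (English) around mid-June 2019.
Took the validated.tsv file, did some basic transcript validation and trimmed the dataset to 629731 entries. Then selected the first 10k entries and split them 70:20:10 into train:dev:test CSV files.
Converted the MP3s to WAV files (16 kHz, mono, 16-bit), each less than 10 seconds long.
Set up an Anaconda environment with DeepSpeech v0.5.1.
Cloned the GitHub v0.5.1 code.
Issued the following command in the DeepSpeech folder, which seems to be required to create the generate_trie executable and other needed setup: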

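python util/taskcluster.py --target .
Installed the CTC decoder from the link obtained from the command:

python util/taskcluster.py --decoder
Next, created the vocabulary file containing only the transcripts.
No changes were made to any flags or other default parameters.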
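Two of the preparation steps above, the 70:20:10 split and the transcript-only vocabulary file, can be sketched roughly as follows. This is an illustration under assumptions, not the exact script that was used. It assumes rows are split in file order without shuffling, and that each row is a dict with a `transcript` key as in the DeepSpeech CSV layout. The function names are made up.

```python
def split_rows(rows, parts=(70, 20, 10)):
    """Split rows, in order, into train/dev/test according to integer parts.

    Integer arithmetic is used so that, for example, 10 rows give exactly 7/2/1.
    Any remainder from rounding down ends up in the test set.
    """
    total = sum(parts)
    n_train = len(rows) * parts[0] // total
    n_dev = len(rows) * parts[1] // total
    train = rows[:n_train]
    dev = rows[n_train:n_train + n_dev]
    test = rows[n_train + n_dev:]
    return train, dev, test


def vocabulary_lines(rows, column="transcript"):
    """Return one transcript per row, ready to be written one per line."""
    return [row[column] for row in rows]
```

Writing `"\n".join(vocabulary_lines(rows))` to a text file would then give a vocabulary file with one transcript per line.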

A2) Language model related:

Used KenLM. Downloaded it from the git repo and compiled it.
Commands to create the 4-gram version:
Vocabulary file to arpa:

./lmplz -o 4 \
  --text /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/vocabDir/vocabulary-Set3First10k.txt \
  --arpa /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/vocabDir/vocabulary-Set3First10k_4gram.arpa
arpa to lm_binary file:

./build_binary \
  /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/vocabDir/vocabulary-Set3First10k_4gram.arpa \
  /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/lm/lm4gram/vocabulary-Set3First10k_4gram.klm
Used generate_trie to make the trie file:

/home/rohit/dpspCODE/v051/DeepSpeech/generate_trie \
  /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/alphabetDir/alphabet-Set3First10k.txt \
  /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/lm/lm4gram/vocabulary-Set3First10k_4gram.klm \
  /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/trie/trie4gram/set3First10k_4gram.trie
Note that the trie file was created successfully.
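Since the three steps above always run in the same order (vocabulary to arpa, arpa to binary, binary plus alphabet to trie), they can be assembled in one place. A minimal sketch, with the argument order copied from the commands above. The function names, the output naming scheme and the default locations of the KenLM and generate_trie binaries are assumptions for illustration only.

```python
import subprocess
from pathlib import Path


def lm_build_commands(vocab_txt, alphabet_txt, out_dir, order=4,
                      kenlm_bin="kenlm/build/bin",
                      generate_trie="./generate_trie"):
    """Return the three command lines (as argument lists) for one n-gram order.

    kenlm_bin and generate_trie are hypothetical default locations.
    """
    stem = "%s_%dgram" % (Path(vocab_txt).stem, order)
    arpa = "%s/%s.arpa" % (out_dir, stem)
    klm = "%s/%s.klm" % (out_dir, stem)
    trie = "%s/%s.trie" % (out_dir, stem)
    return [
        # 1. vocabulary text -> ARPA
        [kenlm_bin + "/lmplz", "-o", str(order),
         "--text", str(vocab_txt), "--arpa", arpa],
        # 2. ARPA -> KenLM binary
        [kenlm_bin + "/build_binary", arpa, klm],
        # 3. alphabet + binary -> trie
        [generate_trie, str(alphabet_txt), klm, trie],
    ]


def run_lm_build(commands):
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the list first and running it separately makes it easy to print and inspect the commands before anything is executed, and keeps the .klm and .trie for a given order produced from the same alphabet and the same generate_trie binary.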

A3) Commands to start model training (training is still in progress):

A3a) Model without a language model:

python3 -u DeepSpeech.py \
  --train_files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/csvFiles/train.csv \
  --dev_files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/csvFiles/dev.csv \
  --test_files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/csvFiles/test.csv \
  --train_batch_size 1 \
  --dev_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 2048 \
  --epoch 20 \
  --dropout_rate 0.15 \
  --learning_rate 0.0001 \
  --export_dir /home/rohit/dpspTraining/models/v051/model5-validFirst10k-noLM/savedModel \
  --checkpoint_dir /home/rohit/dpspTraining/models/v051/model5-validFirst10k-noLM/checkpointDir \
  --alphabet_config_path /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/alphabetDir/alphabet-Set3First10k.txt \
  "$@"

A3b) Model with a language model:

python3 -u DeepSpeech.py \
  --train_files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/csvFiles/train.csv \
  --dev_files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/csvFiles/dev.csv \
  --test_files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/csvFiles/test.csv \
  --train_batch_size 1 \
  --dev_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 2048 \
  --epoch 20 \
  --dropout_rate 0.15 \
  --learning_rate 0.0001 \
  --export_dir /home/rohit/dpspTraining/models/v051/model6-validFirst10k-yesLM-4gram/savedModel \
  --checkpoint_dir /home/rohit/dpspTraining/models/v051/model6-validFirst10k-yesLM-4gram/checkpointDir \
  --decoder_library_path /home/rohit/dpspCODE/v051/DeepSpeech/native_client/libctc_decoder_with_kenlm.so \
  --alphabet_config_path /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/alphabetDir/alphabet-Set3First10k.txt \
  --lm_binary_path /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/lm/lm4gram/vocabulary-Set3First10k_4gram.klm \
  --lm_trie_path /home/rohit/dpspTraining/data/wavFiles/commVoiceSet3-10kTotal/trie/trie4gram/set3First10k_4gram.trie \
  "$@"
