What does "Test of Epoch [number]" mean in Mozilla DeepSpeech?

In the example below, it reports Test of Epoch 77263, even though, as I understand it, there should only be 1 epoch, since I passed the arguments --display_step 1 --limit_train 1 --limit_dev 1 --limit_test 1 --early_stop False --epoch 1:

dernoncourt@ilcomp:~/asr/DeepSpeech$ ./DeepSpeech.py --train_files data/common-voice-v1/cv-valid-train.csv,data/common-voice-v1/cv-other-train.csv --dev_files data/common-voice-v1/cv-valid-dev.csv --test_files data/common-voice-v1/cv-valid-test.csv --decoder_library_path /asr/DeepSpeech/libctc_decoder_with_kenlm.so --fulltrace True --display_step 1  --limit_train 1  --limit_dev 1  --limit_test 1 --early_stop False --epoch 1
W Parameter --validation_step needs to be >0 for early stopping to work
I Test of Epoch 77263 - WER: 1.000000, loss: 60.50202560424805, mean edit distance: 0.894737
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 58.900837, mean edit distance: 0.894737
I  - src: "how do you like her"
I  - res: "i "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 60.517113, mean edit distance: 0.894737
I  - src: "how do you like her"
I  - res: "i "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 60.668221, mean edit distance: 0.894737
I  - src: "how do you like her"
I  - res: "i "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 61.921925, mean edit distance: 0.894737
I  - src: "how do you like her"
I  - res: "i "
I --------------------------------------------------------------------------------

1 Answer


Explanation from Tilman Kamp:

This is actually not a bug: the current epoch is computed from your current parameters together with the global step count persisted in the snapshot. Take a closer look at this excerpt:

# Number of GPUs per worker - fixed for now by local reality or cluster setup
gpus_per_worker = len(available_devices)

# Number of batches processed per job per worker
batches_per_job  = gpus_per_worker * max(1, FLAGS.iters_per_worker)

# Number of batches per global step
batches_per_step = gpus_per_worker * max(1, FLAGS.replicas_to_agg)

# Number of global steps per epoch - to be at least 1
steps_per_epoch = max(1, model_feeder.train.total_batches // batches_per_step)

# The start epoch of our training
self._epoch = step // steps_per_epoch

So what happened is that the set size you used during the earlier training differs from your current set size. Hence the strange epoch number.

Simplified example (leaving batch sizes out of the picture): if you once trained a 1000-sample training set for 5 epochs, you end up with 5000 "global steps" (persisted as a single number in your snapshot). After that training, you change the command-line parameters to a set of size 1 (your --limit_* parameters). "Suddenly" you see epoch 5000, because 5000 global steps now means applying a data set of size 1 five thousand times.
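As a rough illustration of that arithmetic, here is a minimal sketch (a hypothetical standalone function, not DeepSpeech's actual code; the variable names just mirror the excerpt above) that reproduces the epoch calculation:

# Hypothetical sketch of the epoch calculation shown in the excerpt above.
def start_epoch(global_step, total_batches, gpus_per_worker=1, replicas_to_agg=1):
    # Number of batches per global step
    batches_per_step = gpus_per_worker * max(1, replicas_to_agg)
    # Number of global steps per epoch - at least 1
    steps_per_epoch = max(1, total_batches // batches_per_step)
    # Start epoch derived from the persisted global step
    return global_step // steps_per_epoch

# Original training: 1000 samples for 5 epochs -> 5000 persisted global steps
print(start_epoch(global_step=5000, total_batches=1000))  # 5

# Same checkpoint, but the set is now limited to 1 sample (--limit_* 1)
print(start_epoch(global_step=5000, total_batches=1))     # 5000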

Takeaway: use the --checkpoint_dir argument to avoid this kind of issue.

Answered on 2018-07-02T19:15:00.280