
In the train_epoch function we have three kinds of losses:

  1. loss
  2. batch_loss
  3. train_loss

As I understand it, loss is a tensor, batch_loss is the value of that tensor, and train_loss is the cumulative sum of the batch_loss values. That part is fine for me.
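For reference, here is a minimal sketch of how those three values typically relate in a PyTorch-style training loop. The model, optimizer, and data_loader below are toy stand-ins (not AllenNLP's actual code), and mse_loss is just an illustrative loss function:

```python
import torch

# Toy stand-ins so the sketch runs; in AllenNLP the Trainer builds these for you.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = [{"x": torch.randn(8, 4), "y": torch.randn(8, 1)} for _ in range(3)]

train_loss = 0.0                      # cumulative float over the whole epoch
for batch in data_loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(batch["x"]), batch["y"])  # a tensor
    loss.backward()
    optimizer.step()

    batch_loss = loss.item()          # plain Python float for this batch
    train_loss += batch_loss          # running sum of batch_loss values
```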

My question is: why does AllenNLP compute batch_loss per batch rather than the cumulative loss over the whole batch_group?

Also, I did not understand the need for a batch_group inside an epoch, and a batch inside a batch_group.

My understanding is: inside an epoch we have batch_groups, and inside each batch_group we have batches. Why is batch_loss calculated per batch and not per batch_group?


1 Answer


My question is: why does AllenNLP compute batch_loss per batch and not the cumulative loss for the batch_group?

This is actually a bug, so thanks for pointing it out! There is now a PR to fix it: https://github.com/allenai/allennlp/pull/4706

Also, I did not understand the need for a batch_group inside an epoch, and a batch inside a batch_group.

A batch_group always contains just a single batch unless you use a value of num_gradient_accumulation_steps greater than 1, i.e. gradient accumulation, which is a way of getting a larger effective batch size.

See, for example, https://medium.com/ai2-blog/tutorial-training-on-larger-batches-with-less-memory-in-allennlp-1cd2047d92ad.
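To illustrate the idea, here is a hedged sketch of gradient accumulation (not AllenNLP's actual implementation; model, optimizer, batches, and the batch_groups helper are placeholders). With num_gradient_accumulation_steps = 2, each batch_group holds two batches whose gradients are accumulated before a single optimizer step, while the loss is still tracked per batch:

```python
import torch

# Toy stand-ins; in AllenNLP the Trainer and data loader provide these.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [{"x": torch.randn(8, 4), "y": torch.randn(8, 1)} for _ in range(6)]

num_gradient_accumulation_steps = 2   # each batch_group then contains 2 batches

def batch_groups(batches, size):
    """Hypothetical helper: chunk the batch list into groups of `size`."""
    for i in range(0, len(batches), size):
        yield batches[i:i + size]

train_loss = 0.0
for batch_group in batch_groups(batches, num_gradient_accumulation_steps):
    optimizer.zero_grad()
    for batch in batch_group:
        loss = torch.nn.functional.mse_loss(model(batch["x"]), batch["y"])
        # Scale so the accumulated gradient matches one big batch of the same size.
        (loss / len(batch_group)).backward()
        train_loss += loss.item()     # the loss itself is still computed per batch
    optimizer.step()                  # one step per batch_group: larger effective batch
```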

Answered 2020-10-05T18:35:13.640