I'm trying to run separate epochs of fit_one_cycle: save the model, load it, and start a new epoch:

learn = language_model_learner(data, AWD_LSTM, drop_mult=0.5, pretrained=False).to_fp16()
learn.load('/content/gdrive/My Drive/Language Model/language_model')
learn.load_encoder('/content/gdrive/My Drive/Language Model/model_encoder');
lr = 1e-3
lr *= bs/48  # Scale learning rate by batch size
learn.unfreeze()
learn.fit_one_cycle(1, lr, moms=(0.8,0.7))
learn.save('/content/gdrive/My Drive/Language Model/language_model')
learn.save_encoder('/content/gdrive/My Drive/Language Model/model_encoder')

Question: how should I change the learning rate after each epoch?
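One simple pattern (a sketch, not taken from the question; the helper name and decay factor are hypothetical) is to compute a fresh learning rate before each single-epoch call to fit_one_cycle:

```python
def lr_for_epoch(base_lr, epoch, decay=0.5):
    """Hypothetical schedule: multiply the LR by `decay` after each epoch."""
    return base_lr * decay ** epoch

# The save/load loop from the question could then call, per epoch:
#   learn.fit_one_cycle(1, lr_for_epoch(lr, epoch), moms=(0.8, 0.7))
print([lr_for_epoch(1e-3, e) for e in range(3)])  # [0.001, 0.0005, 0.00025]
```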


1 Answer


You can look into discriminative layer training, which uses different learning rates for different layers of the model.

  1. Create layer groups for the model
# creates 3 layer groups with start, middle and end groups
learn.split(lambda m: (m[0][6], m[1]))

# only randomly initialized head now trainable
learn.freeze()

Note: there is no need to split the layers manually; fit_one_cycle splits them automatically.
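Conceptually, splitting turns the model's flat list of layers into contiguous groups. A minimal pure-Python sketch of that idea (the helper name and cut points below are illustrative, not fastai's API):

```python
def split_groups(layers, cut_points):
    """Split a flat list of layers into contiguous groups at the given
    indices, mimicking what layer-group splitting does conceptually."""
    idxs = [0, *cut_points, len(layers)]
    return [layers[i:j] for i, j in zip(idxs, idxs[1:])]

# 8 layers cut at indices 3 and 6 -> start, middle and end groups
print(split_groups(list(range(8)), [3, 6]))  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```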

  2. Manually set the LR and weight decay for each layer group
# all layers now trainable
learn.unfreeze()

# optionally, separate LR and WD for each group for 5 epochs
learn.fit_one_cycle(5, max_lr=slice(1e-5,1e-3), wd=(1e-4,1e-4,1e-1))
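Passing max_lr=slice(1e-5, 1e-3) gives each layer group its own learning rate; in fastai v1 the values are spread geometrically between the two endpoints (via its even_mults helper). A small sketch of that spacing, assuming three layer groups:

```python
def even_mults(start, stop, n):
    """Geometrically even values from start to stop, one per layer group
    (this mirrors what fastai v1's even_mults helper computes)."""
    mult = (stop / start) ** (1 / (n - 1))
    return [start * mult ** i for i in range(n)]

# slice(1e-5, 1e-3) over 3 layer groups -> roughly 1e-5, 1e-4, 1e-3
print(even_mults(1e-5, 1e-3, 3))
```

So the earliest (most general) layers train with the smallest rate and the randomly initialized head with the largest.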
Answered on 2019-10-31T09:21:11.310