Here is the problem: during evaluation I load my PyTorch model from a checkpoint file (the one with the best dev-set result during training). I do remember to call model.eval() and use torch.no_grad(), but the accuracy I get on the dev set is still lower than what I got at training time (a drop of 1-2%).
What I have tried:
- Printing the state dict right before the best model is saved during training and comparing it with the state dict I get after loading it back; the two are identical (compared roughly as in the snippet below).
- Going through my code; it uses a lot of dropout and layernorm layers, and I found no bugs.
- Loading the model on the same GPU, which did not help either.
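For reference, the comparison I did looks roughly like this (a simplified sketch; best_state_dict is the in-memory dict right before saving and ckpt_path stands in for my real checkpoint path):

import torch

# state dict captured in memory right before saving the best model
saved_sd = best_state_dict
# state dict read back from the checkpoint file
loaded_sd = torch.load(ckpt_path, map_location='cpu')

# every key is present and every tensor compares equal
assert saved_sd.keys() == loaded_sd.keys()
for key in saved_sd:
    if not torch.equal(saved_sd[key].cpu(), loaded_sd[key].cpu()):
        print("mismatch in", key)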
My environment:
- Python 3.6.10, PyTorch 1.7.1 (with CUDA 11.1)
- GPU: NVIDIA 2080Ti
- The same seed (for numpy and pytorch) during training and evaluation, set roughly as in the snippet after this list.
- model.eval() and torch.no_grad() are used on the dev set both during training and during evaluation.
- The same dev set and the same way of computing the metric.
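The seeding in both scripts looks roughly like this (a sketch; the actual seed value is read from my config):

import random
import numpy as np
import torch

seed = 42  # placeholder; the real value comes from my config
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)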
Here is my pseudocode for training (the original is too heavy to post):
# load my data.
train_dataset = FinetuningDataset(vocab=vocab, domains=domains, data_files=data_files, max_len=data_config["max_len"], giga_embedding_vocab=giga_embedding.word2id)
val_dataset = FinetuningDataset(vocab, domains=domains, data_files=dev_data_path, max_len=data_config['max_len'], giga_embedding_vocab=giga_embedding.word2id)
sp_collator = SortPadCollator(sort_key=lambda x:x[0], ignore_indics=[0])
train_iter = DataLoader(dataset=train_dataset,
                        batch_size=data_config["batch_size"],
                        shuffle=data_config["shuffle"],
                        collate_fn=sp_collator)
val_iter = DataLoader(dataset=val_dataset,
                      batch_size=data_config["batch_size"],
                      shuffle=data_config["shuffle"],
                      collate_fn=sp_collator)
adatrans = AdaTrans(vocab=vocab, config=model_config, domain_size=len(domains))
# initialise AdaTrans from an existing checkpoint before fine-tuning.
adatrans.load_state_dict(torch.load('ckpt_adatrans/litebert_1e-3_50cls_cuda2.pt'))
model = MixLM(adatrans=adatrans, vocab=vocab, config=model_config, giga_embedding=giga_embedding)
# this is my loss function during training.
loss_fn_dct = {"mask_loss": neg_log_likelihood_loss, "emb_mse_loss":nn.MSELoss(reduction='none'), "domain_cls_loss":nn.NLLLoss(reduction='none')}
metrics_fn_dct = {"mask_metrics":accuracy}
# build a trainer.
trainer = ftTrainer(loss_fn_dct=loss_fn_dct, metrics_fn_dct=metrics_fn_dct, config=trainer_config)
# trainer.train returns the best result on the dev set and the corresponding state dict (saved below to checkpoint.pt).
best_res, best_state_dict = trainer.train(model=model, train_iter=train_iter, val_iter=val_iter, optimizer=trainer_config['optimizer'], device=trainer_config['device'])
print("best result:: ", best_res)
trainer.save(best_state_dict, trainer_config['model_path'])
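ftTrainer.save itself does nothing fancy; stripped down, it is essentially just this (sketch):

def save(self, state_dict, path):
    # write the best dev-set weights to disk
    torch.save(state_dict, path)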
In trainer.py, this is where I keep the best state dict and return it:
model.eval()
for dev_batch in val_iter:
    with torch.no_grad():
        # self.val() runs the model forward pass and returns the prediction result.
        dev_res = self.val(dev_batch, model, device)
        dev_loss += dev_res['loss'].item()
# this call computes the result metric (the one that drops during evaluation).
dev_metric = model.domain_biaffine._attachment_scores.get_metric(reset=True)
if dev_metric['UAS'] > best_UAS:
    best_UAS = dev_metric['UAS']
    best_res, best_state_dict = dev_metric, model.state_dict()
print("dev_loss: ", dev_loss / cnt_iter)
print("dev metric: ", dev_metric)
In evaluation.py I only load checkpoint.pt and run prediction:
test_dataset = FinetuningDataset(vocab=vocab, domains=domains, data_files=data_files, max_len=data_config["max_len"], giga_embedding_vocab=giga_embedding.word2id)
sp_collator = SortPadCollator(sort_key=lambda x:x[0], ignore_indics=[0])
test_iter = DataLoader(dataset=test_dataset,
                       batch_size=data_config["batch_size"],
                       shuffle=False,
                       collate_fn=sp_collator)
adatrans = AdaTrans(vocab=vocab, config=model_config, domain_size=len(domains))
model = MixLM(adatrans=adatrans, vocab=vocab, config=model_config, giga_embedding=giga_embedding)
# load the pytorch checkpoint.pt, mapping the weights onto cuda:1.
model.load_state_dict(torch.load(data_config['model_path'], map_location=torch.device('cuda:1')), strict=True)
trainer = ftTrainer(config=trainer_config, vocab=vocab, id2word=giga_embedding.id2word)
# this line runs prediction: it calls model.forward and prints the metric (computed the same way as in the trainer.py snippet above).
trainer.inference(model=model, test_iter=test_iter, device=trainer_config['device'])
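For what it's worth, I can also add a quick sanity check after loading (a sketch) to confirm that no submodule is left in training mode when inference runs, since dropout layers behave differently there:

model.eval()
# list any submodule that is still in training mode
still_training = [name for name, module in model.named_modules() if module.training]
print("modules still in training mode:", still_training)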
I have googled for a long time, but nothing helped. This is completely baffling me. Could someone help me? Thanks in advance!