
Here is the problem: during evaluation I load my PyTorch model from a checkpoint file (the one with the best result on the dev set during training). I remember to call model.eval() and torch.no_grad(), yet the accuracy I get on the dev set is still 1-2% lower than the result I got on the dev set at training time.

What I have tried:

  • Printed the state dict right before PyTorch saves the best model during training and compared it with the state dict obtained after loading it back: they are identical (see the sketch after this list).
  • Went over my code, which uses many dropout and layernorm layers, and found no bugs.
  • Loaded the model onto the same GPU, but that did not help.
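
For reference, the state-dict comparison above was done roughly like this (a simplified sketch rather than my exact code; model and data_config refer to the same objects as in the snippets below):

import torch

# compare the in-memory state dict (right before saving) with the checkpoint read back from disk
saved = model.state_dict()
loaded = torch.load(data_config['model_path'], map_location='cpu')
for key in saved:
    if not torch.equal(saved[key].cpu(), loaded[key].cpu()):
        print("mismatch at:", key)   # nothing is printed, i.e. the weights match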

My working environment:

  • Python 3.6.10, PyTorch 1.7.1 (with CUDA 11.1)
  • GPU: NVIDIA 2080 Ti
  • The same seed (numpy and pytorch) during training and evaluation (see the sketch after this list)
  • model.eval() and torch.no_grad() are used on the dev set both during training and during evaluation
  • The same dev set and the same metric computation
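
The seeds are fixed in both scripts roughly like this (a minimal sketch; the seed value below is a placeholder, but the same constant is used in training and evaluation):

import numpy as np
import torch

SEED = 42  # placeholder value; the same constant is used in both scripts
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)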

Here is my pseudocode from training (the original code is too heavy to post):

# load my data.
train_dataset = FinetuningDataset(vocab=vocab, domains=domains, data_files=data_files, max_len=data_config["max_len"], giga_embedding_vocab=giga_embedding.word2id)

val_dataset = FinetuningDataset(vocab, domains=domains, data_files=dev_data_path, max_len=data_config['max_len'], giga_embedding_vocab=giga_embedding.word2id)

sp_collator = SortPadCollator(sort_key=lambda x:x[0], ignore_indics=[0])   
train_iter = DataLoader(dataset=train_dataset,  
                        batch_size=data_config["batch_size"], 
                        shuffle=data_config["shuffle"],
                        collate_fn=sp_collator)
val_iter = DataLoader(dataset=val_dataset,  
                    batch_size=data_config["batch_size"], 
                    shuffle=data_config["shuffle"], 
                    collate_fn=sp_collator)
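# build the model: an AdaTrans encoder initialized from a pretrained checkpoint, wrapped by MixLM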
adatrans = AdaTrans(vocab=vocab, config=model_config, domain_size=len(domains))
adatrans.load_state_dict(torch.load('ckpt_adatrans/litebert_1e-3_50cls_cuda2.pt'))
model = MixLM(adatrans=adatrans, vocab=vocab, config=model_config, giga_embedding=giga_embedding)

# this is my loss function during training.
loss_fn_dct = {"mask_loss": neg_log_likelihood_loss, "emb_mse_loss":nn.MSELoss(reduction='none'), "domain_cls_loss":nn.NLLLoss(reduction='none')}
metrics_fn_dct = {"mask_metrics":accuracy}

# build a trainer.
trainer = ftTrainer(loss_fn_dct=loss_fn_dct, metrics_fn_dct=metrics_fn_dct, config=trainer_config)
# gets best result on dev set and save it to checkpoint.pt
best_res, best_state_dict = trainer.train(model=model, train_iter=train_iter, val_iter=val_iter, optimizer=trainer_config['optimizer'], device=trainer_config['device'])
print("best result:: ", best_res)
trainer.save(best_state_dict, trainer_config['model_path'])

In trainer.py, I save the best state dict and return it:

model.eval()
for dev_batch in val_iter:
    with torch.no_grad():
        # self.val() runs the model's forward pass and returns the prediction result.
        dev_res = self.val(dev_batch, model, device)
        dev_loss += dev_res['loss'].item()
# this computes the result metric (the one that drops during evaluation).
dev_metric = model.domain_biaffine._attachment_scores.get_metric(reset=True)
if dev_metric['UAS'] > best_UAS:
    best_UAS = dev_metric['UAS']
    best_res, best_state_dict = dev_metric, model.state_dict()

print("dev_loss: ", dev_loss / cnt_iter)
print("dev metric: ", dev_metric)

In evaluation.py, I just load checkpoint.pt and run prediction:

test_dataset = FinetuningDataset(vocab=vocab, domains=domains, data_files=data_files, max_len=data_config["max_len"], giga_embedding_vocab=giga_embedding.word2id)

sp_collator = SortPadCollator(sort_key=lambda x:x[0], ignore_indics=[0])   

test_iter = DataLoader(dataset=test_dataset,  
                        batch_size=data_config["batch_size"], 
                        shuffle=False,
                        collate_fn=sp_collator)

adatrans = AdaTrans(vocab=vocab, config=model_config, domain_size=len(domains))
model = MixLM(adatrans=adatrans, vocab=vocab, config=model_config, giga_embedding=giga_embedding)

# load pytorch checkpoint.pt
model.load_state_dict(torch.load(data_config['model_path'], map_location=torch.device('cuda:1')), strict=True)

trainer = ftTrainer(config=trainer_config, vocab=vocab, id2word=giga_embedding.id2word)
# this line runs prediction: it calls model.forward and prints the metric (computed the same way as in the trainer.py snippet).
trainer.inference(model=model, test_iter=test_iter, device=trainer_config['device'])
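
Given the many dropout and layernorm layers mentioned above, one extra check (a minimal sketch, not part of my actual code) would be to confirm after loading that no submodule is left in training mode:

# minimal sketch: verify that model.eval() switched every submodule
# (including Dropout and LayerNorm) out of training mode
model.eval()
still_training = [name for name, m in model.named_modules() if m.training]
print("modules still in training mode:", still_training)  # expected: []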

I have googled this for a long time, but nothing helps. It completely baffles me. Could someone help me? Thanks in advance!
