I am using a fine-tuned RoBERTa model, unitary/unbiased-toxic-roberta, trained on the Jigsaw data:
https://huggingface.co/unitary/unbiased-toxic-roberta
It was fine-tuned on 16 classes.
I am writing code for binary classification, computing accuracy on the binary labels as the metric:
import numpy as np

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # predicted class per example
    acc = np.sum(predictions == labels) / predictions.shape[0]
    return {"accuracy": acc}
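As a quick sanity check, here is the metric on dummy arrays of my own (just to illustrate the shapes expected for a 2-label head):

logits = np.array([[0.2, 0.8], [0.9, 0.1]])  # (batch_size, num_labels=2)
labels = np.array([1, 0])
print(compute_metrics((logits, labels)))  # -> {'accuracy': 1.0}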
import torch.nn as nn
import transformers as tr

model = tr.RobertaForSequenceClassification.from_pretrained(
    "/home/pc/unbiased_toxic_roberta", num_labels=2
)
model.to(device)  # device is defined earlier, e.g. torch.device("cuda")
training_args = tr.TrainingArguments(
    # report_to='wandb',
    output_dir='/home/pc/1_Proj_hate_speech/results_roberta',  # output directory
    overwrite_output_dir=True,
    num_train_epochs=20,              # total number of training epochs
    per_device_train_batch_size=16,   # batch size per device during training
    per_device_eval_batch_size=32,    # batch size for evaluation
    learning_rate=2e-5,
    warmup_steps=1000,                # number of warmup steps for the LR scheduler
    weight_decay=0.01,                # strength of weight decay
    logging_dir='./logs3',            # directory for storing logs
    logging_steps=1000,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)
trainer = tr.Trainer(
model=model, # the instantiated Transformers model to be trained
args=training_args, # training arguments, defined above
train_dataset=train_data, # training dataset
eval_dataset=val_data, # evaluation dataset
compute_metrics=compute_metrics
)
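Training is then launched with the usual Trainer call (nothing else happens in between):

trainer.train()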
When I run this, I get an error:
loading weights file /home/pc/unbiased_toxic_roberta/pytorch_model.bin
RuntimeError: Error(s) in loading state_dict for RobertaForSequenceClassification:
size mismatch for classifier.out_proj.weight: copying a param with shape torch.Size([16, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
size mismatch for classifier.out_proj.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([2]).
How can I add a linear layer and fix this error?
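For context: the checkpoint stores a 16-way classification head (classifier.out_proj with shape [16, 768]), while num_labels=2 builds a 2-way head, hence the size mismatch. Below is a minimal sketch of two possible fixes, assuming the local checkpoint path above and a transformers version that supports the ignore_mismatched_sizes flag:

import torch.nn as nn
import transformers as tr

# Option 1: discard the mismatched 16-way head and let transformers
# randomly re-initialize a fresh 2-way head.
model = tr.RobertaForSequenceClassification.from_pretrained(
    "/home/pc/unbiased_toxic_roberta",
    num_labels=2,
    ignore_mismatched_sizes=True,
)

# Option 2: load the checkpoint with its original 16 labels, then swap in
# a new 2-way linear layer as the final projection.
model = tr.RobertaForSequenceClassification.from_pretrained(
    "/home/pc/unbiased_toxic_roberta"
)
model.classifier.out_proj = nn.Linear(model.config.hidden_size, 2)
model.config.num_labels = 2
model.num_labels = 2

In both cases the new 2-way layer starts untrained, so the model still needs fine-tuning on the binary task.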