
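I'm currently trying to train/fine-tune a pretrained RoBERTa model with a multiple-choice head, but I'm having trouble working out the correct input format so that the model can actually train.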

My current dataframe holds sentences with 3 options each, in the columns OptionA, OptionB and OptionC. I tokenized the options using:

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
# Encode every option column to token ids; a column-wise apply avoids
# chained assignment and does not depend on the index being 0..n-1
for col in ["OptionA", "OptionB", "OptionC"]:
    train_data[col] = train_data[col].apply(tokenizer.encode)

My evaluation set is processed the same way; the test set has 6500 rows and the evaluation set has 1500 rows. I'm trying to train the model like this:

from transformers import RobertaForMultipleChoice, Trainer, TrainingArguments
model = RobertaForMultipleChoice.from_pretrained('roberta-base')

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=1,              # total # of training epochs
    per_device_train_batch_size=32,  # batch size per device during training
    per_device_eval_batch_size=32,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
)

trainer = Trainer(
    model=model,                     # the instantiated  Transformers model to be trained
    args=training_args,              # training arguments, defined above
    train_dataset = train_split,     # training dataset
    eval_dataset = eval_split        # evaluation dataset
)

trainer.train()

But I keep getting different key errors, for example:

KeyError: 2526
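
One possible reading of this error, assuming `train_split` is still a pandas DataFrame: `Trainer` fetches examples by integer position (`dataset[2526]`), and on a DataFrame that expression looks up a column named 2526, which raises `KeyError: 2526`. A minimal sketch of a wrapper that supports integer indexing and returns all options of a row padded to the same length is below. The class name `MultipleChoiceRows` and the idea of a separate list of labels are assumptions for illustration, since the post does not show a label column; `pad_id=1` is the `<pad>` id of `roberta-base`.

```python
class MultipleChoiceRows:
    """Map-style dataset (hypothetical helper): integer index -> one example."""

    def __init__(self, options_per_row, labels, pad_id=1):
        # options_per_row: one entry per row, each a list of token-id lists
        # (one token-id list per option, e.g. OptionA, OptionB, OptionC)
        self.options_per_row = options_per_row
        self.labels = labels
        self.pad_id = pad_id
        # Pad to the longest option in the dataset so all examples share a shape
        self.max_len = max(len(ids) for row in options_per_row for ids in row)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        input_ids, attention_mask = [], []
        for ids in self.options_per_row[i]:
            pad = self.max_len - len(ids)
            input_ids.append(list(ids) + [self.pad_id] * pad)
            attention_mask.append([1] * len(ids) + [0] * pad)
        # Nested lists of shape (num_choices, max_len) plus the correct option index
        return {
            "input_ids": input_ids,
            "attention_mask": attention_mask,
            "label": self.labels[i],
        }


# Toy token ids standing in for tokenizer.encode output
toy = MultipleChoiceRows([[[0, 5, 2], [0, 6, 7, 2], [0, 2]]], labels=[1])
print(toy[0]["input_ids"])
```

With the real data, the rows could be built with something like `train_data[["OptionA", "OptionB", "OptionC"]].values.tolist()`, and an instance passed as `train_dataset`. Whether the options should also be encoded together with the sentence they belong to is a separate question this sketch does not address.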

If anyone knows what I'm doing wrong, I'd really appreciate the help; I've been struggling to get this model to train for the past 3 days.
