huggingface-transformers - Using/specifying attention_mask when training GPT2 with Trainer and TrainingArguments

Question

I am using Trainer & TrainingArguments to train a GPT2 model, but it does not seem to work well.

My dataset contains the token IDs of my corpus and, for each text, a mask indicating where attention should be applied:

Dataset({
    features: ['attention_mask', 'input_ids', 'labels'],
    num_rows: 2012860
})

I am training with Trainer & TrainingArguments, passing in my model and the dataset above, as shown below. However, I do not specify anything about attention_mask anywhere:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir=path_save_checkpoints,
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    logging_steps=5_000,
    save_steps=5_000,
    fp16=True,
    deepspeed="ds_config.json",
    remove_unused_columns=True,
    debug=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
    tokenizer=tokenizer,
)

trainer.train()

How do I tell Trainer to use this feature (attention_mask)? If you look at the file /transformers/trainer.py, there is no mention of "attention" or "mask".

Thanks in advance!

score 0 · Accepted Answer

Somewhere in the Trainer source code (for example in its compute_loss method), you will see that the inputs are passed to the model roughly like this:

outputs = model(**inputs)
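To illustrate why nothing else needs to be configured, here is a minimal stdlib-only sketch of what ** unpacking does with a batch dictionary. The function forward and the toy token ids are hypothetical stand-ins, not GPT2's real implementation; only the parameter names mirror arguments that GPT2's forward() accepts.

```python
# Hypothetical stand-in for a model's forward(); the parameter names mirror
# GPT2's forward() arguments, but the body just echoes what it receives.
def forward(input_ids=None, attention_mask=None, labels=None):
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

# A batch as a data collator might return it (toy token ids, not real text).
inputs = {
    "input_ids": [[464, 3290, 318, 50256]],
    "attention_mask": [[1, 1, 1, 0]],
    "labels": [[464, 3290, 318, -100]],
}

# ** turns each dict key into a keyword argument of the same name,
# so inputs["attention_mask"] arrives as the attention_mask parameter.
outputs = forward(**inputs)
print(outputs["attention_mask"])  # [[1, 1, 1, 0]]

# If the key is absent, the parameter simply keeps its default (None here).
print(forward(input_ids=[[464]])["attention_mask"])  # None

# A key that is not a parameter name raises TypeError, which is why
# dataset columns the model does not accept have to be removed first.
try:
    forward(**{"input_ids": [[464]], "text": ["toy"]})
except TypeError as err:
    print("unexpected key:", err)
```

The key point is that the routing is done by name, so Trainer never has to mention attention_mask explicitly.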

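As long as your data collator returns a dictionary that includes the attention_mask key, your attention mask will be passed to your GPT2 model as its attention_mask argument. Setting remove_unused_columns=True does not prevent this: it only drops dataset columns whose names are not arguments of the model's forward() method, and attention_mask is one of GPT2's arguments.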
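The pipeline can be sketched end to end: drop columns the model cannot accept, collate the remaining ones into a padded batch, then call the model with **. Everything below is a simplified, hypothetical illustration in plain Python, not transformers' actual code: gpt2_forward, drop_unused_columns and pad_collator are made-up names, the padding id 50256 (GPT2's end-of-text id, a common choice because GPT2 has no pad token) is an assumption, and -100 is used as the label value the loss ignores.

```python
import inspect

PAD_ID = 50256       # assumption: pad with GPT2's end-of-text id
IGNORE_INDEX = -100  # label value the cross-entropy loss skips

def gpt2_forward(input_ids=None, attention_mask=None, labels=None):
    """Hypothetical stand-in for GPT2's forward(); only the parameter names matter."""
    return {"attention_mask": attention_mask}

def drop_unused_columns(features, forward):
    """Simplified idea behind remove_unused_columns=True:
    keep only keys that are parameters of forward()."""
    allowed = set(inspect.signature(forward).parameters)
    return [{k: v for k, v in f.items() if k in allowed} for f in features]

def pad_collator(features):
    """Hypothetical collator: right-pads every example to the longest one
    and returns attention_mask alongside input_ids and labels."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [PAD_ID] * n_pad)
        # keep the dataset's own mask and mark padded positions as 0 (do not attend)
        batch["attention_mask"].append(f["attention_mask"] + [0] * n_pad)
        # padded label positions get IGNORE_INDEX so they add nothing to the loss
        batch["labels"].append(f["labels"] + [IGNORE_INDEX] * n_pad)
    return batch

# Toy rows shaped like the question's dataset, plus an extra "text" column.
raw = [
    {"input_ids": [464, 3290], "attention_mask": [1, 1], "labels": [464, 3290], "text": "toy"},
    {"input_ids": [318], "attention_mask": [1], "labels": [318], "text": "toy"},
]

features = drop_unused_columns(raw, gpt2_forward)  # "text" dropped, attention_mask kept
batch = pad_collator(features)
outputs = gpt2_forward(**batch)                    # same call pattern as model(**inputs)
print(outputs["attention_mask"])  # [[1, 1], [1, 0]]
```

With a real setup, a quick sanity check is to call your own collator on a couple of rows, e.g. data_collator([dataset[0], dataset[1]]), and confirm that attention_mask is among the keys it returns.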
