I am trying to fine-tune/pretrain an existing BERT model for sentiment analysis using the Trainer API in the transformers library. My training dataset looks like this:
Text Sentiment
This was good place 1
This was bad place 0
My goal is to be able to classify sentiment as positive/negative. Here is my code:
from datasets import load_dataset
from datasets import load_dataset_builder
import datasets
import transformers
from transformers import TrainingArguments
from transformers import Trainer
dataset = load_dataset('csv', data_files='my_data.csv', sep=';')
tokenizer = transformers.BertTokenizer.from_pretrained("bert-base-cased")
model = transformers.BertForMaskedLM.from_pretrained("bert-base-cased")
print(dataset)
def tokenize_function(examples):
    return tokenizer(examples["Text"], examples["Sentiment"], truncation=True)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
training_args = TrainingArguments("test_trainer")
trainer = Trainer(
    model=model, args=training_args, train_dataset=tokenized_datasets
)
trainer.train()
This raises the following error message:
ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).
What am I doing wrong? Any advice is highly appreciated.
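
A likely cause, sketched under some assumptions: the second positional argument of the tokenizer call is `text_pair`, so `examples["Sentiment"]` (a list of integers) is treated as text and rejected. The sketch below passes only the text column to the tokenizer and forwards the integer labels under a `labels` key. It also assumes that a classification head (`BertForSequenceClassification` with `num_labels=2`) is what is wanted instead of `BertForMaskedLM`, that the `"train"` split of the loaded `DatasetDict` is the one to train on, and that dynamic padding via `DataCollatorWithPadding` is acceptable. The helper name `build_trainer` and the extra `tokenizer` parameter are hypothetical, introduced only for illustration.

```python
def tokenize_function(examples, tokenizer):
    # Tokenize only the text column; integer labels must not be passed as text_pair.
    encoded = dict(tokenizer(examples["Text"], truncation=True))
    # Forward the sentiment values under the "labels" key that the model expects.
    encoded["labels"] = [int(label) for label in examples["Sentiment"]]
    return encoded


def build_trainer(csv_path="my_data.csv"):
    # Hypothetical helper: imports are local so the function above can be
    # inspected without transformers/datasets being installed.
    from datasets import load_dataset
    from transformers import (
        BertForSequenceClassification,
        BertTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    dataset = load_dataset("csv", data_files=csv_path, sep=";")
    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    # Assumption: binary classification head rather than the masked-LM head.
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-cased", num_labels=2
    )
    tokenized = dataset.map(
        tokenize_function, batched=True, fn_kwargs={"tokenizer": tokenizer}
    )
    return Trainer(
        model=model,
        args=TrainingArguments("test_trainer"),
        train_dataset=tokenized["train"],  # pass a single split, not the DatasetDict
        data_collator=DataCollatorWithPadding(tokenizer),  # pad each batch dynamically
    )
```

With this in place, `build_trainer().train()` would be the call that starts fine-tuning. If the error persists, it may be worth checking that the CSV really uses `;` as the separator and that no row has an empty `Text` value.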