I am trying to fine-tune a BERT model for sentiment analysis (classifying text as positive/negative) using the Hugging Face Trainer API. My dataset has two columns, Text and Sentiment, and it looks like this:
Text                  Sentiment
This was good place   1
This was bad place    0
Here is my code:
from datasets import load_dataset, load_dataset_builder, Dataset
import datasets
import transformers
from transformers import TrainingArguments, Trainer

dataset = load_dataset('csv', data_files='./train/test.csv', sep=';')

tokenizer = transformers.BertTokenizer.from_pretrained("TurkuNLP/bert-base-finnish-cased-v1")
model = transformers.BertForSequenceClassification.from_pretrained("TurkuNLP/bert-base-finnish-cased-v1", num_labels=1)

def tokenize_function(examples):
    return tokenizer(examples["Text"], truncation=True, padding='max_length')

tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.rename_column('Sentiment', 'label')
tokenized_datasets = tokenized_datasets.remove_columns('Text')

training_args = TrainingArguments("test_trainer")
trainer = Trainer(
    model=model, args=training_args, train_dataset=tokenized_datasets['train']
)
trainer.train()
Running this raises the following error:
Variable._execution_engine.run_backward(
RuntimeError: Found dtype Long but expected Float
The error probably comes from the dataset itself, but can I somehow fix it from my code? I searched the internet and this error seems to have been solved before by "casting the tensors to float", but how would I do that with the Trainer API? Any suggestions are much appreciated.
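If I understand the docs correctly, BertForSequenceClassification with num_labels=1 treats the task as regression (MSE loss), which would explain why it expects float labels. So my best guess at what "casting to float" would look like in this pipeline is to cast the label column with the datasets library before training. This is only a sketch of what I mean (it continues from the code above and assumes the Sentiment column has already been renamed to label):

import datasets

# Sketch only: cast the integer 0/1 labels to float32, since a
# regression-style loss (num_labels=1) expects float targets.
tokenized_datasets = tokenized_datasets.cast_column("label", datasets.Value("float32"))

trainer = Trainer(
    model=model, args=training_args, train_dataset=tokenized_datasets["train"]
)
trainer.train()

Is something like this the right approach, or should the cast happen somewhere else (e.g. inside the tokenize function or via a data collator)?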
Some references:
https://discuss.pytorch.org/t/run-backward-expected-dtype-float-but-got-dtype-long/61650/10