I'm trying to train a simpletransformers QuestionAnsweringModel with bert-base-multilingual-uncased and am running into the following error:
AttributeError Traceback (most recent call last)
<ipython-input-10-40e9356ccee6> in <module>()
----> 1 model.train(traindata, output_dir='/content/drive/MyDrive')
/usr/local/lib/python3.7/dist-packages/simpletransformers/question_answering/question_answering_model.py in train(self, train_dataset, output_dir, show_running_loss, eval_data, verbose, **kwargs)
578 steps_trained_in_current_epoch -= 1
579 continue
--> 580 batch = tuple(t.to(device) for t in batch)
581
582 inputs = self._get_inputs_dict(batch)
/usr/local/lib/python3.7/dist-packages/simpletransformers/question_answering/question_answering_model.py in <genexpr>(.0)
578 steps_trained_in_current_epoch -= 1
579 continue
--> 580 batch = tuple(t.to(device) for t in batch)
581
582 inputs = self._get_inputs_dict(batch)
AttributeError: 'str' object has no attribute 'to'
My data preparation:
!wget https://onti2020.ai-academy.ru/task/rucos_test.jsonl
!wget https://onti2020.ai-academy.ru/task/rucos_val.jsonl
!wget https://onti2020.ai-academy.ru/task/rucos_train.jsonl.zip
!unzip rucos_train.jsonl.zip
!pip install nltk
import nltk
nltk.download('all')
from nltk.tokenize import word_tokenize
import json
from tqdm import tqdm

def get_train_data(jsonfile):
    res=[]
    with open(jsonfile, 'r') as data:
        trainlist=list(data)
    for item in tqdm(trainlist):
        item=json.loads(item)
        dictt={}
        dictt['context']=word_tokenize(item['passage']['text'])
        qas=[]
        qlist=item['qas']
        for q in qlist:
            qdict={}
            qdict['id']=str(q['idx']).rjust(6, '0')
            answers=[]
            qdict['is_impossible']=True
            qdict['question']=q['query']
            alist=q['answers']
            for a in alist:
                adict={}
                adict['text']=a['text']
                adict['answer_start']=a['start']
                answers.append(adict)
            qdict['answers']=answers
            qas.append(qdict)
        dictt['qas']=qas
        res.append(dictt)
    return res
traindata, evaldata=get_train_data('rucos_train.jsonl'), get_train_data('rucos_val.jsonl')
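Because the training records are assembled by hand, it may help to check them against the SQuAD-style layout shown in the simpletransformers QA documentation before training. The sketch below assumes that layout: a `context` string, plus `qas` entries with `id`, `question`, `is_impossible` and `answers`, where each answer has `text` and an `answer_start` character offset into the context (the SQuAD 2.0 convention, under which unanswerable questions carry no answers). `check_squad_format` is a hypothetical helper, not part of simpletransformers.

```python
from typing import Any, Dict, List


def check_squad_format(records: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems found in SQuAD-style QA records.

    Hypothetical helper: the expected layout is assumed from the
    simpletransformers QA docs and SQuAD 2.0 conventions.
    """
    problems: List[str] = []
    for i, rec in enumerate(records):
        context = rec.get("context")
        if not isinstance(context, str):
            problems.append(f"record {i}: context is {type(context).__name__}, expected str")
            # Character offsets cannot be verified without a string context.
            context = None
        for qa in rec.get("qas", []):
            qid = qa.get("id", "?")
            if not isinstance(qa.get("id"), str):
                problems.append(f"qa {qid}: id should be a str")
            if not isinstance(qa.get("question"), str):
                problems.append(f"qa {qid}: question should be a str")
            answers = qa.get("answers", [])
            # Assumed SQuAD 2.0 convention: unanswerable questions have no answers.
            if qa.get("is_impossible") and answers:
                problems.append(f"qa {qid}: is_impossible=True but answers are given")
            for ans in answers:
                start, text = ans.get("answer_start"), ans.get("text")
                if not isinstance(start, int) or not isinstance(text, str):
                    problems.append(f"qa {qid}: answer needs int answer_start and str text")
                elif context is not None and context[start:start + len(text)] != text:
                    problems.append(f"qa {qid}: answer_start {start} does not point at {text!r}")
    return problems


# Usage on a tiny, made-up record that follows the assumed layout.
good = [{
    "context": "Moscow is the capital of Russia.",
    "qas": [{"id": "000001", "question": "What is the capital of Russia?",
             "is_impossible": False,
             "answers": [{"text": "Moscow", "answer_start": 0}]}],
}]
print(check_squad_format(good))  # → []
```

Calling `check_squad_format(traindata)` would list every record that deviates from the assumed layout; an empty list means no deviations were found.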
Building the model:
!pip install simpletransformers
!pip install torch==1.5.0
from simpletransformers.question_answering import QuestionAnsweringModel, QuestionAnsweringArgs
model = QuestionAnsweringModel(
"bert",
"bert-base-multilingual-uncased",
args=QuestionAnsweringArgs(n_best_size=2)
)
Training the model:
model.train(traindata, output_dir='/content/drive/MyDrive')
This code runs on Colab Pro and is based on the documentation (https://simpletransformers.ai/docs/qa-model/).
Please help me solve this problem.