I am trying to build a custom NER model using about 530 MB of data. I am using the following simpletransformers code to do it.
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
from simpletransformers.ner import NERModel, NERArgs
import os
label = ['B-ORG', 'I-ORG', 'B-PER', 'I-PER']
args = NERArgs()
args.num_train_epochs = 10
args.learning_rate = 0.001
args.overwrite_output_dir = True
args.train_batch_size = 32
args.eval_batch_size = 32
args.lazy_loading = True
model = NERModel('roberta', 'roberta-base', labels=label, args=args, use_cuda=True)
model.train_model('a.txt', eval_data='b.txt', acc=accuracy_score)
I set args.lazy_loading = True to work around the memory problem, but it raises the following error:
TypeError: convert_example_to_feature() missing 14 required positional arguments: 'label_map', 'max_seq_length', 'tokenizer', 'cls_token_at_end', 'cls_token', 'cls_token_segment_id', 'sep_token', 'sep_token_extra', 'pad_on_left', 'pad_token', 'pad_token_segment_id', 'pad_token_label_id', 'sequence_a_segment_id', and 'mask_padding_with_zero'
Sample input text in CoNLL format:
a B-PER
b I-PER
c I-PER
f B-ORG
g I-ORG
h I-ORG
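
To make the memory concern concrete, here is a minimal sketch of reading a file in this format one sentence at a time instead of loading it all at once. It is only an illustration and not how simpletransformers implements lazy_loading. The helper name iter_conll_sentences is hypothetical, and the sketch assumes that sentences are separated by a blank line and that each non-blank line holds a token and a label separated by whitespace.

```python
import io
from typing import Iterator, List, TextIO, Tuple

# One sentence is a list of (token, label) pairs.
Sentence = List[Tuple[str, str]]


def iter_conll_sentences(handle: TextIO) -> Iterator[Sentence]:
    """Yield one sentence at a time from an open CoNLL-style file.

    Hypothetical helper. Assumes blank lines separate sentences and that
    every non-blank line is "<token> <label>".
    """
    sentence: Sentence = []
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        # Split on the last run of whitespace so the label is the final field.
        token, label = line.rsplit(None, 1)
        sentence.append((token, label))
    # Emit the trailing sentence when the file does not end with a blank line.
    if sentence:
        yield sentence


# Small in-memory stand-in for a file such as a.txt; the blank line between
# the two entities is an assumption about how sentences are delimited.
sample = "a B-PER\nb I-PER\nc I-PER\n\nf B-ORG\ng I-ORG\nh I-ORG\n"
sentences = list(iter_conll_sentences(io.StringIO(sample)))
```

With a real file, the same generator could be driven by `with open('a.txt', encoding='utf-8') as handle:`, so only the current sentence is held in memory at any point.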