I am trying to build a custom NER model from about 530 MB of data. I am using the following simpletransformers code to do it.

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
from simpletransformers.ner import NERModel, NERArgs
import os

label = ['B-ORG', 'I-ORG', 'B-PER', 'I-PER']

args = NERArgs()
args.num_train_epochs = 10
args.learning_rate = 0.001
args.overwrite_output_dir = True
args.train_batch_size = 32
args.eval_batch_size = 32
args.lazy_loading = True

model = NERModel('roberta', 'roberta-base', labels=label, args=args, use_cuda=True)

model.train_model('a.txt', eval_data='b.txt', acc=accuracy_score)

I set args.lazy_loading = True to work around the memory problem, but it raises the following error:

TypeError: convert_example_to_feature() missing 14 required positional arguments: 'label_map', 'max_seq_length', 'tokenizer', 'cls_token_at_end', 'cls_token', 'cls_token_segment_id', 'sep_token', 'sep_token_extra', 'pad_on_left', 'pad_token', 'pad_token_segment_id', 'pad_token_label_id', 'sequence_a_segment_id', and 'mask_padding_with_zero'

Sample input text in CoNLL format:

a B-PER
b I-PER
c I-PER

f B-ORG
g I-ORG
h I-ORG
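
To make the file layout concrete, here is a minimal sketch of how I understand the format: one whitespace-separated token/label pair per line, with a blank line between sentences. The helper name iter_conll_sentences is hypothetical and is not part of simpletransformers; it only illustrates reading such a file one sentence at a time without loading everything into memory.

```python
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]


def iter_conll_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield one sentence at a time as a list of (token, label) pairs.

    Hypothetical helper: assumes "token label" per line and a blank
    line between sentences, as in the sample above.
    """
    sentence: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        # Split on the last run of whitespace: the label is the final field.
        token, label = line.rsplit(maxsplit=1)
        sentence.append((token, label))
    # Emit the last sentence when the input does not end with a blank line.
    if sentence:
        yield sentence


sample = """a B-PER
b I-PER
c I-PER

f B-ORG
g I-ORG
h I-ORG
"""

sentences = list(iter_conll_sentences(sample.splitlines()))
# Labels that actually occur in the data, to compare against the label list.
labels_seen = {label for sent in sentences for _, label in sent}
```

Passing an open file handle instead of sample.splitlines() streams the file line by line, and labels_seen can be compared with the label list given to NERModel to spot tags that are missing from it.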

Reference: https://simpletransformers.ai/docs/ner-specifics/
