Flair uses the BILUO scheme, with an empty line between sentences, so you need to use biluo_tags_from_offsets:
import spacy
# spaCy v2 import; in spaCy v3 this became: from spacy.training import offsets_to_biluo_tags
from spacy.gold import biluo_tags_from_offsets

nlp = spacy.load("en_core_web_md")

# Each item is (text, {'entities': [(start_char, end_char, label), ...]})
ents = [
    ("George Washington went to Washington", {'entities': [(0, 6, 'PER'), (7, 17, 'PER'), (26, 36, 'LOC')]}),
    ("Uber blew through $1 million a week", {'entities': [(0, 4, 'ORG')]}),
]

with open("flair_ner.txt", "w") as f:
    for sent, tags in ents:
        doc = nlp(sent)
        biluo = biluo_tags_from_offsets(doc, tags['entities'])
        for word, tag in zip(doc, biluo):
            f.write(f"{word} {tag}\n")
        f.write("\n")  # blank line marks the sentence boundary
Output:
George U-PER
Washington U-PER
went O
to O
Washington U-LOC

Uber U-ORG
blew O
through O
$ O
1 O
million O
a O
week O
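To make the tagging step less of a black box, here is a minimal pure-Python sketch of how character offsets turn into BILUO tags. It is not spaCy's implementation: `simple_biluo_tags` is a hypothetical helper, it tokenizes on whitespace only, and it assumes entity spans line up with token boundaries.

```python
import re

def simple_biluo_tags(text, entities):
    """Tag whitespace-delimited tokens with BILUO labels from character offsets.

    Simplified illustration only; spaCy uses its own tokenizer and handles
    misaligned spans differently.
    """
    # (start, end, token) for every whitespace-delimited chunk
    tokens = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for ent_start, ent_end, label in entities:
        # indices of tokens lying fully inside the entity span
        inside = [i for i, (s, e, _) in enumerate(tokens)
                  if s >= ent_start and e <= ent_end]
        if not inside:
            continue
        if len(inside) == 1:
            tags[inside[0]] = f"U-{label}"      # Unit: single-token entity
        else:
            tags[inside[0]] = f"B-{label}"      # Begin
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"          # Inside
            tags[inside[-1]] = f"L-{label}"     # Last
    return [(tok, tag) for (_, _, tok), tag in zip(tokens, tags)]

# Here "George Washington" is annotated as ONE span (0, 17), unlike the data above
pairs = simple_biluo_tags(
    "George Washington went to Washington",
    [(0, 17, "PER"), (26, 36, "LOC")],
)
for tok, tag in pairs:
    print(tok, tag)
```

With a single (0, 17) span the name is tagged B-PER / L-PER rather than two U-PER tags, which is probably what you want if the two tokens form one entity.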
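Note that this seems to be sufficient for training NER only. If you wish to add POS tags, you need to create a mapping from Universal POS tags to Flair's simplified scheme. For example: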
tag_mapping = {'PROPN': 'N', 'VERB': 'V', 'ADP': 'P', 'NOUN': 'N'}  # create your own

with open("flair_ner.txt", "w") as f:
    for pair in ents:
        sent, tags = pair
        doc = nlp(sent)
        biluo = biluo_tags_from_offsets(doc, tags['entities'])
        try:
            for word, tag in zip(doc, biluo):
                # Raises KeyError on the first POS tag missing from tag_mapping
                f.write(f"{word} {tag_mapping[word.pos_]} {tag}\n")
                # Alternative that never raises:
                # f.write(f"{word} {tag_mapping.get(word.pos_, 'None')} {tag}\n")
        except KeyError:
            print(f"'{word.pos_}' tag is not defined in tag_mapping")
        f.write("\n")

Output:
'SYM' tag is not defined in tag_mapping
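Because the `try` block stops at the first unknown tag, the rest of that sentence never gets written. One way around this, sketched below, is to fall back to a placeholder and collect every unmapped tag in a single pass. The helper `write_sentence`, the placeholder `"X"`, and the hand-written POS tags in `sentence` are all assumptions for illustration, not spaCy output or part of Flair's scheme.

```python
import io

tag_mapping = {"PROPN": "N", "VERB": "V", "ADP": "P", "NOUN": "N"}

# Hypothetical pre-tagged tokens: (word, universal POS tag, BILUO tag)
sentence = [
    ("Uber", "PROPN", "U-ORG"), ("blew", "VERB", "O"), ("through", "ADP", "O"),
    ("$", "SYM", "O"), ("1", "NUM", "O"), ("million", "NUM", "O"),
    ("a", "DET", "O"), ("week", "NOUN", "O"),
]

def write_sentence(f, tokens, mapping, default="X"):
    """Write one sentence in three-column format; return the unmapped POS tags."""
    unmapped = set()
    for word, pos, tag in tokens:
        if pos not in mapping:
            unmapped.add(pos)
        # .get() keeps the row count intact even for unmapped tags
        f.write(f"{word} {mapping.get(pos, default)} {tag}\n")
    f.write("\n")  # blank line marks the sentence boundary
    return unmapped

buf = io.StringIO()  # stands in for the open file
missing = write_sentence(buf, sentence, tag_mapping)
print("Unmapped POS tags:", sorted(missing))
```

This way the file always has one row per token, and the collected set tells you which entries still need to be added to `tag_mapping`.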