I'm working on an NER application, and my annotated data is in the following format.

[('The F15 aircraft uses a lot of fuel', {'entities': [(4, 7, 'aircraft')]}),
 ('did you see the F16 landing?', {'entities': [(16, 19, 'aircraft')]}),
 ('how many missiles can a F35 carry', {'entities': [(24, 27, 'aircraft')]}),
 ('is the F15 outdated', {'entities': [(7, 10, 'aircraft')]}),
 ('how long does it take to train a F16 pilot',{'entities': [(33, 36, 'aircraft')]}),
 ('how much does a F35 cost', {'entities': [(16, 19, 'aircraft')]})]

Is there a way to convert this to CoNLL 2003 format?


1 Answer


Which CoNLL format do you mean?

You can get a simple CoNLL-style format (one token per line, with an IOB tag) by doing the following:

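import spacy

# Replace with your own annotations, in the format shown in the question.
data = [("is the F15 outdated", {"entities": [(7, 10, "aircraft")]})]

nlp = spacy.blank("en")  # blank English pipeline: tokenizer only

for text, labels in data:
    doc = nlp(text)
    ents = []
    for start, end, label in labels["entities"]:
        span = doc.char_span(start, end, label=label)
        # char_span returns None if the offsets do not align with token boundaries
        if span is not None:
            ents.append(span)
    doc.ents = ents
    for tok in doc:
        label = tok.ent_iob_
        if tok.ent_iob_ != "O":
            label += "-" + tok.ent_type_
        print(tok, label, sep="\t")
    print()  # blank line between sentences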

There is also a library, spacy_conll, that can do this for you.
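If you want to check the output without installing spaCy, the same character-offset-to-tag mapping can be sketched in plain Python. This is only a rough illustration: the `to_conll` helper is a hypothetical name, and the regex tokenizer (word runs and single punctuation marks) is an assumption that will not match spaCy's tokenization in every case.

```python
import re

def to_conll(data):
    """Return CoNLL-style lines (token<TAB>tag), with an empty string after each sentence."""
    lines = []
    for text, labels in data:
        # Assumed tokenization: runs of word characters, or single punctuation marks.
        for match in re.finditer(r"\w+|[^\w\s]", text):
            tag = "O"
            for start, end, label in labels["entities"]:
                # A token gets a tag only if it lies fully inside the entity span.
                if start <= match.start() and match.end() <= end:
                    prefix = "B" if match.start() == start else "I"
                    tag = f"{prefix}-{label}"
                    break
            lines.append(f"{match.group()}\t{tag}")
        lines.append("")  # blank line between sentences
    return lines

sample = [("did you see the F16 landing?", {"entities": [(16, 19, "aircraft")]})]
print("\n".join(to_conll(sample)))
```

Note that this only produces the token and NER tag columns. The original CoNLL 2003 files also carry part-of-speech and chunk columns, which would need a tagger or placeholder values.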

Answered 2021-11-24T05:35:54.360