python - Adding a custom NER model to a spaCy pipeline

Question

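I created a custom NER model with Prodi.gy. After all the processing and validation, I saved the model to disk. I can instantiate the model from disk with spacy.load and it seems to work fine. My question is how to add that custom NER model to a spaCy pipeline. I want to make sure the pipeline has the tagger, parser, etc. along with my custom NER model.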

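It looks like I should initialize a base nlp from one of the existing models (en_core_web_sm), remove its existing NER, and replace it with my custom NER. This is undoubtedly user error, but I can't work out from the docs or from trial and error what I'm doing wrong (or what I need to do).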

Maybe I'm going about this the wrong way? Maybe I should be adding the tagger and parser to my custom model instead?

I was able to get it to work by adding the "tagger" and "parser" from one of the en models and then modifying the meta.json file. That doesn't seem like the right approach.

I tried this, which is clearly not right:

import spacy

nlp = spacy.load("en_core_web_sm")
# Remove the existing NER
nlp.remove_pipe('ner')
print("Pipeline", nlp.pipe_names)

nlp_entity = spacy.load("custom_ner_model")

nlp.add_pipe(nlp_entity)
print("Pipeline", nlp.pipe_names)

Pipeline ['tagger', 'parser']
Pipeline ['tagger', 'parser', 'English']

I then tried building the NER from the custom model and adding that, but it isn't right either:

nlp = spacy.load("en_core_web_sm")
# Remove the existing NER
nlp.remove_pipe('ner')
print("Pipeline", nlp.pipe_names)

nlp_entity = spacy.load("custom_ner_model")
ner = nlp_entity.create_pipe("ner")

nlp.add_pipe(ner,last=True)
print("Pipeline", nlp.pipe_names)

If I try to run with this ner in the pipeline, I get an error:

from spacy import displacy

text = "This is a test"
doc = nlp(text)
displacy.render(doc, style="ent")

ValueError: [E109] Model for component 'ner' not initialized. Did you forget to load a model, or forget to call begin_training()?

I also got this error, which is what prompted me to try adding the tagger/parser from the base en model:

ValueError: [E155] The pipeline needs to include a tagger in order to use Matcher or PhraseMatcher with the attributes POS, TAG, or LEMMA. Try using nlp() instead of nlp.make_doc() or list(nlp.pipe()) instead of list(nlp.tokenizer.pipe()).

score 2 · Accepted Answer

In spaCy v2:

nlp = spacy.load("en_core_web_sm", disable=["ner"])
nlp_entity = spacy.load("custom_ner_model", vocab=nlp.vocab)
nlp.add_pipe(nlp_entity.get_pipe("ner"))

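The tricky part here is that you need to load both with the same vocab, so that your final model knows the strings for any new labels that are only used in the custom model. To do this, you just pass the vocab object from the first model to spacy.load() for the second model.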
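
To illustrate why the shared vocab matters, here is a minimal sketch using spaCy's StringStore directly. The label name MY_CUSTOM_LABEL is made up for the example, and the two stores merely stand in for the string tables of the base and custom models. A label is stored as a hash, and a store that has never seen the string cannot turn that hash back into text:

```python
from spacy.strings import StringStore

# Stand-in for the custom model's string table (hypothetical label name).
custom_strings = StringStore()
label_hash = custom_strings.add("MY_CUSTOM_LABEL")

# Stand-in for the base model's string table, which has never seen the label.
base_strings = StringStore()
try:
    base_strings[label_hash]
    missing_before = False
except KeyError:
    # The hash cannot be resolved without the original string.
    missing_before = True

# Once the string is known to the base store, the same hash resolves.
base_strings.add("MY_CUSTOM_LABEL")
resolved = base_strings[label_hash]
print(missing_before, resolved)
```

Loading the second model with vocab=nlp.vocab avoids this situation, because both pipelines then read and write the same string table.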

For the upcoming spaCy v3, this will change to:

nlp = spacy.load("en_core_web_sm", exclude=["ner"])
nlp_entity = spacy.load("custom_ner_model")
nlp.add_pipe("ner", source=nlp_entity)
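
As a rough, self-contained sketch of the v3 pattern above, assuming spaCy v3 is installed: the helper replace_ner is a hypothetical name, and blank pipelines stand in for en_core_web_sm and custom_ner_model so that nothing needs to be downloaded or trained. The stand-in NER is untrained, so this only shows how the pipeline is assembled, not real predictions.

```python
import spacy


def replace_ner(base_nlp, custom_nlp, name="ner"):
    """Swap the `name` component of base_nlp for the one from custom_nlp.

    Hypothetical helper for illustration; relies on the spaCy v3
    `add_pipe(..., source=...)` form shown in the answer.
    """
    if name in base_nlp.pipe_names:
        base_nlp.remove_pipe(name)
    # source= takes the already-built component from the other pipeline.
    base_nlp.add_pipe(name, source=custom_nlp)
    return base_nlp


# Stand-in for the base pipeline (would be spacy.load("en_core_web_sm")).
base_nlp = spacy.blank("en")
base_nlp.add_pipe("sentencizer")

# Stand-in for the custom model (would be spacy.load("custom_ner_model")).
custom_nlp = spacy.blank("en")
custom_nlp.add_pipe("ner")

merged = replace_ner(base_nlp, custom_nlp)
print(merged.pipe_names)
```

With real models, the same helper would be called on the two loaded pipelines, and the result could then be saved with to_disk().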

score 0 · Accepted Answer

The spaCy folks provided this in response, which is similar to @aab's answer.

You can train on top of a base model with the NER removed:

nlp = spacy.load("en_core_web_sm")
nlp.remove_pipe("ner")
print(nlp.pipe_names)  # ['tagger', 'parser']
nlp.to_disk("./en_tagger_parser_sm")  # use that path for training

Or you can remove the NER from the base model and add your custom NER to that base:

nlp = spacy.load("en_core_web_sm")
nlp.remove_pipe("ner")
print(nlp.pipe_names)  # ['tagger', 'parser']

nlp_entity = spacy.load("custom_ner_model")
# Get the ner pipe from this model and add it to base model
ner = nlp_entity.get_pipe("ner")
nlp.add_pipe(ner)
print(nlp.pipe_names)  # ['tagger', 'parser', 'ner']

nlp.to_disk("./custom_model")
