python - Visualizing custom NER labels with spaCy displaCy

Question

I'm new to spaCy and Python, and I'd like to use the library to visualize NER output. This is the example I found:

import spacy
from spacy import displacy

NER = spacy.load("en_core_web_sm")

raw_text="The Indian Space Research Organisation or is the national space agency of India, headquartered in Bengaluru. It operates under Department of Space which is directly overseen by the Prime Minister of India while Chairman of ISRO acts as executive of DOS as well."

text1= NER(raw_text)

displacy.render(text1,style="ent",jupyter=True)

Example visualization

However, I already have a list of custom labels and their positions:

 [812, 834, "POS"], [838, 853, "ORG"], [870, 888, "POS"], [892, 920, "ORG"], [925, 929, "ENGLEVEL"], [987, 1002, "SKILL"],...

I want to visualize my text with my own custom labels and entities instead of spaCy's default NER output. How can I do that?

score 2 · Accepted Answer

You need to create character spans representing the entities and attach them to your Doc object. Something like this:

import spacy
from spacy import displacy

nlp = spacy.blank('en')
raw_text = "The Indian Space Research Organisation or is the national space agency of India, headquartered in Bengaluru. It operates under Department of Space which is directly overseen by the Prime Minister of India while Chairman of ISRO acts as executive of DOS as well."
doc = nlp.make_doc(raw_text)
spans = [[812, 834, "POS"], [838, 853, "ORG"], [870, 888, "POS"], [892, 920, "ORG"], [925, 929, "ENGLEVEL"],
         [987, 1002, "SKILL"]]
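ents = []
for span_start, span_end, label in spans:
    # char_span() returns None if the offsets do not yield a valid span
    # (for example, they lie outside the text), so skip those entries
    ent = doc.char_span(span_start, span_end, label=label)
    if ent is None:
        continue

    ents.append(ent)

# Replace the document's entities with the custom spans
doc.ents = ents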
displacy.render(doc, style="ent", jupyter=True)

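Change raw_text and spans to match your data. If a span's start or end offset falls beyond the length of the text, doc.char_span() returns None, so you need to handle that case appropriately.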
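
Because the offsets in the snippet above (812, 838, ...) lie beyond the end of the sample sentence, every span there would be skipped and nothing would be highlighted. The following is a minimal sketch of the same approach with offsets that do match the text. The shortened sentence, the `find_span` helper, the `CITY` label, the deliberately out-of-range `SKILL` span, and the color values are all assumptions made for illustration, not part of the original question. It also records skipped spans instead of dropping them silently, and renders to an HTML string with `jupyter=False` so it can run outside a notebook.

```python
import spacy
from spacy import displacy

nlp = spacy.blank("en")  # tokenizer only, no trained pipeline required
text = "The Indian Space Research Organisation is headquartered in Bengaluru."
doc = nlp.make_doc(text)


def find_span(phrase, label):
    """Hypothetical helper: build [start, end, label] from a phrase in `text`."""
    start = text.index(phrase)
    return [start, start + len(phrase), label]


spans = [
    find_span("Indian Space Research Organisation", "ORG"),
    find_span("Bengaluru", "CITY"),
    [500, 520, "SKILL"],  # deliberately out of range, to show the None case
]

ents = []
skipped = []
for start, end, label in spans:
    ent = doc.char_span(start, end, label=label)
    if ent is None:
        # Keep track of spans that could not be created
        skipped.append((start, end, label))
    else:
        ents.append(ent)

doc.ents = ents

# Custom labels have no default color, so assign some explicitly
options = {"colors": {"ORG": "#aa9cfc", "CITY": "#feca74"}}
html = displacy.render(doc, style="ent", options=options, jupyter=False)
```

Inspecting `skipped` after the loop shows which offsets need fixing. Note that with the default alignment mode, `doc.char_span()` also returns None when the offsets do not line up with token boundaries, so a span that starts or ends in the middle of a token would end up in `skipped` as well.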
