python - Is there a way to retrieve the whole noun chunk using a root token in spaCy?

Question

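I'm very new to spaCy. I've been reading the documentation for hours, but I'm still unsure whether what I'm asking is possible. Anyway...

As the title says, is there a way to get a given noun chunk from a token that it contains? For example, given the sentence:

"Autonomous cars shift insurance liability toward manufacturers"

Would it be possible to get the "autonomous cars" noun chunk when all I have is the "cars" token? Here is an example snippet of the scenario I'm working on.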

import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: the question does not say which model is loaded

startingSentence = "Autonomous cars and magic wands shift insurance liability toward manufacturers"
doc = nlp(startingSentence)
noun_chunks = doc.noun_chunks

for token in doc:
    if token.dep_ == "dobj":
        print(token) # this will print "liability"

        # Is it possible to do anything from here to actually get the "insurance liability" noun chunk?

Any help would be greatly appreciated. Thanks!

score 2 · Accepted Answer

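You can easily find the noun chunk that contains the token you've identified by checking whether the token is in one of the noun chunk spans: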

import spacy

nlp = spacy.load("en_core_web_md")  # the model that gives the correct result here
doc = nlp("Autonomous cars and magic wands shift insurance liability toward manufacturers")
interesting_token = doc[7] # or however you identify the token you want
for noun_chunk in doc.noun_chunks:
    if interesting_token in noun_chunk:
        print(noun_chunk)

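With en_core_web_sm and spacy 2.0.18 the output is incorrect because "shift" isn't recognized as a verb, so you get:

magic wands shift insurance liability

With en_core_web_md, it's correct:

insurance liability

(It makes sense to include examples with real ambiguity in the documentation, because that is a realistic scenario (https://spacy.io/usage/linguistic-features#noun-chunks), but it's confusing for new users if the examples are ambiguous enough that the analysis is unstable across versions/models.)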
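
If you need this lookup in more than one place, the membership check above can be wrapped in a small helper. The following is only a sketch: the function name `noun_chunk_containing` is made up for illustration and is not part of spaCy. It compares the token's index (`Token.i`) against each chunk's `start` and `end` offsets rather than using `in`, and it returns `None` when the token is in no noun chunk (a verb, for instance).

```python
def noun_chunk_containing(doc, token):
    """Return the noun chunk of `doc` that contains `token`, or None.

    Hypothetical helper, not a spaCy API. It relies only on
    doc.noun_chunks, Span.start, Span.end and Token.i.
    """
    for chunk in doc.noun_chunks:
        # Span.start is inclusive and Span.end is exclusive, like a slice.
        if chunk.start <= token.i < chunk.end:
            return chunk
    return None


# Usage with spaCy, assuming a model such as en_core_web_md is installed:
#   import spacy
#   nlp = spacy.load("en_core_web_md")
#   doc = nlp("Autonomous cars and magic wands shift insurance liability toward manufacturers")
#   print(noun_chunk_containing(doc, doc[7]))
```

Which chunk you get back still depends on the parse, so the caveat about models and versions applies here as well.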
