python - Using spaCy to determine whether a word is on the dependency path between two entities

Question

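I'm working on an NLP problem: given a sentence containing two entities, I need to generate a boolean for each word indicating whether it lies on the dependency path between those entities.

For example:

'A misty <e1>ridge</e1> uprises from the <e2>surge</e2>'

I want to iterate over each word and tell whether it is on the dependency path between e1 and e2.

Two important notes:

- If you try to help me (thanks in the first place), don't bother with the <e1> and <e2> XML markup. What I really want to know is how to find out, with spaCy, whether a word is on the dependency path between any two given words; I'll take care of choosing the words myself.

- Since I'm not an NLP expert, I'm a bit confused about what "dependency path" means, and I'm sorry if this isn't clear enough (these are the words my tutor used).

Thanks in advance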

score 3 · Accepted Answer

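So I found my solution using that post.

It has an answer dedicated to spaCy.

My implementation for finding the dependency path between two words in a given sentence: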

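import networkx as nx
import spacy

# Assumption: the original post never defines nlp; any English pipeline with a parser works here.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Ships carrying equipment for US troops are already waiting off the Turkish coast")


def shortest_dependency_path(doc, e1=None, e2=None):
    # One undirected edge per head -> child dependency. Nodes are token texts,
    # so a word that appears twice in the sentence collapses into a single node.
    edges = []
    for token in doc:
        for child in token.children:
            edges.append((token.text, child.text))
    graph = nx.Graph(edges)
    try:
        shortest_path = nx.shortest_path(graph, source=e1, target=e2)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        # No route between the words, or one of them is not in the sentence
        shortest_path = []
    return shortest_path


print(shortest_dependency_path(doc, 'Ships', 'troops'))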

Output:

['Ships', 'carrying', 'for', 'troops']

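What it actually does is first build an undirected graph for the sentence, where words are the nodes and the dependencies between words are the edges, and then find the shortest path between the two nodes.

For my needs, I then just check, for each word, whether it is on the resulting dependency path (the shortest path).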
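One caveat with the function above: its graph nodes are word strings, so a word that occurs twice in the sentence becomes a single node. Below is a minimal sketch of the same idea keyed by token position, using a plain breadth-first search instead of networkx. It assumes `heads[i]` is the position of token i's head, which with spaCy would be `[tok.head.i for tok in doc]` (the root is its own head). The `heads` list here is hand-written for illustration and is not actual parser output, and the function names are my own.

```python
from collections import deque


def shortest_path_indices(heads, source, target):
    """Breadth-first search over the undirected dependency graph.

    Nodes are token positions; heads[i] is the position of token i's head.
    """
    neighbours = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if child != head:  # the root points at itself; skip that self-loop
            neighbours[child].add(head)
            neighbours[head].add(child)

    previous = {source: None}  # node -> node we reached it from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in neighbours[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)

    if target not in previous:
        return []  # the two tokens are not connected

    path = []
    node = target
    while node is not None:  # walk back from target to source
        path.append(node)
        node = previous[node]
    return path[::-1]


def on_path_flags(heads, source, target):
    """One boolean per token: is it on the path between source and target?"""
    on_path = set(shortest_path_indices(heads, source, target))
    return [i in on_path for i in range(len(heads))]


words = ["Ships", "carrying", "equipment", "for", "US", "troops", "are",
         "already", "waiting", "off", "the", "Turkish", "coast"]
# Hand-written head positions (an assumption, not real spaCy output)
heads = [8, 0, 1, 1, 5, 3, 8, 8, 8, 8, 12, 12, 9]

path = shortest_path_indices(heads, 0, 5)
print([words[i] for i in path])   # ['Ships', 'carrying', 'for', 'troops']
print(on_path_flags(heads, 0, 5))
```

The second print gives one boolean per word, which is the shape the question asks for: True for "Ships", "carrying", "for" and "troops", and False for everything else.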

score 2 · Accepted Answer

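A dependency path is a way of describing how clauses are built up within a sentence. spaCy has a very good example in its documentation, using the sentence Apple is looking at buying U.K. startup for $1 billion.

Pardon the lack of a good visualization here, but to work through your example:

A misty ridge uprises from the surge.

In spaCy, we follow their example to get the dependencies: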

import spacy
nlp = spacy.load('en_core_web_lg')
doc = nlp("A misty ridge uprises from the surge.")
for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_, chunk.root.head.text)

This gets the noun chunks (what I'm loosely calling "clauses") that make up your sentence. Your output will look like this:

Text                  | root.text | root.dep_ | root.head.text
A misty ridge uprises | uprises   | ROOT      | uprises
the surge             | surge     | pobj      | from

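chunk.text is the text that makes up the dependency clause (note that there may be overlap, depending on the sentence structure). root.text gives the root (or head) of the dependency tree. The head of the tree is a spaCy token object, and it has children that you can iterate over to check whether another token is on the dependency tree.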

def find_dependencies(doc, word_to_check=None, dep_choice=None):
    """
    word_to_check is the word you'd like to see on the dependency tree
    example, word_to_check="misty"

    dep_choice is the text of the item you'd like the dependency check
    to be against. Example, dep_choice='ridge'
    """
    tokens, texts = [], []

    for tok in doc:
        tokens.append(tok)
        texts.append(tok.text)

    # grabs the index/indices of the token that you are interested in
    indices = [i for i,text in enumerate(texts) if text==dep_choice]

    words_in_path = []

    for i in indices:

        reference = tokens[i]
        # Token has no get_children() method; the children attribute yields the child tokens
        child_elements = [t.text for t in reference.children]
        if word_to_check in child_elements:
            words_in_path.append((word_to_check, reference))

    return words_in_path

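The code isn't the prettiest, but it's one way to get a list of tuples, each containing the word you want to check and its associated parent token. Hopefully this helps.

Edit:

To tailor this more to your use case (and greatly simplify my original answer):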

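# This gives you a 'word': <spaCy Token object> lookup (a repeated word keeps only its last occurrence)
tokens_lookup = {tok.text: tok for tok in doc}

# Token.children yields Token objects, so compare on their text rather than on the tokens themselves
if any(child.text == "misty" for child in tokens_lookup["ridge"].children):
    pass  # Extra logic here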
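
The parent/child checks above only look one level down the tree. Because a dependency parse is a tree, the path between any two tokens can also be recovered by following head links upward from each token until the two chains meet. This is a rough sketch under stated assumptions: `heads[i]` is the position of token i's head (with spaCy, `[tok.head.i for tok in doc]`, where the root is its own head), the text is a single sentence, and the `heads` list below is hand-written for the intended verb reading of "uprises" rather than taken from a real parse. The helper names are hypothetical.

```python
def chain_to_root(heads, i):
    """Token positions from i up to the root, following head links."""
    chain = [i]
    while heads[i] != i:  # the root is its own head
        i = heads[i]
        chain.append(i)
    return chain


def tree_path(heads, a, b):
    """Positions on the dependency path from token a to token b."""
    up_a = chain_to_root(heads, a)
    up_b = chain_to_root(heads, b)
    on_b = set(up_b)
    # First node on a's chain that is also b or an ancestor of b
    meet = next((n for n in up_a if n in on_b), None)
    if meet is None:
        return []  # different trees, e.g. tokens from two sentences
    # Climb from a to the meeting point, then descend to b
    return up_a[:up_a.index(meet) + 1] + up_b[:up_b.index(meet)][::-1]


words = ["A", "misty", "ridge", "uprises", "from", "the", "surge", "."]
# Hand-written head positions (an assumption, not real spaCy output)
heads = [2, 2, 3, 3, 3, 6, 4, 3]

path = tree_path(heads, 2, 6)
print([words[i] for i in path])           # ['ridge', 'uprises', 'from', 'surge']
print([i in path for i in range(len(words))])
```

Working with positions rather than word text avoids the last-occurrence problem of the dictionary lookup, at the cost of having to locate the two entity tokens yourself first.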
