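A dependency path is a way of describing how clauses are built within a sentence. spaCy has a very good example in its documentation, using the sentence "Apple is looking at buying U.K. startup for $1 billion."
Pardon my lack of a good visualization here, but to work through your example: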
A misty ridge uprises from the surge.
In spaCy, we can follow their example to get the dependencies:
import spacy
nlp = spacy.load('en_core_web_lg')
doc = nlp("A misty ridge uprises from the surge.")
for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_, chunk.root.head.text)
This gives you the noun chunks (the "clauses") that make up your sentence. Your output will look like this:
Text                  | root.text | root.dep_ | root.head.text
A misty ridge uprises | uprises   | ROOT      | uprises
the surge             | surge     | pobj      | from
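chunk.text is the text that makes up the dependency clause (note that chunks may overlap, depending on the sentence structure). root.text gives the root (or head) of the dependency tree. The head of the tree is a spaCy token object, and it has children that you can iterate over to check whether another token is on the dependency tree.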
def find_dependencies(doc, word_to_check=None, dep_choice=None):
    """
    word_to_check is the word you'd like to see on the dependency tree
    example, word_to_check="misty"
    dep_choice is the text of the item you'd like the dependency check
    to be against. Example, dep_choice='ridge'
    """
    tokens, texts = [], []
    for tok in doc:
        tokens.append(tok)
        texts.append(tok.text)
    # grabs the index/indices of the token that you are interested in
    indices = [i for i, text in enumerate(texts) if text == dep_choice]
    words_in_path = []
    for i in indices:
        reference = tokens[i]
        # Token.children is a generator over the token's direct dependents
        child_elements = [t.text for t in reference.children]
        if word_to_check in child_elements:
            words_in_path.append((word_to_check, reference))
    return words_in_path
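The code isn't the prettiest, but it's one way to get a list of tuples containing the word you want to check and its associated parent token. Hope this helps.
EDIT:
To tailor this more closely to your use case (and to greatly simplify my original answer):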
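# This gives you 'word': <spaCy Token object> key-value lookup capability
# (if a word occurs more than once, the last occurrence wins)
tokens_lookup = {tok.text: tok for tok in doc}
# children yields Token objects, so compare against their text
if "misty" in [child.text for child in tokens_lookup["ridge"].children]:
    pass  # Extra logic here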
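The check above only looks at direct children. If the word you care about may sit further down the tree, or you want the full path between two words, one option is to walk the head pointers instead. Below is a minimal sketch in plain Python, with no spaCy required. The `heads` list is hand-written for illustration and is an assumption, not real parser output; `path_to_root` and `dependency_path` are hypothetical helper names, and the sketch assumes a well-formed tree in which the root is its own head.

```python
from typing import List


def path_to_root(heads: List[int], i: int) -> List[int]:
    """Return the token indices from token i up to the root, inclusive."""
    path = [i]
    # Assumed convention: the root token is its own head.
    while heads[path[-1]] != path[-1]:
        path.append(heads[path[-1]])
    return path


def dependency_path(heads: List[int], a: int, b: int) -> List[int]:
    """Return the token indices on the tree path from token a to token b."""
    up_a = path_to_root(heads, a)
    up_b = path_to_root(heads, b)
    # The first node on a's upward path that is also on b's upward path
    # is their lowest common ancestor.
    on_b = set(up_b)
    lca = next(n for n in up_a if n in on_b)
    # Climb from a to the common ancestor, then descend to b.
    down = up_b[:up_b.index(lca)]
    return up_a[:up_a.index(lca) + 1] + down[::-1]


words = ["A", "misty", "ridge", "uprises", "from", "the", "surge", "."]
# Hand-written head index for each token (illustrative assumption only).
heads = [2, 2, 3, 3, 3, 6, 4, 3]

# Is "ridge" anywhere above "misty" in the tree?
print(words.index("ridge") in path_to_root(heads, words.index("misty")))

# Path between "misty" and "surge"
path = dependency_path(heads, words.index("misty"), words.index("surge"))
print(" -> ".join(words[i] for i in path))
# prints: misty -> ridge -> uprises -> from -> surge
```

With a parsed doc, the same list could be built with `[tok.head.i for tok in doc]`, since spaCy also treats the root as its own head. For the plain ancestor check, spaCy tokens already expose `ancestors` and `is_ancestor`, so the helper is only needed when you want the whole path.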