python - Python NLTK: Extract lexical head item from Stanford dependency parsed result

Question

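I have a sentence and I want to extract its lexical head item. I can run dependency parsing on it with the Stanford NLP library.

How can I extract the main head of a sentence?

For the sentence Download and share this tool, the head would be Download.

I tried the following: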

 from nltk.parse.stanford import StanfordDependencyParser

 def get_head_word(text):
     # Paths point at a local Stanford Parser install (2014-06-16 release).
     standepparse = StanfordDependencyParser(
         path_to_jar='/home/stanford_resource/stanford-parser-full-2014-06-16/stanford-parser.jar',
         path_to_models_jar='/home/stanford_resource/stanford-parser-full-2014-06-16/stanford-parser-3.4-models.jar',
         model_path='/home/stanford_resource/stanford-parser-full-2014-06-16/stanford-parser-3.4-models/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
     parsetree = standepparse.raw_parse(text)
     p_tree = list(parsetree)[0]  # first DependencyGraph returned by the parser
     print(p_tree.to_dot())

 text = 'Download and share this tool'
 get_head_word(text)


output:

digraph G{
edge [dir=forward]
node [shape=plaintext]

0 [label="0 (None)"]
0 -> 1 [label="root"]
1 [label="1 (Download)"]
1 -> 2 [label="cc"]
1 -> 3 [label="conj"]
1 -> 5 [label="dobj"]
2 [label="2 (and)"]
3 [label="3 (share)"]
4 [label="4 (this)"]
5 [label="5 (software)"]
5 -> 4 [label="det"]
}

score 1 · Accepted Answer

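To find the dependency head of a sentence, look for the nodes whose head value points to the root node. In NLTK's DependencyGraph API, the artificial root is the first entry of the nodes dictionary (address 0), so you can simply look for the nodes whose head is 0. This matches the output above, where the edge 0 -> 1 carries the root label.

Note that, unlike typical Chomsky normal form / CFG parse trees, a dependency parse may have more than one head.

But since you are working with a single parse (p_tree), you can do the following:

tree_head = next(n for n in p_tree.nodes.values() if n['head'] == 0)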
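The same lookup can be tried without the Stanford jars. The sketch below is only an illustration: the `nodes` mapping is a hand-written stand-in that mimics the shape of a dependency graph's node dictionary (address mapped to a dict with `word`, `head` and `rel`), filled in from the DOT output in the question, where node 0 is the artificial root. The helper name `find_heads` is hypothetical, and it returns a list because a parse may have more than one head.

```python
# Hand-written stand-in for the parser's node dictionary, copied from the
# DOT output in the question (assumed layout: address -> word/head/rel).
nodes = {
    0: {'word': None, 'head': None, 'rel': None},      # artificial root
    1: {'word': 'Download', 'head': 0, 'rel': 'root'},
    2: {'word': 'and', 'head': 1, 'rel': 'cc'},
    3: {'word': 'share', 'head': 1, 'rel': 'conj'},
    4: {'word': 'this', 'head': 5, 'rel': 'det'},
    5: {'word': 'software', 'head': 1, 'rel': 'dobj'},
}

def find_heads(nodes):
    """Return the words whose head is the artificial root (address 0)."""
    return [nodes[addr]['word'] for addr in sorted(nodes)
            if nodes[addr]['head'] == 0]

heads = find_heads(nodes)
print(heads)  # ['Download']
```

With a real parse, the same test (`n['head'] == 0`) would be applied to the values of `p_tree.nodes`.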

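Note, however, that linguistically the head of the sentence Download and share this tool should be Download and share. Computationally, though, a tree is hierarchical: a normal-form tree would have ROOT->Download->and->share, but some parsers might also produce this tree: ROOT->and->Download;share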
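If the coordinated phrase is what you want, one possible approach is to take the node attached to the root and add its direct `cc` and `conj` dependents. The following is a minimal sketch under stated assumptions: the words and edges are copied by hand from the DOT output in the question, the function name `coordinated_head` is hypothetical, and only one level of coordination under the root's dependent is handled (the ROOT->and->Download;share shape would need a different rule).

```python
# Words and (head, dependent, relation) edges copied from the DOT output
# in the question; address 0 is the artificial root.
words = {1: 'Download', 2: 'and', 3: 'share', 4: 'this', 5: 'software'}
edges = [(0, 1, 'root'), (1, 2, 'cc'), (1, 3, 'conj'),
         (1, 5, 'dobj'), (5, 4, 'det')]

def coordinated_head(words, edges):
    """Return the root's dependent plus its cc/conj dependents, in sentence order."""
    # The sentence head is the dependent of the artificial root.
    head = next(dep for gov, dep, rel in edges if gov == 0)
    group = {head}
    # Add the conjunction and the conjuncts attached directly to the head.
    group.update(dep for gov, dep, rel in edges
                 if gov == head and rel in ('cc', 'conj'))
    return ' '.join(words[i] for i in sorted(group))

phrase = coordinated_head(words, edges)
print(phrase)  # Download and share
```

Sorting the collected addresses restores the original word order, which is why the conjunction ends up between the two verbs.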
