
I have a sentence, "John saw a flashy hat at the store".
How can I represent it as a dependency tree like the one shown below?

(S
      (NP (NNP John))
      (VP
        (VBD saw)
        (NP (DT a) (JJ flashy) (NN hat))
        (PP (IN at) (NP (DT the) (NN store)))))

I got this script from here:

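import spacy
from nltk import Tree

# spaCy v3 removed the 'en' shortcut; load an installed pipeline by name instead
en_nlp = spacy.load("en_core_web_sm")

doc = en_nlp("John saw a flashy hat at the store")

def to_nltk_tree(node):
    # A token with dependents becomes a subtree labeled with the token's text;
    # a token without dependents becomes a plain leaf string
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_


for sent in doc.sents:
    to_nltk_tree(sent.root).pretty_print()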

I get the following, but I'm looking for a tree (NLTK) format.

     saw                 
  ____|_______________    
 |        |           at 
 |        |           |   
 |       hat        store
 |     ___|____       |   
John  a      flashy  the
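The script above already builds an nltk Tree; calling print(to_nltk_tree(sent.root)) instead of pretty_print() prints it in bracket notation, because str() on an nltk Tree uses that format. The minimal sketch below reproduces that rendering without spaCy or NLTK. It assumes a hand-written head array (the hypothetical WORDS and HEADS lists) transcribed from the pretty_print output above, and applies the same recursion as to_nltk_tree. Note that every label is a word rather than a phrase category such as NP or VP, so this alone does not yield the (S (NP ...) (VP ...)) tree asked for.

```python
# Hypothetical hand-written parse transcribed from the pretty_print output:
# HEADS[i] is the index of token i's head word (-1 marks the root).
WORDS = ["John", "saw", "a", "flashy", "hat", "at", "the", "store"]
HEADS = [1, -1, 4, 4, 1, 1, 7, 5]


def children_of(i):
    """Indices of the tokens whose head is token i, in sentence order."""
    return [j for j, h in enumerate(HEADS) if h == i]


def to_bracketed(i):
    """Same recursion as to_nltk_tree: a head with dependents becomes
    '(head dep1 dep2 ...)', a token without dependents stays a bare word."""
    kids = children_of(i)
    if not kids:
        return WORDS[i]
    return "(" + " ".join([WORDS[i]] + [to_bracketed(k) for k in kids]) + ")"


root = HEADS.index(-1)
print(to_bracketed(root))  # (saw John (hat a flashy) (at (store the)))
```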

2 Answers


To recreate an NLTK-style tree for a spaCy dependency parse, try using the draw method from nltk.tree instead of pretty_print:

import spacy
from nltk.tree import Tree

# spaCy v3 removed the "en" shortcut; load an installed pipeline by name
spacy_nlp = spacy.load("en_core_web_sm")

def nltk_spacy_tree(sent):
    """
    Visualize the SpaCy dependency tree with nltk.tree
    """
    doc = spacy_nlp(sent)
    def token_format(token):
        return "_".join([token.orth_, token.tag_, token.dep_])

    def to_nltk_tree(node):
        # Tokens with dependents become labeled subtrees; others become leaves
        if node.n_lefts + node.n_rights > 0:
            return Tree(token_format(node),
                        [to_nltk_tree(child) for child in node.children])
        else:
            return token_format(node)

    tree = [to_nltk_tree(s.root) for s in doc.sents]
    # One tree per sentence; draw the first sentence's tree (opens a Tk window)
    tree[0].draw()

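Note that because spaCy currently only supports dependency parsing and tagging at the word and noun-phrase level, spaCy trees won't be as deeply structured as the ones you'd get from, for instance, the Stanford parser, whose output you can also visualize as a tree: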

from nltk.tree import Tree
from nltk.parse.stanford import StanfordParser

# Note: Download Stanford jar dependencies first
# See https://stackoverflow.com/questions/13883277/stanford-parser-and-nltk
# (newer NLTK releases deprecate StanfordParser in favor of
# nltk.parse.corenlp.CoreNLPParser)
stanford_parser = StanfordParser(
    model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
)

def nltk_stanford_tree(sent):
    """
    Visualize the Stanford constituency (phrase-structure) parse with nltk.tree
    """
    parse = stanford_parser.raw_parse(sent)
    tree = list(parse)
    # raw_parse returns an iterator of parse trees; draw the first one
    tree[0].draw()

Now, if we run both, nltk_spacy_tree("John saw a flashy hat at the store.") and nltk_stanford_tree("John saw a flashy hat at the store.") each open a window drawing the respective tree.
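To make the difference in depth concrete without installing either parser, here is a minimal stdlib sketch that parses two bracketed strings and compares their heights. It assumes the Stanford-style tree is the one from the question and the spaCy-style one is the bracketed form of the question's pretty_print output; parse_bracketed, height and leaves are hypothetical helpers (with NLTK installed, nltk.Tree.fromstring does the parsing for you). Part of the extra depth comes from the part-of-speech level (NNP, VBD, ...) that the constituency tree adds above each word.

```python
import re


def parse_bracketed(s):
    """Parse '(LABEL child ...)' text into nested lists [label, child, ...];
    words stay plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            word = tokens[pos]
            pos += 1
            return word
        pos += 1  # skip "("
        node = [tokens[pos]]  # the node label
        pos += 1
        while tokens[pos] != ")":
            node.append(parse())
        pos += 1  # skip ")"
        return node

    return parse()


def height(t):
    """0 for a word; otherwise 1 + the height of the tallest child."""
    if isinstance(t, str):
        return 0
    return 1 + max((height(c) for c in t[1:]), default=0)


def leaves(t):
    """Words at the bottom of the tree, left to right (labels excluded)."""
    if isinstance(t, str):
        return [t]
    return [w for c in t[1:] for w in leaves(c)]


CONSTITUENCY = """(S (NP (NNP John)) (VP (VBD saw) (NP (DT a) (JJ flashy) (NN hat))
                  (PP (IN at) (NP (DT the) (NN store)))))"""
DEPENDENCY = "(saw John (hat a flashy) (at (store the)))"

for name, text in [("constituency", CONSTITUENCY), ("dependency", DEPENDENCY)]:
    tree = parse_bracketed(text)
    print(name, "height:", height(tree), "root label:", tree[0])
# constituency height: 5 root label: S
# dependency height: 3 root label: saw
```

In the dependency form the head words (saw, hat, at, store) appear as labels rather than leaves, whereas the constituency tree keeps every word as a leaf under phrase labels.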

answered 2017-12-07 18:16

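Apart from the textual representation, what you're trying to achieve is to get a constituency tree out of a dependency graph. Your example of the desired output is a classic constituency tree (as in phrase structure grammar, as opposed to dependency grammar).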

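While converting a constituency tree into a dependency graph is more or less an automated task (e.g., http://www.mathcs.emory.edu/~choi/doc/clear-dependency-2012.pdf), the other direction is not. There has been work on it: check out the PAD project https://github.com/ikekonglp/PAD and the paper describing the underlying algorithm: http://homes.cs.washington.edu/~nasmith/papers/kong+rush+smith.naacl15.pdf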
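To show why the constituency-to-dependency direction is the mechanical one, here is a minimal sketch of head-rule conversion applied to the question's target tree. The HEAD_RULES table is a hypothetical toy that only covers the labels in this sentence; it is not the rule set from the linked paper. Each phrase picks one head child, and the lexical head of every sibling becomes a dependent of the phrase's lexical head. The reverse has no such lookup: a set of word-to-word arcs does not say which phrase labels (NP, VP, PP) or how much nesting to rebuild.

```python
# The question's target tree as nested tuples: (label, child, ...);
# a preterminal is (POS, word).
TREE = ("S",
        ("NP", ("NNP", "John")),
        ("VP",
         ("VBD", "saw"),
         ("NP", ("DT", "a"), ("JJ", "flashy"), ("NN", "hat")),
         ("PP", ("IN", "at"), ("NP", ("DT", "the"), ("NN", "store")))))

# Hypothetical toy head table: for each phrase label, a search direction and
# the labels that may head it, in priority order.
HEAD_RULES = {
    "S": ("left", ["VP"]),
    "VP": ("left", ["VBD", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN"]),
}


def is_preterminal(t):
    return len(t) == 2 and isinstance(t[1], str)


def head_index(t):
    """Index (into t[1:]) of the child that heads phrase t."""
    children = t[1:]
    direction, candidates = HEAD_RULES.get(t[0], ("left", []))
    n = len(children)
    order = range(n) if direction == "left" else range(n - 1, -1, -1)
    for label in candidates:
        for i in order:
            if children[i][0] == label:
                return i
    return order[0]  # fallback: first child in the search direction


def to_dependencies(t, arcs):
    """Return the lexical head word of t, appending (head, dependent) arcs."""
    if is_preterminal(t):
        return t[1]
    children = t[1:]
    h = head_index(t)
    head_word = to_dependencies(children[h], arcs)
    for i, child in enumerate(children):
        if i != h:
            arcs.append((head_word, to_dependencies(child, arcs)))
    return head_word


arcs = []
root = to_dependencies(TREE, arcs)
print("root:", root)
for head, dep in arcs:
    print(f"{head} -> {dep}")
```

On this sentence the printed arcs (saw -> John, saw -> hat, saw -> at, hat -> a, hat -> flashy, at -> store, store -> the) are the same head/dependent pairs as in the spaCy pretty_print output in the question; a real converter needs a much larger head table plus special cases for coordination and empty elements.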

You might also want to reconsider whether you really need a constituency parse; here is a good argument: https://linguistics.stackexchange.com/questions/7280

answered 2017-04-13 08:00