dependencies - Stanford CoreNLP missing roots

Question

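In the Stanford CoreNLP online demo, the example sentence "A minimal software item that can be tested in isolation" gives the following collapsed dependencies with CC processed: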

root ( ROOT-0 , item-4 )
det ( item-4 , A-1 )
amod ( item-4 , minimal-2 )
nn ( item-4 , software-3 )
nsubjpass ( tested-8 , that-5 )
aux ( tested-8 , can-6 )
auxpass ( tested-8 , be-7 )
rcmod ( item-4 , tested-8 )
prep_in ( tested-8 , isolation-10 )

From my Java class I get the same output, except for root(...). The code I am running is as follows:

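import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

// Class name is illustrative; the original post showed only main().
public class DependencyDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // args[0] is the text to parse
        Annotation document = new Annotation(args[0]);
        pipeline.annotate(document);

        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

        for (CoreMap sentence : sentences) {
            SemanticGraph dependencies = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            System.out.println(dependencies.toList());
        }
    }
}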

So the question is: why doesn't my Java code output the roots? Am I missing something?

score 3 · Accepted Answer

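This is a good question, in that it exposes a deficiency in the current code. At present, a root node and the edge from it are not stored in the graph.* Instead, they have to be accessed separately as the root(s) of the graph, which are kept in a separate list. Here are two things that will work: (1) Add this code above the System.out.println: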

IndexedWord root = dependencies.getFirstRoot();
System.out.printf("ROOT(root-0, %s-%d)%n", root.word(), root.index());

(2) Use this line instead of your current one:

System.out.println(dependencies.toString("readable"));

Unlike the other toList() and toString() methods, this one does print the root.

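*There are historical reasons for this: we used not to have any explicit root. But at this point the behavior is awkward and dysfunctional, and it should be changed. That will probably happen in a future release.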
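Option (1) prints only the first root. If a graph can have several roots, the same idea extends to one line per root, followed by the usual edge list. The following is a minimal sketch under that assumption. `RootLines` and `withRoots` are hypothetical names, not part of CoreNLP, and the helper is written against plain Java types so that it runs without CoreNLP on the classpath; the sample data in `main` is a stand-in taken from the demo output in the question.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToIntFunction;

// Hypothetical helper, not part of CoreNLP.
final class RootLines {
    private RootLines() {}

    // Emits one "root(ROOT-0, word-index)" line per root, then the edge list unchanged.
    static <T> String withRoots(Collection<T> roots,
                                Function<T, String> word,
                                ToIntFunction<T> index,
                                String edgeList) {
        StringBuilder out = new StringBuilder();
        for (T root : roots) {
            out.append("root(ROOT-0, ")
               .append(word.apply(root))
               .append('-')
               .append(index.applyAsInt(root))
               .append(")\n");
        }
        return out.append(edgeList).toString();
    }

    public static void main(String[] args) {
        // Stand-in data: a (word, index) pair plays the role of a root node.
        List<Map.Entry<String, Integer>> roots = List.of(Map.entry("item", 4));
        String edges = "det(item-4, A-1)\namod(item-4, minimal-2)\n";
        System.out.print(withRoots(roots, e -> e.getKey(), e -> e.getValue(), edges));
    }
}
```

Inside the loop from the question, the call would presumably be `RootLines.withRoots(dependencies.getRoots(), IndexedWord::word, IndexedWord::index, dependencies.toList())`, assuming `getRoots()` is available in your CoreNLP version and that `toList()` returns the edge list as a string.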
