java - Finding phrase heads using Stanford Parser (CoreNLP)

Question

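I am going to use Stanford CoreNLP 2013 to find phrase heads. I saw this thread.

However, the answer there was not clear to me, and I could not add a comment to continue that thread. So I apologize for the duplicate.

What I currently have is the parse tree of a sentence (produced with Stanford CoreNLP). I have also tried the CoNLL format created by Stanford CoreNLP. What I need is exactly the head of each noun phrase.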

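I don't know how to use the dependencies and the parse tree to extract the heads of noun phrases. What I do know is that if I have nsubj(x, y), y is the head of the subject. If I have dobj(x, y), y is the head of the direct object. If I have iobj(x, y), y is the head of the indirect object.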
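The three rules above can be sketched as a small filter over dependency lines. This is only an illustration under assumptions: it expects the plain-text form `relation(governor-index, dependent-index)`, the class name `DependencyHeads` and the method `nounPhraseHeads` are hypothetical, and it uses no CoreNLP classes. The relation set is just the three relations named in the question, so noun phrases in other positions (for example, objects of prepositions) would not be found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DependencyHeads {
    // Relations whose dependent (second argument) is taken as an NP head.
    // Assumption for illustration: only the three relations from the question.
    static final Set<String> NP_RELATIONS = Set.of("nsubj", "dobj", "iobj");

    // Matches lines such as "nsubj(chased-3, dog-2)".
    static final Pattern DEP = Pattern.compile(
        "(\\w+)\\(\\s*([^\\s,()]+)-(\\d+)\\s*,\\s*([^\\s,()]+)-(\\d+)\\s*\\)");

    // Returns the dependent word of every nsubj/dobj/iobj line, in input order.
    static List<String> nounPhraseHeads(List<String> dependencies) {
        List<String> heads = new ArrayList<>();
        for (String line : dependencies) {
            Matcher m = DEP.matcher(line.trim());
            if (m.matches() && NP_RELATIONS.contains(m.group(1))) {
                heads.add(m.group(4)); // group 4 is the dependent word
            }
        }
        return heads;
    }

    public static void main(String[] args) {
        List<String> deps = List.of(
            "det(dog-2, The-1)",
            "nsubj(chased-3, dog-2)",
            "root(ROOT-0, chased-3)",
            "det(cat-5, the-4)",
            "dobj(chased-3, cat-5)");
        System.out.println(nounPhraseHeads(deps));
    }
}
```

Lines that do not match the expected shape are skipped silently, which keeps the sketch short but would hide malformed input in real use.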

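However, I am not sure whether this approach is the right way to find all phrase heads. If it is, which rules should I add to get the heads of all noun phrases?

It may be worth mentioning that I need the heads of the noun phrases in Java code.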

score 8 · Accepted Answer

Since I could not comment on the answer given by Chaitanya, I am adding more to his answer here.

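The Stanford CoreNLP suite implements the Collins head finder heuristics and the semantic head finder heuristics, in the form of

CollinsHeadFinder
ModCollinsHeadFinder
SemanticHeadFinder

All you need to do is instantiate one of the three and do the following.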

Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
headFinder.determineHead(tree).pennPrint(out);

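You can iterate over the nodes of the tree and determine the head word wherever required.

PS: My answer is based on the StanfordCoreNLP suite released on 20140104.

Here is a simple DFS that lets you extract the head words of all noun phrases in a sentence: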

import java.util.List;

import edu.stanford.nlp.trees.HeadFinder;
import edu.stanford.nlp.trees.Tree;

// Depth-first traversal that prints every NP together with its head word.
public static void dfs(Tree node, Tree parent, HeadFinder headFinder) {
    if (node == null || node.isLeaf()) {
        return;
    }
    // If the node is an NP, get its terminal nodes to print the words in the NP.
    if (node.value().equals("NP")) {
        System.out.println(" Noun Phrase is ");
        List<Tree> leaves = node.getLeaves();
        for (Tree leaf : leaves) {
            System.out.print(leaf.toString() + " ");
        }
        System.out.println();

        System.out.println(" Head string is ");
        System.out.println(node.headTerminal(headFinder, parent));
    }
    // Recurse into the children, passing the current node as their parent.
    for (Tree child : node.children()) {
        dfs(child, node, headFinder);
    }
}

score 4 · Accepted Answer

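You can extract the phrase of interest as an object of the Tree class. Then you can use the determineHead(Tree t) method from any class that implements the HeadFinder interface.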
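To make the idea of head percolation concrete without depending on CoreNLP, here is a toy sketch. Everything in it is an assumption for illustration: `ToyHeadFinder` and `Node` are hypothetical names, and the single rule (pick the rightmost child labeled `NP` or starting with `NN`, otherwise the rightmost child) is a drastic simplification. It is not the rule table used by CollinsHeadFinder or the other CoreNLP head finders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToyHeadFinder {
    // Minimal constituency node: a label plus children; a leaf's label is the word.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Simplified rule (an assumption, not the Collins rules): scan right to left
    // for a child labeled NP or NN*, and fall back to the rightmost child.
    static Node headChild(Node node) {
        for (int i = node.children.size() - 1; i >= 0; i--) {
            Node child = node.children.get(i);
            if (child.label.equals("NP") || child.label.startsWith("NN")) {
                return child;
            }
        }
        return node.children.get(node.children.size() - 1);
    }

    // Follows head children down to a leaf and returns that leaf's word.
    static String headWord(Node node) {
        while (!node.isLeaf()) {
            node = headChild(node);
        }
        return node.label;
    }

    // Pre-order traversal that records the head word of every NP node.
    static void collectNpHeads(Node node, List<String> out) {
        if (node.isLeaf()) {
            return;
        }
        if (node.label.equals("NP")) {
            out.add(headWord(node));
        }
        for (Node child : node.children) {
            collectNpHeads(child, out);
        }
    }

    public static void main(String[] args) {
        // (S (NP (NP (DT the) (NN dog)) (PP (IN in) (NP (DT the) (NN park)))) (VP (VBD barked)))
        Node tree = new Node("S",
            new Node("NP",
                new Node("NP", new Node("DT", new Node("the")), new Node("NN", new Node("dog"))),
                new Node("PP", new Node("IN", new Node("in")),
                    new Node("NP", new Node("DT", new Node("the")), new Node("NN", new Node("park"))))),
            new Node("VP", new Node("VBD", new Node("barked"))));
        List<String> heads = new ArrayList<>();
        collectNpHeads(tree, heads);
        System.out.println(heads);
    }
}
```

With the real library you would keep the traversal and replace `headWord` with a call on one of the HeadFinder implementations named in the other answer.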
