java - 使用标准 corenlp 包获取 corefrences

Question

我正在尝试在文本中获得共同引用。我是 corenlp 包的新手。我尝试了下面的代码，但它不起作用，但我也对其他方法持开放态度。

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */

package corenlp;
import edu.stanford.nlp.ling.CoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.CorefGraphAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.semgraph.SemanticGraph;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.IntTuple;
import edu.stanford.nlp.util.Pair;
import edu.stanford.nlp.util.Timing;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import java.util.Properties;
/**
 *
 * @author Karthi
 */
public class Main {


        // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
    Properties props = new Properties();
    FileInputStream in = new FileInputStream("Main.properties");

    props.load(in);
    in.close();
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    String text = "The doctor can consult with other doctors about this patient. If that is the case, the name of the doctor and the names of the consultants have to be maintained. Otherwise, only the name of the doctor is kept. "; // Add your text here!

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);
    System.out.println(document);
    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    List<CoreMap> sentences = (List<CoreMap>) document.get(SentencesAnnotation.class);
    System.out.println(sentences);
    for(CoreMap sentence: sentences) {
      // traversing the words in the current sentence
      // a CoreLabel is a CoreMap with additional token-specific methods
      for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        // this is the POS tag of the token
        String pos = token.get(PartOfSpeechAnnotation.class);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);
      }

      // this is the parse tree of the current sentence
      Tree tree = sentence.get(TreeAnnotation.class);
System.out.println(tree);
      // this is the Stanford dependency graph of the current sentence
      SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
      System.out.println(dependencies);
    }

    // this is the coreference link graph
    // each link stores an arc in the graph; the first element in the Pair is the source, the second is the target
    // each node is stored as <sentence id, token id>. Both offsets start at 1!
    List<Pair<IntTuple, IntTuple>> graph = document.get(CorefGraphAnnotation.class);
    System.out.println(graph);

    }

}

这是我得到的错误：

Loading POS Model [// For POS model] ... Loading default properties from trained tagger // For POS model
Error: No such trained tagger config file found.
java.io.FileNotFoundException: \\ For POS model (The specified path is invalid)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileInputStream.<init>(FileInputStream.java:66)
        at edu.stanford.nlp.tagger.maxent.TaggerConfig.getTaggerDataInputStream(TaggerConfig.java:741)
        at edu.stanford.nlp.tagger.maxent.TaggerConfig.<init>(TaggerConfig.java:178)
        at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:228)
        at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:57)
        at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:44)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:441)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:434)
        at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:62)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:309)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:347)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:337)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:329)
        at corenlp.Main.main(Main.java:66)
Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: \\ For POS model (The specified path is invalid)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:443)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:434)
        at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:62)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:309)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:347)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:337)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:329)
        at corenlp.Main.main(Main.java:66)
Caused by: java.io.FileNotFoundException: \\ For POS model (The specified path is invalid)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileInputStream.<init>(FileInputStream.java:66)
        at edu.stanford.nlp.tagger.maxent.TaggerConfig.getTaggerDataInputStream(TaggerConfig.java:741)
        at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:643)
        at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:268)
        at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:228)
        at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(POSTaggerAnnotator.java:57)
        at edu.stanford.nlp.pipeline.POSTaggerAnnotator.<init>(POSTaggerAnnotator.java:44)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP$4.create(StanfordCoreNLP.java:441)
        ... 7 more
Java Result: 1

score 2 · Accepted Answer

这个错误仅仅意味着程序没有找到它需要运行的数据模型。他们需要在你的类路径上。如果您在分发目录中，则可以使用以下命令执行此操作：

java -cp stanford-corenlp-2010-11-12.jar:stanford-corenlp-models-2010-11-06.jar:xom.jar:jgrapht.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -file input.txt

第二个罐子包含模型。如果您使用的是 Windows，请将上面的冒号替换为分号。

score 1 · Accepted Answer

我做了以下事情：

使用上面提供的进口 Tex
检查 Main.properties 文件是否在可访问路径中。正如卡尔蒂所说。
替换 <List<Pair<IntTuple, IntTuple>> graph = document.get(CorefGraphAnnotation.class); 为Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
我得到的结果如下。够了吗？

{1=CHAIN1-[第 1 句中的“医生”，第 2 句中的“医生”，第 3 句中的“医生”]，2=CHAIN2-[第 1 句中的“关于该患者的其他医生”]，4= C HAIN4-[第 1 句中的“这个病人”]，5=CHAIN5-[第 2 句中的“那个”，第 2 句中的“病例”]，7=CHAIN7-[“医生姓名和患者姓名"consulters" in sentence 2, "only the name of the doctor" in sentence 3], 9=CHAIN9-["the doctor and the names of the advisors" in sentence 2], 11=CHAIN11-["the names of the advisors" " 第 2 句], 13=CHAIN13-[第 2 句中的“顾问”]}

score 0 · Accepted Answer

自本主题和此处的代码段中的版本以来，文件的结构似乎略有变化：http: //nlp.stanford.edu/software/corenlp.shtml

将导入替换为：

import edu.stanford.nlp.trees.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefGraphAnnotation;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.trees.semgraph.SemanticGraph;
import edu.stanford.nlp.dcoref.CorefChain

这对我有用；-)

java - 使用标准 corenlp 包获取 corefrences

3 回答 3

Related

Reference