Basically, I want to replace every pronoun in a text with the actual entity it refers to.
// Path to the folder with models extracted from `stanford-corenlp-3.7.0-models.jar`
var jarRoot = ...
// Text for processing
var text = "Kosgi Santosh sent an email to Stanford University. He didn't get a reply.";
// Annotation pipeline configuration
var props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
props.setProperty("ner.useSUTime", "0");
// Temporarily change the current directory so StanfordCoreNLP can find all the model files automatically
var curDir = Environment.CurrentDirectory;
Directory.SetCurrentDirectory(jarRoot);
var pipeline = new StanfordCoreNLP(props);
Directory.SetCurrentDirectory(curDir);
// Annotation
var annotation = new Annotation(text);
pipeline.annotate(annotation);
var graph = annotation.get(new CorefChainAnnotation().getClass());
Console.WriteLine(graph);
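So far I have only found out how to "pretty print" the result. I want to process the contents of `graph` further, but I don't know how to actually parse what `annotation.get(new CorefChainAnnotation().getClass())` returns. In Java it is said to return a `Map<Integer, CorefChain>`, but I don't know how that is supposed to work in C#.
Any ideas?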
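To make the goal concrete, here is a minimal sketch of the post-processing step I would like to run once I can read the chains. It deliberately does not touch CoreNLP: `Mention`, `PronounReplacer` and `Resolve` are hypothetical names I made up for illustration, and I am assuming I can somehow obtain, for each pronoun, its sentence index, its token span, and the text of the representative mention.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one pronoun mention; NOT a CoreNLP type.
// Start/End are 0-based token offsets within the sentence, End is exclusive.
public sealed class Mention
{
    public int Sentence { get; }
    public int Start { get; }
    public int End { get; }
    public string Representative { get; }

    public Mention(int sentence, int start, int end, string representative)
    {
        Sentence = sentence;
        Start = start;
        End = end;
        Representative = representative;
    }
}

public static class PronounReplacer
{
    // Replaces the tokens of each pronoun mention with its representative text
    // and returns the tokens joined by single spaces.
    public static string Resolve(List<List<string>> sentences, IEnumerable<Mention> pronouns)
    {
        // Work on a copy so the caller's token lists are not modified.
        var result = sentences.Select(s => new List<string>(s)).ToList();

        // Go from the highest start offset down, so that replacing one span
        // does not shift the offsets of spans still to be processed.
        foreach (var m in pronouns.OrderByDescending(p => p.Start))
        {
            var tokens = result[m.Sentence];
            tokens.RemoveRange(m.Start, m.End - m.Start);
            tokens.Insert(m.Start, m.Representative);
        }

        return string.Join(" ", result.Select(s => string.Join(" ", s)));
    }
}

public static class Demo
{
    public static void Main()
    {
        // Hand-written tokens for the example text; a real run would take them from the pipeline.
        var sentences = new List<List<string>>
        {
            new List<string> { "Kosgi", "Santosh", "sent", "an", "email", "to", "Stanford", "University", "." },
            new List<string> { "He", "did", "n't", "get", "a", "reply", "." },
        };

        // Assumed coreference result: "He" (sentence 1, token 0) refers to "Kosgi Santosh".
        var pronouns = new[] { new Mention(1, 0, 1, "Kosgi Santosh") };

        Console.WriteLine(PronounReplacer.Resolve(sentences, pronouns));
    }
}
```

The output is joined token by token, so the original spacing is not restored (for example "did n't"). What I am missing is the part that fills `pronouns` from the object stored in `graph`.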