我正在使用斯坦福 CoreNLP(01.2016 版),我想在依赖关系中保留标点符号。当你从命令行运行它时,我找到了一些方法,但我没有找到任何关于提取依赖关系的 java 代码。
这是我当前的代码。它有效,但不包括标点符号:
Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.put("parse.model", modelPath );
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);
LexicalizedParser lp = LexicalizedParser.loadModel(modelPath + lexparserNameEn,
"-maxLength", "200", "-retainTmpSubcategories");
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
List<CoreLabel> words = sentence.get(CoreAnnotations.TokensAnnotation.class);
Tree parse = lp.apply(words);
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
Collection<TypedDependency> td = gs.typedDependencies();
parsedText += td.toString() + "\n";
任何类型的依赖关系对我来说都可以,基本的、键入的、折叠的等。我只想包括标点符号。
提前致谢,