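I have been trying to use the Stanford Parser in my Java program to parse some Chinese sentences. Since I am new to both Java and the Stanford Parser, I used "ParserDemo.java" to practice. The code works for English sentences and prints the correct result. However, when I changed the model to "chinesePCFG.ser.gz" and tried to parse some already-segmented Chinese sentences, things went wrong.
Here is my Java code: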
import java.util.Collection;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.trees.*;

class ParserDemo {

    public static void main(String[] args) {
        LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz");
        if (args.length > 0) {
            demoDP(lp, args[0]);
        } else {
            demoAPI(lp);
        }
    }

    public static void demoDP(LexicalizedParser lp, String filename) {
        // This option shows loading, sentence-segmenting and tokenizing
        // a file using DocumentPreprocessor
        TreebankLanguagePack tlp = new PennTreebankLanguagePack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
        // You could also create a tokenizer here (as below) and pass it
        // to DocumentPreprocessor
        for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
            Tree parse = lp.apply(sentence);
            parse.pennPrint();
            System.out.println();

            GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
            Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed(true);
            System.out.println(tdl);
            System.out.println();
        }
    }

    public static void demoAPI(LexicalizedParser lp) {
        // This option shows parsing a list of correctly tokenized words
        String[] sent = { "我", "是", "一名", "学生" };
        List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
        Tree parse = lp.apply(rawWords);
        parse.pennPrint();
        System.out.println();

        TreebankLanguagePack tlp = new PennTreebankLanguagePack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
        // This is the call that throws the RuntimeException
        GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
        List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
        System.out.println(tdl);
        System.out.println();

        TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
        tp.printTree(parse);
    }

    private ParserDemo() {} // static methods only
}
It is basically the same as ParserDemo.java, but when I run it I get the following output:
Loading parser from serialized file edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz ... done [2.2 sec].
(ROOT (IP (NP (PN 我)) (VP (VC 是) (NP (QP (CD 一名)) (NP (NN 学生))))))
Exception in thread "main" java.lang.RuntimeException: Failed to invoke public edu.stanford.nlp.trees.EnglishGrammaticalStructure(edu.stanford.nlp.trees.Tree)
    at edu.stanford.nlp.trees.GrammaticalStructureFactory.newGrammaticalStructure(GrammaticalStructureFactory.java:104)
    at parserdemo.ParserDemo.demoAPI(ParserDemo.java:65)
    at parserdemo.ParserDemo.main(ParserDemo.java:23)
The code at line 65 is:
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
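My guess is that chinesePCFG.ser.gz is missing something related to "edu.stanford.nlp.trees.EnglishGrammaticalStructure". Since the parser parses Chinese correctly from the command line, there must be something wrong in my own code. I have been searching, but I only found a few similar cases. Some of them mention using the correct model, but I really don't know how to modify the code to use "the correct model". I hope someone can help. I am new to Java and the Stanford Parser, so please be specific. Thanks!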
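In case it helps with diagnosis: the RuntimeException above names a constructor, EnglishGrammaticalStructure(Tree), which makes me think the factory builds the object reflectively and wraps whatever the constructor threw. If so, the real error would be in the cause chain, which my output does not show. The sketch below is not Stanford Parser code and only illustrates that assumption; `CauseDemo`, `Picky`, `build` and `rootCause` are hypothetical names I made up for the example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

public class CauseDemo {

    // Hypothetical stand-in for a class whose constructor rejects input it was not designed for.
    public static class Picky {
        public Picky(String label) {
            if (!label.equals("expected")) {
                throw new IllegalArgumentException("unsupported label: " + label);
            }
        }
    }

    // Mimics a factory that calls a constructor reflectively and wraps any failure
    // in a generic RuntimeException (an assumption about how such factories behave).
    public static Object build(String label) {
        try {
            Constructor<Picky> con = Picky.class.getConstructor(String.class);
            return con.newInstance(label);
        } catch (InvocationTargetException e) {
            // The constructor itself threw; the original exception is e.getCause().
            throw new RuntimeException("Failed to invoke " + Picky.class.getName(), e);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // Walks the cause chain down to the innermost exception.
    public static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        while (cur.getCause() != null && cur.getCause() != cur) {
            cur = cur.getCause();
        }
        return cur;
    }

    public static void main(String[] args) {
        try {
            build("other");
        } catch (RuntimeException e) {
            System.out.println("wrapper: " + e.getMessage());
            System.out.println("root cause: " + rootCause(e));
        }
    }
}
```

Wrapping the failing call at line 65 in a similar try/catch and printing `rootCause(e)` (or simply `e.printStackTrace()`) might show which label or tree node the English-specific class could not handle.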