3

我正在尝试让斯坦福解析器为我的德语文本管道工作,但它拒绝使用德语解析器:

Properties props = new Properties();

props.put("annotators", "tokenize, ssplit, pos, parse");
props.put("ssplit.isOneSentence", "true");
props.put("pos.model", "pos-taggers/german-fast/german-fast.tagger");
props.put("pos.maxlen", "30");
props.put("parse.model", "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz");
props.put("encoding", "utf-8");

pipeline = new StanfordCoreNLP(props);

我仍然得到以下输出,仅此而已,因为无法识别德语标签:

Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
Initializing lexicon scores ... The 15 open class tags are: [ TRUNC NE NN XY VVIZU ADV VVINF VVFIN VVPP CARD NN-OA ADJA FM ADJD NN-SB ] 

故障跟踪:

java.lang.IllegalArgumentException: Unknown option: -retainTmpSubcategories
at edu.stanford.nlp.parser.lexparser.Options.setOption(Options.java:175)
at edu.stanford.nlp.parser.lexparser.Options.setOptions(Options.java:68)
at edu.stanford.nlp.parser.lexparser.Options.setOptions(Options.java:49)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.setOptionFlags(LexicalizedParser.java:841)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:159)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:143)
at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:176)
at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:106)
at edu.stanford.nlp.pipeline.StanfordCoreNLP$12.create(StanfordCoreNLP.java:734)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:81)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:261)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:127)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:123)
at da.utils.nlp.SentimentExtractor.initPipeline(SentimentExtractor.java:111)
at da.utils.nlp.SentimentExtractor.coreAnnotate(SentimentExtractor.java:117)
at da.utils.nlp.SentimentExtractorTest.testCoreAnnotate(SentimentExtractorTest.java:29)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

知道我的实施可能有什么问题吗?

我检查了文件位置,但没有成功。

4

1 回答 1

2

简单(如果令人困惑)的答案应该是您只需要在属性设置中添加这一行:

props.put("parse.flags", "");

(这应该是固定的,但是标志默认为一个在获取英语依赖项时有用的选项,但在其他语言中不相关或不可用,因此您会收到上面的错误消息。)

但是,如果这是唯一的问题,您应该首先看到它加载德语解析器,然后再给出长错误转储,如下所示:

Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/germanFactored.ser.gz ... done [5.2 sec].
Exception in thread "main" java.lang.IllegalArgumentException: Unknown option: -retainTmpSubcategories

但是在您显示的输出中,它仍在加载英文解析器。所以肯定有其他问题。我不确定这部分,但有两种可能性:

  • 您正在运行旧版本的斯坦福 CoreNLP。不久前,这些选项被称为“parser.model”、“parser.flags”等,但为了保持一致,我们将它们重命名。
  • 您没有edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz在 CLASSPATH 上调用的资源
于 2013-10-20T00:31:30.720 回答