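I am trying to detect whether a sentence is active or passive. To do this, I am using Stanford CoreNLP and looking for the dependency relation "nsubj" (= active) or "nsubjpass" (= passive).
This works very well for English (the code is here, if you are interested) and produces the following output: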
Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Reading POS tagger model from lib/stanford-postagger-full-2013-06-20/models/english-left3words-distsim.tagger ... done [1,2 sec].
Adding annotator lemma
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [1,1 sec].
reln: det
reln: nsubjpass <-- yeah! All I want. Passive sentence detected!
reln: auxpass
reln: root
reln: det
reln: prep_for
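A minimal sketch of the check described above, assuming the relation short names (the `reln:` values in the output) have already been collected into a list of strings. The class `VoiceFromRelations`, the enum `Voice`, and the method `classify` are hypothetical names used only for illustration; they are not part of Stanford CoreNLP, and the sketch deliberately leaves out the parsing step itself.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a CoreNLP class): decides voice purely from
// the dependency relation names printed by the parser.
final class VoiceFromRelations {

    enum Voice { ACTIVE, PASSIVE, UNKNOWN }

    // Returns PASSIVE as soon as "nsubjpass" is seen, ACTIVE if only
    // "nsubj" is seen, and UNKNOWN if neither relation is present.
    static Voice classify(List<String> relations) {
        boolean sawActiveSubject = false;
        for (String reln : relations) {
            if ("nsubjpass".equals(reln)) {
                return Voice.PASSIVE;
            }
            if ("nsubj".equals(reln)) {
                sawActiveSubject = true;
            }
        }
        return sawActiveSubject ? Voice.ACTIVE : Voice.UNKNOWN;
    }

    public static void main(String[] args) {
        // The relation names from the English output shown above.
        List<String> relns = Arrays.asList(
                "det", "nsubjpass", "auxpass", "root", "det", "prep_for");
        System.out.println(classify(relns)); // prints "PASSIVE"
    }
}
```

Returning `UNKNOWN` when neither relation appears is a design choice of this sketch: it keeps a parse with no subject relation at all (for example, one produced by a model for the wrong language) from being silently counted as active.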
However, I now also want to handle German, so I changed the following lines:
Properties props = new Properties();
props.put("parse.flags", "");
props.put("pos.model", "lib/stanford-postagger-full-2013-06-20/models/german-fast.tagger");
props.put("annotators", "tokenize, ssplit, pos, lemma, parse");
props.put("parse.model", "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz"); // <--- not there
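This fails because the jar (stanford-corenlp-3.2.0-models.jar) contains no parser model file "germanPCFG.ser.gz"; only the English model is included. There are German parser models available online that I can include (for example, see this one), but with those I get a long stack trace: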
Loading parser from serialized file lib/stanford-postagger-full-2013-06-20/germanFactored.ser.gz ...
java.lang.NullPointerException
at edu.stanford.nlp.parser.lexparser.BinaryGrammar.init(BinaryGrammar.java:224)
at edu.stanford.nlp.parser.lexparser.BinaryGrammar.readObject(BinaryGrammar.java:211)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:172)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromSerializedFile(LexicalizedParser.java:607)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromFile(LexicalizedParser.java:401)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:158)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:144)
at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:177)
at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:107)
at edu.stanford.nlp.pipeline.StanfordCoreNLP$12.create(StanfordCoreNLP.java:736)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:81)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:260)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:127)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:123)
at nlp.Tagger.parse(Tagger.java:83)
at nlp.GUI$5.doInBackground(GUI.java:474)
at nlp.GUI$5.doInBackground(GUI.java:468)
at javax.swing.SwingWorker$1.call(SwingWorker.java:277)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at javax.swing.SwingWorker.run(SwingWorker.java:316)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Loading parser from text file lib/stanford-postagger-full-2013-06-20/germanFactored.ser.gz java.lang.RuntimeException: lib/stanford-postagger-full-2013-06-20/germanFactored.ser.gz: expecting BEGIN block; got ��
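If I simply use the English parser model (englishPCFG.ser.gz) on German input, German passive sentences are not detected correctly. Any suggestions on how to proceed?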