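I am trying to create a chunking parser model for Portuguese with OpenNLP, but I have not succeeded.
I believe two files are needed to create the model:
A training file named train.all, in this format: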
(TOP (S (NP-SBJ (DT Some) )(VP (VBP say) (NP (NNP November) ))(. .) ))
(TOP (S (NP-SBJ (PRP I) )(VP (VBP say) (NP (CD 1992) ))(. .) ('' '') ))
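A quick sanity check on the training file can rule out malformed input before training. The sketch below is only an illustration, assuming the trainer expects exactly one bracketed parse per line; the class TreebankLineCheck and its method are hypothetical helpers, not part of OpenNLP.

```java
import java.util.List;

// Hypothetical helper, not part of OpenNLP: checks that a line holds
// exactly one complete bracketed parse.
class TreebankLineCheck {

    // Returns true when the line starts with '(', its parentheses are balanced,
    // and the nesting depth returns to zero only at the very last character.
    static boolean isSingleBalancedParse(String line) {
        String s = line.trim();
        if (s.isEmpty() || s.charAt(0) != '(') {
            return false;
        }
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // more closing than opening parentheses
                }
                if (depth == 0 && i != s.length() - 1) {
                    return false; // a tree ended early, so more text follows on this line
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        // The two sample trees from the question, one per entry.
        List<String> lines = List.of(
            "(TOP (S (NP-SBJ (DT Some) )(VP (VBP say) (NP (NNP November) ))(. .) ))",
            "(TOP (S (NP-SBJ (PRP I) )(VP (VBP say) (NP (CD 1992) ))(. .) ('' '') ))");
        for (String line : lines) {
            System.out.println(isSingleBalancedParse(line) + "  " + line);
        }
    }
}
```

The check is deliberately simple: it treats every parenthesis as structure, so it assumes literal parentheses in the text are already escaped in the treebank (for example as -LRB- and -RRB-).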
A rules file named headRules. My file contains these rules, which I took from the internet:
20 ADJP 0 NNS QP NN $ ADVP JJ VBN VBG ADJP JJR NP JJS DT FW RBR RBS SBAR RB
15 ADVP 1 RB RBR RBS FW ADVP TO CD JJR JJ IN NP JJS NN
5 CONJP 1 CC RB IN
2 FRAG 1
2 INTJ 0
4 LST 1 LS :
19 NAC 0 NN NNS NNP NNPS NP NAC EX $ CD QP PRP VBG JJ JJS JJR ADJP FW
8 PP 1 IN TO VBG VBN RP FW
2 PRN 1
3 PRT 1 RP
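In the rules above, each entry starts with a number that matches the count of tokens after it (constituent label, direction flag, then the tag list). The following sketch checks that property for a rule line. It is an assumption drawn from the sample itself, and HeadRuleLineCheck is a hypothetical helper, not an OpenNLP class.

```java
import java.util.List;

// Hypothetical helper, not part of OpenNLP: verifies that the leading count
// of a head rule line matches the number of tokens that follow it.
class HeadRuleLineCheck {

    // Assumption: the first token is an integer equal to the number of
    // remaining tokens (label + direction flag + tags).
    static boolean countMatches(String rule) {
        String[] tokens = rule.trim().split("\\s+");
        if (tokens.length < 3) {
            return false; // need at least count, label, and direction flag
        }
        try {
            int declared = Integer.parseInt(tokens[0]);
            return declared == tokens.length - 1;
        } catch (NumberFormatException e) {
            return false; // the line does not start with a number
        }
    }

    public static void main(String[] args) {
        // A few rule lines taken from the question.
        List<String> rules = List.of(
            "5 CONJP 1 CC RB IN",
            "2 FRAG 1",
            "4 LST 1 LS :",
            "8 PP 1 IN TO VBG VBN RP FW");
        for (String rule : rules) {
            System.out.println(countMatches(rule) + "  " + rule);
        }
    }
}
```

Running it over every line of the rules file would flag entries whose count drifted from the tag list, for example after manual editing.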
I use this command to generate the model en-parser-chunking.bin:
$ bin/opennlp ParserTrainer -encoding ISO-8859-1 -lang en -parserType CHUNKING -headRules head_rules -data train.all -model en-parser-chunking.bin
I then use this en-parser-chunking.bin model to run the parser, with the following code:
ParserModel modelParse = new ParserModel(parserStream);
Parser parser = ParserFactory.create(modelParse);
Parse[] parses = ParserTool.parseLine("Some say Novembro", parser, 1);
When I run the code, I get the following error:
SEVERE: Servlet.service() for servlet [DispatcherServlet] in context with path [/ProjetoCMBuilder] threw exception [Request processing failed; nested exception is java.lang.ArrayIndexOutOfBoundsException: -1] with root cause
java.lang.ArrayIndexOutOfBoundsException: -1
at opennlp.tools.parser.treeinsert.Parser.advanceParses(Parser.java:346)
at opennlp.tools.parser.AbstractBottomUpParser.parse(AbstractBottomUpParser.java:311)
at opennlp.tools.parser.AbstractBottomUpParser.parse(AbstractBottomUpParser.java:365)
at opennlp.tools.cmdline.parser.ParserTool.parseLine(ParserTool.java:77)
at ProjetoCMBuilder.service.LibOpenNLPServiceBean.processarTexto(LibOpenNLPServiceBean.java:78)
at ProjetoCMBuilder.controller.ProcessController.getAllPropositionPt(ProcessController.java:142)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
How do I create the parser-chunking.bin model?