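I did go through the available documentation linked below, but things are still unclear. How should I proceed? I followed the documented input format for the training data, but I get the error shown below.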

Command:

./opennlp ChunkerTrainerME -model hn-chunker.bin -lang hn -data sampletrain.txt -encoding UTF-8

Error:

Skipping corrupt line: इसके PRP NP
Skipping corrupt line: साथ  NST NP
Skipping corrupt line: ही   RP  NP
Skipping corrupt line: पार्टी   NN  NP2
Skipping corrupt line: ने   PSP NP2
Skipping corrupt line: सरकार    NN  NP3
Skipping corrupt line: से   PSP NP3
Skipping corrupt line: इस   DEM NP4
Skipping corrupt line: मसले NN  NP4
Skipping corrupt line: पर   PSP NP4
Skipping corrupt line: बयान NN  NP5
Skipping corrupt line: देने VM  VGNN
Skipping corrupt line: की   PSP VGNN
Skipping corrupt line: मांग NN  NP6
Skipping corrupt line: की   VM  VGF
Skipping corrupt line: है   VAUX    VGF

done. 0 events
Indexing...  done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...  
Exception in thread "main" java.lang.NullPointerException

at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
at opennlp.maxent.GIS.trainModel(GIS.java:256)
at opennlp.model.TrainUtil.train(TrainUtil.java:184)
at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
at opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTool.java:68)
at opennlp.tools.cmdline.CLI.main(CLI.java:222)

Reference: http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.chunker

1 Answer

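I'm not sure whether you are still looking for an answer, but I found that the problem is in your training text file: the trainer expects only a single space between the word and each tag. Your lines appear to have multiple spaces between the fields.

For example, take: Skipping corrupt line: पार्टी NN NP2

  1. There are 3 spaces between पार्टी and NN.
  2. There are 3 spaces between NN and NP2.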
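
If the extra spaces are indeed the cause, one way to clean the file is to collapse every run of whitespace into a single space before training. The following is only a minimal sketch under that assumption. The helper names normalize_chunk_line and normalize_file and the output file name sampletrain.clean.txt are hypothetical, chosen for illustration. Blank lines come out as empty lines, so sentence boundaries are kept.

```python
def normalize_chunk_line(line: str) -> str:
    """Collapse runs of whitespace so fields are separated by one space.

    Assumption: each training line should read "word POS-tag chunk-tag"
    with exactly one space between the fields.
    """
    # str.split() with no argument splits on any run of whitespace
    # (spaces or tabs) and drops leading/trailing whitespace.
    return " ".join(line.split())


def normalize_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Write a cleaned copy of src to dst, one normalized line per input line."""
    with open(src, encoding=encoding) as fin, \
            open(dst, "w", encoding=encoding) as fout:
        for line in fin:
            # A blank input line becomes an empty output line,
            # which preserves the separation between sentences.
            fout.write(normalize_chunk_line(line) + "\n")


# Hypothetical usage with the file name from the question:
# normalize_file("sampletrain.txt", "sampletrain.clean.txt")
```

After writing the cleaned copy, the command from the question can be rerun with -data pointing at the new file to check whether the "Skipping corrupt line" messages go away.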
Answered 2016-03-30T19:13:49.543