nlp - How to train a custom model in OpenNLP?

Question

I want to train my own custom model. Where should I start?

I am using this sample data to train the model:

<START:meaningless>Took connection and<END>  selected the Text in the Letter Template and cleared the Formatting of Text to Normal.

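Basically, I want to identify meaningless text in a given input.

I tried the sample code from the OpenNLP developer documentation, but I get this error: Model not compatible with name finder!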

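    import java.io.BufferedOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.nio.charset.Charset;
    import java.util.Collections;

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.NameSample;
    import opennlp.tools.namefind.NameSampleDataStream;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;

    // Read the training file line by line and parse each line into a NameSample.
    Charset charset = Charset.forName("UTF-8");
    ObjectStream<String> lineStream =
        new PlainTextByLineStream(new FileInputStream("mynewmodel.train"), charset);
    ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

    TokenNameFinderModel model;
    try {
      model = NameFinderME.train("en", "meaningless", sampleStream,
          Collections.<String, Object>emptyMap(), 100, 5);
    } finally {
      sampleStream.close();
    }

    // modelFile is assumed to be defined elsewhere; the snippet does not show it.
    OutputStream modelOut = null;
    try {
      modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
      model.serialize(modelOut);
    } finally {
      if (modelOut != null) {
        modelOut.close();
      }
    }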

score 0 · Accepted Answer

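Likely problem: you are not giving the trainer explicitly tokenized text. If I understand the documentation correctly, the training data read through PlainTextByLineStream must consist of whitespace-separated tokens, and the <START:...> and <END> tags count as tokens too. So it should be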

<START:meaningless> Took connection and <END>

instead of

<START:meaningless>Took connection and<END>
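
If the training file already exists with the tags glued to the words, the spacing could be fixed in a preprocessing pass rather than by hand. Below is a minimal sketch using only the Java standard library; the class name `TrainingLineNormalizer` and the method `normalize` are hypothetical names made up for this illustration and are not part of OpenNLP. It only inserts whitespace around the tags and collapses repeated spaces; it does not tokenize punctuation.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not an OpenNLP class): puts whitespace around
// <START:type> and <END> tags so each tag becomes its own token.
final class TrainingLineNormalizer {
    // Matches <START>, <START:someType>, or <END>.
    private static final Pattern TAG = Pattern.compile("<START(:[^>\\s]+)?>|<END>");

    private TrainingLineNormalizer() {}

    static String normalize(String line) {
        // Surround every tag with spaces ($0 is the whole match),
        // then trim and collapse runs of whitespace into single spaces.
        String spaced = TAG.matcher(line).replaceAll(" $0 ");
        return spaced.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String raw = "<START:meaningless>Took connection and<END>  selected the Text";
        // prints: <START:meaningless> Took connection and <END> selected the Text
        System.out.println(normalize(raw));
    }
}
```

Each line of the training file would be passed through `normalize` before being written to the file that the trainer reads. Lines that are already spaced correctly come out unchanged.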
