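I want to change the end-of-sentence delimiters used by my OpenNLP SentenceDetectorME. I am using opennlp 1.5.3. Since the stock version only detects sentences separated by '.', I would like to add other sentence delimiters such as ';', '!' and '?' by passing a char array eos[] to SentenceDetectorFactory. I read that you have to use the .train method of SentenceDetectorME, but I don't understand how, since it is static and needs a trained model. Any suggestions?
My code: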
import java.io.*;
import opennlp.tools.sentdetect.*;

public class SenTest {
    public static void main(String[] args) throws IOException {
        String paragraph = "12oz bottle poured into a tulip. Pleasing aromas of citrus rind, lemongrass, peaches, and toasted caramel are picked up from the start. After it settles a bit, more of a fresh baked bread crust and tangerine comes through, and even later, the bread crust turns more towards a blackened pizza crust. It pours a slightly hazy copper-orange color with a creamy white head that retains well; it leaves a thick puffy ring with a creamy island and a decent, messy lace along the glass. Great balance between medium high levels of sweet and bitter. The texture is creamy on the palate with a body towards the higher end of medium. The carbonation is a touch effervescent or fizzy, but overall, soft. There’s a very pronounced grapefruit tartness up front, but it mellows quickly after the first few sips. It finishes with a zesty combination of lemongrass, caramel, and stonefruit. The aftertaste is primarily sweet, overripe tangerines and it’s peel with a tart grapefruit bitter lingering in the mouth. Overall very refreshing, straddles the line between IPA and APA.";
        // Delimiters I would like the detector to use (not passed to it yet)
        char[] eos = {';', '.', '!', '?'};
        int counter = 0;

        // Always start with a model; a model is learned from training data
        SentenceModel model;
        try (InputStream is = new FileInputStream(System.getProperty("user.dir") + "/lib/en-sent.bin")) {
            model = new SentenceModel(is);
        }
        SentenceDetectorME sdetector = new SentenceDetectorME(model);

        // Split the paragraph and print each detected sentence with its index
        String[] sentences = sdetector.sentDetect(paragraph);
        for (String s : sentences) {
            counter++;
            System.out.println("Frase numero " + counter + ": " + s);
        }
    }
}
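
To make the expected result concrete, here is a rough plain-Java sketch of the split I am after. It does not use OpenNLP at all and is not how SentenceDetectorME works internally; it simply cuts after every character in eos[], so it would also break on things like decimal points or abbreviations. The class name EosSplitSketch and the helper splitOnEos are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class EosSplitSketch {

    // Hypothetical helper, not an OpenNLP API: cuts the text right after
    // every character listed in eos and trims each piece.
    public static List<String> splitOnEos(String text, char[] eos) {
        List<String> sentences = new ArrayList<String>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (isEos(text.charAt(i), eos)) {
                addIfNotEmpty(sentences, text.substring(start, i + 1));
                start = i + 1;
            }
        }
        // Keep trailing text that has no closing delimiter.
        addIfNotEmpty(sentences, text.substring(start));
        return sentences;
    }

    // True when c is one of the configured end-of-sentence characters.
    private static boolean isEos(char c, char[] eos) {
        for (char e : eos) {
            if (c == e) {
                return true;
            }
        }
        return false;
    }

    // Adds the trimmed piece unless it is empty after trimming.
    private static void addIfNotEmpty(List<String> out, String piece) {
        String trimmed = piece.trim();
        if (trimmed.length() > 0) {
            out.add(trimmed);
        }
    }

    public static void main(String[] args) {
        char[] eos = {';', '.', '!', '?'};
        String text = "It retains well; it leaves a thick ring. Great balance!";
        int counter = 0;
        for (String s : splitOnEos(text, eos)) {
            counter++;
            System.out.println("Sentence " + counter + ": " + s);
        }
    }
}
```

With the sample text in main, the part before ';' and the part after it should come out as separate sentences, which is the behaviour I would like to get from the OpenNLP detector.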