I am using OpenNLP to extract proper nouns from a sentence. Here is my code:
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;

import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;

public class ParserTest {

    // Collects the text of every node tagged as a proper noun.
    static Set<String> nounPhrases = new HashSet<>();

    // Input text: already stemmed, so every word is lowercase.
    private static String line = "iran india pai oil due euro delhi iran ask indian refin essar oil mangalor refineri petrochem mrpl clear oil due amount billion euro month lift sanction iran told indian author three year mechan pai cent oil import bill rupe keep remain cent pend payment channel clear end.";

    // Walks the parse tree recursively and records nodes tagged
    // NNP (singular proper noun) or NNPS (plural proper noun).
    public void getNounPhrases(Parse p) {
        if (p.getType().equals("NNP") || p.getType().equals("NNPS")) {
            nounPhrases.add(p.getCoveredText());
            System.out.println(p.getCoveredText());
        }
        for (Parse child : p.getChildren()) {
            getNounPhrases(child);
        }
    }

    public void parserAction() throws Exception {
        // try-with-resources closes the model stream once the model is loaded.
        try (InputStream is = new FileInputStream("C:\\Users\\asus\\Downloads\\en-parser-chunking.bin")) {
            ParserModel model = new ParserModel(is);
            Parser parser = ParserFactory.create(model);
            // Request only the single best parse for the line.
            Parse[] topParses = ParserTool.parseLine(line, parser, 1);
            for (Parse p : topParses) {
                // p.show();
                getNounPhrases(p);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        new ParserTest().parserAction();
        System.out.println("List of Noun Parse : " + nounPhrases);
    }
}
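The problem is that the input is stemmed text (I ran it through the Porter stemming algorithm), so every word is lowercase. As a result, no proper nouns are extracted. Is my approach above for extracting proper nouns correct? If it is, what changes do I need to make to the code to get it working? If it is not, please suggest a different approach, ideally with sample code that shows how to do it.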
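The loss of case information described above can be illustrated without OpenNLP. The following is a minimal sketch, assuming a crude capitalization heuristic as a stand-in for the parser's NNP/NNPS tags. The class name `CaseBeforeStemDemo`, the method `properNounCandidates`, and the example sentence are all hypothetical and are not part of OpenNLP or of the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CaseBeforeStemDemo {

    // Hypothetical stand-in for a real tagger: treats any token whose first
    // letter is uppercase as a proper-noun candidate. This is NOT how
    // OpenNLP tags NNP/NNPS; it only shows what lowercasing throws away.
    static List<String> properNounCandidates(String text) {
        List<String> candidates = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            // Strip leading and trailing non-letter characters (e.g. a final period).
            String word = token.replaceAll("^\\P{L}+|\\P{L}+$", "");
            if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                candidates.add(word);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Made-up sentence in its original casing (an assumption for illustration).
        String original = "Iran asked Essar Oil to clear oil dues in euros.";
        // Lowercasing, as happens before or during stemming in my pipeline.
        String lowered = original.toLowerCase(Locale.ROOT);

        System.out.println(properNounCandidates(original)); // prints [Iran, Essar, Oil]
        System.out.println(properNounCandidates(lowered));  // prints []
    }
}
```

If the tagger relies on capitalization in a similar way, this would suggest extracting proper nouns from the original text first and stemming afterwards. Whether the OpenNLP parser model actually behaves like this is an assumption here, not something this sketch verifies.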
Thank you.