stanford-nlp - Forcing POS tags in Stanford CoreNLP

Question

Is there a way to use Stanford CoreNLP to process text that is already POS-tagged?

For example, I have sentences in this format:

They_PRP are_VBP hunting_VBG dogs_NNS ._.

I want to annotate them with lemmas, NER, parses, etc., while forcing the POS annotations I already have.
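As a small illustration of reading this `word_TAG` format, here is a plain-Java sketch (no CoreNLP dependency) that splits each whitespace-separated token at its last underscore, so a word that itself contains an underscore (e.g. `New_York_NNP`) keeps it. The names `TaggedTokenParser`, `TaggedToken` and `parse` are hypothetical, and the sketch assumes every token carries exactly one trailing tag.

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedTokenParser {

    /** A word paired with its pre-assigned POS tag (hypothetical helper type). */
    public record TaggedToken(String word, String tag) {}

    /**
     * Splits whitespace-separated "word_TAG" tokens at the LAST underscore,
     * so underscores inside the word itself are preserved.
     */
    public static List<TaggedToken> parse(String text) {
        List<TaggedToken> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            int sep = part.lastIndexOf('_');
            // Reject tokens with no word before or no tag after the separator
            if (sep <= 0 || sep == part.length() - 1) {
                throw new IllegalArgumentException("Expected word_TAG, got: " + part);
            }
            tokens.add(new TaggedToken(part.substring(0, sep), part.substring(sep + 1)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (TaggedToken t : parse("They_PRP are_VBP hunting_VBG dogs_NNS ._.")) {
            System.out.println(t.word() + "\t" + t.tag());
        }
    }
}
```

Each resulting word/tag pair is what would be copied into `setWord` and `PartOfSpeechAnnotation` on a `CoreLabel`.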

Update: I tried the following code, but it does not work.

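```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma");

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String sentText = "They_PRP are_VBP hunting_VBG dogs_NNS ._.";
List<CoreLabel> sentence = new ArrayList<>();

// Turn each "word_TAG" pair into a CoreLabel carrying the given POS tag
String[] parts = sentText.split("\\s");
for (String p : parts) {
    String[] split = p.split("_");
    CoreLabel clToken = new CoreLabel();
    clToken.setValue(split[0]);
    clToken.setWord(split[0]);
    clToken.setOriginalText(split[0]);
    clToken.set(CoreAnnotations.PartOfSpeechAnnotation.class, split[1]);
    sentence.add(clToken);
}

// Attach the hand-built tokens to an annotation and run the pipeline on it
Annotation s = new Annotation(sentText);
s.set(CoreAnnotations.TokensAnnotation.class, sentence);

Annotation document = new Annotation(s);
pipeline.annotate(document);
```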

Accepted Answer

If you include the `pos` annotator in the pipeline, your POS annotations will certainly be replaced.

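Instead, remove the `pos` annotator and add the option `-enforceRequirements false`. This lets the pipeline run even though the `pos` annotator, which `lemma` and other annotators depend on, is absent. Add the following line before the pipeline is instantiated: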

```java
props.setProperty("enforceRequirements", "false");
```

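Of course, if you venture into this territory without setting the correct annotations, the behavior is undefined, so make sure you reproduce the annotations that the relevant annotator would have set (`POSTaggerAnnotator` in this case).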
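The two configuration changes the answer describes (drop `pos` from the annotator list, set `enforceRequirements` to `false`) can be sketched with `java.util.Properties` alone. The class and method names (`PretaggedPipelineProps`, `buildProps`) are hypothetical; the property keys `annotators` and `enforceRequirements` are the ones used in the question and answer.

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.stream.Collectors;

public class PretaggedPipelineProps {

    /**
     * Hypothetical helper: removes "pos" from a comma-separated annotator
     * list (so it cannot overwrite supplied tags) and disables requirement
     * checking so annotators that depend on "pos" are still allowed to run.
     */
    public static Properties buildProps(String annotators) {
        String withoutPos = Arrays.stream(annotators.split(","))
                .map(String::trim)
                .filter(a -> !a.isEmpty() && !a.equals("pos"))
                .collect(Collectors.joining(", "));
        Properties props = new Properties();
        props.setProperty("annotators", withoutPos);
        props.setProperty("enforceRequirements", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps("tokenize, ssplit, pos, lemma");
        System.out.println(props.getProperty("annotators"));
        System.out.println(props.getProperty("enforceRequirements"));
    }
}
```

The returned `Properties` would then be passed to `new StanfordCoreNLP(props)` as in the question. The sketch only mirrors the answer's two changes; it does not check whether other annotators left in the list rebuild the tokens you attached by hand.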
