java-7 - Manually tagging words using Stanford CoreNLP

Question

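I have a resource in which I know the exact type of each word. I have to lemmatize these words, but to get correct results I have to tag them manually. I could not find any code for tagging words manually. I am using the following code, but it returns the wrong result: for "painting" it returns "painting", where I expect "paint".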

//...........lemmatization starts........................

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
String text = "painting";
Annotation document = pipeline.process(text);

List<edu.stanford.nlp.util.CoreMap> sentences = document.get(SentencesAnnotation.class);

for (edu.stanford.nlp.util.CoreMap sentence : sentences)
{
    for (CoreLabel token : sentence.get(TokensAnnotation.class))
    {
        String word = token.get(TextAnnotation.class);
        String lemma = token.get(LemmaAnnotation.class);
        System.out.println("lemmatized version :" + lemma);
    }
}

//...........lemmatization ends.........................

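I have to run the lemmatizer on individual words, not on sentences where POS tagging is done automatically. So I want to tag the words manually first and then find their lemmas. Some sample code or a pointer to a relevant site would be a great help.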

score 1 · Accepted Answer

If you know the POS tags in advance, you can get the lemmas as follows:

Properties props = new Properties(); 
props.put("annotators", "tokenize, ssplit"); 
StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
String text = "painting";

Morphology morphology = new Morphology();

Annotation document = pipeline.process(text);  

List<edu.stanford.nlp.util.CoreMap> sentences = document.get(SentencesAnnotation.class);

for(edu.stanford.nlp.util.CoreMap sentence: sentences) {

  for(CoreLabel token: sentence.get(TokensAnnotation.class)) {       
    String word = token.get(TextAnnotation.class);
    String tag = ... //get the tag for the current word from somewhere, e.g. an array
    String lemma = morphology.lemma(word, tag);
    System.out.println("lemmatized version :" + lemma);
  }
}
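
The "get the tag from somewhere, e.g. an array" step above could be sketched as follows, assuming the manual tags are kept in an array parallel to the words. `ManualTagLemmatizer` and its nested `Lemmatizer` interface are hypothetical names made up for this illustration and are not part of CoreNLP; the interface only stands in for any (word, tag) to lemma function so that the pairing logic can be shown without the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of CoreNLP): pairs each word with its
// manually assigned POS tag and delegates the lemma lookup.
final class ManualTagLemmatizer {

    // Hypothetical adapter for any (word, tag) -> lemma function.
    interface Lemmatizer {
        String lemma(String word, String tag);
    }

    // Assumes tags[i] is the manually assigned tag for words[i].
    static List<String> lemmatize(String[] words, String[] tags, Lemmatizer lemmatizer) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> lemmas = new ArrayList<String>();
        for (int i = 0; i < words.length; i++) {
            lemmas.add(lemmatizer.lemma(words[i], tags[i]));
        }
        return lemmas;
    }
}
```

With CoreNLP on the classpath, the adapter could be an anonymous `Lemmatizer` whose `lemma` method simply returns `morphology.lemma(word, tag)`, using the `Morphology` instance from the snippet above. The sketch avoids lambdas so that it also compiles on Java 7.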

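If you only want the lemma of a single word, you don't even need to run CoreNLP for tokenization and sentence splitting; you can call the lemma function directly, as follows: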

String tag = "VBG";      
String word = "painting";
Morphology morphology = new Morphology();
String lemma = morphology.lemma(word, tag);

