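I am new to natural language processing. I need to extract noun phrases from text. So far I have used OpenNLP's chunking parser to parse my text and get the tree structure, but I am unable to extract the noun phrases from that tree. Is there any regular expression pattern in OpenNLP that I can use to extract noun phrases?

Below is the code I am using: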

    InputStream is = new FileInputStream("en-parser-chunking.bin");
    ParserModel model = new ParserModel(is);
    Parser parser = ParserFactory.create(model);
    Parse[] topParses = ParserTool.parseLine(line, parser, 1);
    for (Parse p : topParses) {
        p.show();
    }

Here, the output I get is:

(TOP (S (S (ADJP (JJ Welcome)) (PP (TO) (NP (NNP Big) (NNP Data.))))) (S (NP (PRP We)) (VP (VP (VBP)) (VP (VBG working) (PP (IN on)) (NP (NNP Natural) (NNP Language) (NNP Processing.can))))) (NP (DT some) (CD one) (NN help)) (NP (PRP us)) (PP (IN in) (S (VP (VBG extracting)) (NP (DT the) (NN noun) (NNS phrases)) (PP (IN from) (NP (DT the) (NN tree)) (WP structure.))))))))))

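Can someone help me get the noun phrases, such as NP, NNP, NN, etc.? Do I need to use some other NP chunker to get the noun phrases? Is there any regular expression pattern that achieves the same thing?

Please help me.

Thanks in advance,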

Gus.


3 Answers


A Parse object is a tree; you can use getParent(), getChildren(), and getType() to navigate it.

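    import java.util.ArrayList;
    import java.util.List;
    import opennlp.tools.parser.Parse;

    // Must be initialized before the first call, otherwise add() throws a NullPointerException.
    List<Parse> nounPhrases = new ArrayList<>();

    // Recursively collects every node of type NP, including nested ones.
    public void getNounPhrases(Parse p) {
        if (p.getType().equals("NP")) {
            nounPhrases.add(p);
        }
        for (Parse child : p.getChildren()) {
            getNounPhrases(child);
        }
    }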
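
To try the recursion above without loading a model, here is a minimal sketch that runs the same traversal over a bracketed parse string like the one printed by p.show(). The TreeNode and BracketedTree classes are hypothetical stand-ins written for this illustration and are not part of OpenNLP; with the real library you would walk the Parse objects directly. The sketch assumes well-formed, balanced brackets and will throw on malformed input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for opennlp.tools.parser.Parse, for illustration only.
class TreeNode {
    final String type;   // constituent or POS label, e.g. "NP"; null for a word leaf
    final String word;   // token text for a word leaf; null otherwise
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String type, String word) {
        this.type = type;
        this.word = word;
    }

    // Text covered by this node: its word leaves joined by single spaces.
    String text() {
        if (word != null) {
            return word;
        }
        List<String> parts = new ArrayList<>();
        for (TreeNode child : children) {
            String t = child.text();
            if (!t.isEmpty()) {
                parts.add(t);
            }
        }
        return String.join(" ", parts);
    }
}

// Hypothetical recursive-descent reader for strings such as "(NP (DT the) (NN tree))".
class BracketedTree {
    private final String s;
    private int pos = 0;

    private BracketedTree(String s) {
        this.s = s;
    }

    static TreeNode parse(String s) {
        return new BracketedTree(s).node();
    }

    private TreeNode node() {
        skipSpaces();
        if (s.charAt(pos) == '(') {
            pos++; // consume '('
            TreeNode n = new TreeNode(token(), null);
            skipSpaces();
            while (s.charAt(pos) != ')') {
                n.children.add(node());
                skipSpaces();
            }
            pos++; // consume ')'
            return n;
        }
        return new TreeNode(null, token());
    }

    // Reads characters up to the next space or bracket.
    private String token() {
        skipSpaces();
        int start = pos;
        while (pos < s.length() && " ()".indexOf(s.charAt(pos)) < 0) {
            pos++;
        }
        return s.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < s.length() && s.charAt(pos) == ' ') {
            pos++;
        }
    }

    // Same shape as getNounPhrases: record the node if its type matches, then visit its children.
    static void collect(TreeNode n, String type, List<String> out) {
        if (type.equals(n.type)) {
            out.add(n.text());
        }
        for (TreeNode child : n.children) {
            collect(child, type, out);
        }
    }
}
```

For example, parsing "(TOP (S (NP (DT the) (NN tree)) (VP (VBZ has) (NP (JJ noun) (NNS phrases)))))" and calling BracketedTree.collect(root, "NP", list) fills the list with "the tree" and "noun phrases". Because the traversal keeps descending after a match, an NP nested inside another NP is reported as well.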

Answered 2013-03-23T07:19:46.333

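If you only want noun phrases, use the sentence chunker instead of the tree parser. The code looks like this (you need to get the chunker model from the same place you got the parser model):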

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import opennlp.tools.chunker.ChunkerME;
    import opennlp.tools.chunker.ChunkerModel;

    public void chunk() throws IOException {
        ChunkerModel model;

        // try-with-resources closes the stream even if loading the model fails.
        // A failure propagates to the caller instead of leaving model null.
        try (InputStream modelIn = new FileInputStream("en-chunker.bin")) {
            model = new ChunkerModel(modelIn);
        }

        // After the model is loaded a chunker can be instantiated.
        ChunkerME chunker = new ChunkerME(model);

        // The chunker expects a tokenized sentence...
        String[] sent = new String[]{"Rockwell", "International", "Corp.", "'s",
            "Tulsa", "unit", "said", "it", "signed", "a", "tentative", "agreement",
            "extending", "its", "contract", "with", "Boeing", "Co.", "to",
            "provide", "structural", "parts", "for", "Boeing", "'s", "747",
            "jetliners", "."};

        // ...and one POS tag per token.
        String[] pos = new String[]{"NNP", "NNP", "NNP", "POS", "NNP", "NN",
            "VBD", "PRP", "VBD", "DT", "JJ", "NN", "VBG", "PRP$", "NN", "IN",
            "NNP", "NNP", "TO", "VB", "JJ", "NNS", "IN", "NNP", "POS", "CD", "NNS",
            "."};

        // One chunk tag per token, for example B-NP, I-NP, B-VP, or O.
        String[] tag = chunker.chunk(sent, pos);
    }

Then look through the tag array for the chunk types you want.
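
As a rough sketch of that last step: the chunker emits one tag per token, where B-NP starts a noun phrase, I-NP continues it, and any other tag ends it. The helper below groups tokens accordingly. NounPhraseChunks and extract are hypothetical names chosen for this example, not OpenNLP API, and the tags in main are hand-written for illustration rather than real chunker output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
class NounPhraseChunks {

    // Groups tokens into noun phrases from per-token chunk tags.
    // "B-NP" starts a phrase, "I-NP" extends the current one, anything else closes it.
    static List<String> extract(String[] tokens, String[] tags) {
        List<String> phrases = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].equals("B-NP")) {
                flush(current, phrases);
                current.append(tokens[i]);
            } else if (tags[i].equals("I-NP")) {
                // A stray I-NP with no open phrase is treated as the start of one.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens[i]);
            } else {
                flush(current, phrases);
            }
        }
        flush(current, phrases);
        return phrases;
    }

    // Moves the phrase being built, if any, into the result list.
    private static void flush(StringBuilder current, List<String> phrases) {
        if (current.length() > 0) {
            phrases.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "quick", "fox", "jumped", "over", "the", "fence", "."};
        // Hand-written tags; a real run would use chunker.chunk(sent, pos).
        String[] tags = {"B-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP", "O"};
        System.out.println(extract(tokens, tags)); // prints [The quick fox, the fence]
    }
}
```

In the chunk() method above, the equivalent call would be NounPhraseChunks.extract(sent, tag). If you also need character offsets, it may be worth checking whether your OpenNLP version offers a span-based chunking method rather than rebuilding offsets by hand.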

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.parser.chunking.api

Answered 2013-12-16T00:49:54.377

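Continuing from your own code: use the getTagNodes() method to get the tokens and their POS types. Note that this block prints the individual noun tokens in the sentence (nodes tagged NN, NNP, or NNS), not full NP constituents.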

    Parse[] topParses = ParserTool.parseLine(line, parser, 1);

    for (Parse parse : topParses) {
        // getTagNodes() returns the POS-tag nodes of the parse, one per token.
        // Iterating here avoids a NullPointerException when there are no parses.
        for (Parse word : parse.getTagNodes()) {
            // Change the types according to your desired types.
            String type = word.getType();
            if (type.equals("NN") || type.equals("NNP") || type.equals("NNS")) {
                System.out.println(word);
            }
        }
    }

Answered 2016-11-09T04:49:41.893