9

I'm using Stanford CoreNLP, and I use this line to load some annotators to process my text:

props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");

Is there a module that I can load to chunk the text?

Or is there an alternative way to use Stanford CoreNLP to chunk some text?

Thank you

4

4 Answers

5

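To use chunking with Stanford NLP, you can use the following packages:

  • YamCha: SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source. Won the CoNLL 2000 shared task. (Less automated than a specialized POS tagger for an end user.)
  • Mark Greenwood's Noun Phrase Chunker: a Java reimplementation of Ramshaw and Marcus (1995).
  • fnTBL: a fast and flexible implementation of Transformation-Based Learning in C++. Includes a POS tagger, as well as NP chunking and general chunking models.

Source: http://www-nlp.stanford.edu/links/statnlp.html#NPchunk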

answered 2013-04-23T02:07:50.977
5

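I think the parser output can be used to obtain NP chunks. Take a look at the context-free representation on the Stanford Parser website, which provides example output.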

answered 2012-11-13T01:20:08.207
1

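What you need is the output of constituency parsing in CoreNLP, which gives you the chunk information, such as verb phrases (VPs), noun phrases (NPs), etc. As far as I know, however, there is no method in CoreNLP that gives you a list of chunks directly. This means you have to parse the actual constituency-parse output yourself to extract the chunks.

For example, this is the output of CoreNLP's constituency parser for a sample sentence: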

(ROOT (S ("" "") (NP (NNP Anarchism)) (VP (VBZ is) (NP (NP (DT a) (JJ political) (NN philosophy)) (SBAR (WHNP (WDT that)) (S (VP (VBZ advocates) (NP (NP (JJ self-governed) (NNS societies)) (VP (VBN based) (PP (IN on) (NP (JJ voluntary) (, ,) (JJ cooperative) (NNS institutions))))))))) (, ,) (S (VP (VBG rejecting) (NP (JJ unjust) (NN hierarchy))))) (. .)))

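As you can see, the string contains NP and VP tags, so you now have to extract the actual text of the chunks by parsing this string. Let me know if you find a method that gives you the list of chunks directly!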
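One way to do that string parsing, as a minimal sketch: the helper names `parse_brackets`, `leaves`, and `extract_phrases` below are made up for illustration and are not part of CoreNLP, and the short sample tree is a hand-written assumption rather than real parser output. It uses only the Python standard library and assumes every constituent has a label and the brackets are balanced.

```python
import re


def parse_brackets(text):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Leaves (the words) are kept as plain strings.
    """
    tokens = re.findall(r'\(|\)|[^\s()]+', text)
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                 # skip the opening "("
        label = tokens[pos]      # the constituent label, e.g. NP
        pos += 1
        children = []
        while tokens[pos] != ')':
            if tokens[pos] == '(':
                children.append(read_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                 # skip the closing ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [word for child in children for word in leaves(child)]


def extract_phrases(node, labels):
    """Collect the text of every constituent whose label is in `labels`."""
    if isinstance(node, str):
        return []
    label, children = node
    found = [' '.join(leaves(node))] if label in labels else []
    for child in children:
        found.extend(extract_phrases(child, labels))
    return found


# Hand-written sample in the same bracketed format (an assumption, not parser output).
sample = "(ROOT (S (NP (DT The) (NN cat)) (VP (VBD chased) (NP (DT a) (NN mouse))) (. .)))"
tree = parse_brackets(sample)
noun_phrases = extract_phrases(tree, {"NP"})
verb_phrases = extract_phrases(tree, {"VP"})
print(noun_phrases)  # ['The cat', 'a mouse']
print(verb_phrases)  # ['chased a mouse']
```

Nested constituents are all reported, so an NP that contains another NP appears alongside its inner NP.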

answered 2019-05-12T03:15:15.733
0

Expanding on Pedram's answer, the following code can be used:

from nltk.parse.corenlp import CoreNLPParser

# Assumes a CoreNLP server is already running locally on port 9000.
nlp = CoreNLPParser('http://localhost:9000')


def extract_phrase(tree, labels):
    """Return the text of every subtree of `tree` whose label is in `labels`."""
    phrases = []
    for subtree in tree.subtrees():
        if subtree.label() in labels:
            phrases.append(' '.join(subtree.leaves()))
    return phrases


def get_chunks(sentence):
    # raw_parse() returns an iterator of parse trees; take the first one.
    tree = next(nlp.raw_parse(sentence))
    # 'CC' is the POS tag for coordinating conjunctions, so those words
    # are collected together with the noun phrases.
    nps = extract_phrase(tree, ['NP', 'CC'])
    vps = extract_phrase(tree, ['VP'])
    return tree, nps, vps


if __name__ == '__main__':
    dialog = [
        "Anarchism is a political philosophy that advocates self-governed societies based on voluntary cooperative institutions rejecting unjust hierarchy"
    ]
    for sentence in dialog:
        trees, nps, vps = get_chunks(sentence)
        print("\n\n")
        print("Sentence: ", sentence)
        print("Tree:\n", trees)
        print("Noun Phrases: ", nps)
        print("Verb Phrases: ", vps)

"""
Sentence:  Anarchism is a political philosophy that advocates self-governed societies based on voluntary cooperative institutions rejecting unjust hierarchy
Tree:
 (ROOT
  (S
    (NP (NN Anarchism))
    (VP
      (VBZ is)
      (NP
        (NP (DT a) (JJ political) (NN philosophy))
        (SBAR
          (WHNP (WDT that))
          (S
            (VP
              (VBZ advocates)
              (NP
                (ADJP (NN self) (HYPH -) (VBN governed))
                (NNS societies))
              (PP
                (VBN based)
                (PP
                  (IN on)
                  (NP
                    (NP
                      (JJ voluntary)
                      (JJ cooperative)
                      (NNS institutions))
                    (VP
                      (VBG rejecting)
                      (NP (JJ unjust) (NN hierarchy)))))))))))))
Noun Phrases:  ['Anarchism', 'a political philosophy that advocates self - governed societies based on voluntary cooperative institutions rejecting unjust hierarchy', 'a political philosophy', 'self - governed societies', 'voluntary cooperative institutions rejecting unjust hierarchy', 'voluntary cooperative institutions', 'unjust hierarchy']
Verb Phrases:  ['is a political philosophy that advocates self - governed societies based on voluntary cooperative institutions rejecting unjust hierarchy', 'advocates self - governed societies based on voluntary cooperative institutions rejecting unjust hierarchy', 'rejecting unjust hierarchy']

"""
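The lists above contain nested phrases, so the same words show up several times. If flat, non-overlapping chunks are wanted instead, one option is to keep only the innermost phrases, i.e. those that contain no other phrase with the same label. The following is a rough sketch of that idea working directly on a bracketed parse string with the standard library only; `base_phrases` is a hypothetical helper name, and the sample tree is hand-written for illustration, not real parser output.

```python
import re


def base_phrases(bracketed, label="NP"):
    """Return the text of every `label` constituent that has no nested `label` inside it."""
    tokens = re.findall(r'\(|\)|[^\s()]+', bracketed)
    stack = []    # one [label, words, contains_same_label] entry per open constituent
    result = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == '(':
            # The token right after "(" is the constituent label.
            stack.append([tokens[i + 1], [], False])
            i += 2
        elif tok == ')':
            node_label, words, nested = stack.pop()
            if node_label == label and not nested:
                result.append(' '.join(words))
            if stack:
                parent = stack[-1]
                parent[1].extend(words)
                # Remember that the parent now contains a `label` constituent.
                parent[2] = parent[2] or nested or node_label == label
            i += 1
        else:
            stack[-1][1].append(tok)   # a word belonging to the innermost open constituent
            i += 1
    return result


# Hand-written sample in bracketed format (an assumption, not parser output).
sample = (
    "(ROOT (S (NP (NNP Anarchism)) "
    "(VP (VBZ is) (NP (NP (DT a) (JJ political) (NN philosophy)) "
    "(PP (IN against) (NP (JJ unjust) (NN hierarchy)))))))"
)
base_nps = base_phrases(sample, "NP")
print(base_nps)  # ['Anarchism', 'a political philosophy', 'unjust hierarchy']
```

The outer NP ("a political philosophy against unjust hierarchy") is skipped because it contains other NPs, which is closer to what a dedicated NP chunker would return.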
answered 2021-06-09T13:06:51.997