nltk - NLTK clause and phrase breakdown

Question

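Is there a way to get NLTK to return text fully marked up with all Treebank clause and Treebank phrase boundaries (or an equivalent; it need not be Treebank)? I need to be able to return clauses and phrases separately. The only thing I have found is in chapter 7 of the NLTK book by Bird, Klein and Loper, which says you can't process noun phrases and verb phrases at the same time, but I want to do far more than that! I think the Stanford Parser does this, but the client wants to use NLTK only. Thanks.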

score 1 · Accepted Answer

Have you looked at chapter 8? It sounds like you want something like this:

>>> from nltk.corpus import treebank
>>> t = treebank.parsed_sents('wsj_0001.mrg')[0]
>>> print(t)
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR
        (IN as)
        (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
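Once you have a tree like the one above, clauses and phrases can be pulled out separately by filtering its subtrees on their labels. The following is a minimal sketch, not a definitive recipe: the `CLAUSE_LABELS` and `PHRASE_LABELS` sets and the `base_label` and `spans` helpers are names made up for this illustration, and the tree is rebuilt from its bracketed string so the code runs without downloading the treebank corpus.

```python
from nltk.tree import Tree

# The parse shown above, rebuilt from its bracketed string.
t = Tree.fromstring("""
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
""")

# Assumed label sets for this sketch; extend them to the tags you care about.
CLAUSE_LABELS = {"S", "SBAR", "SBARQ", "SINV", "SQ"}
PHRASE_LABELS = {"NP", "VP", "PP", "ADJP", "ADVP"}


def base_label(subtree):
    """Strip function tags such as -SBJ or -TMP: 'NP-SBJ' -> 'NP'."""
    return subtree.label().split("-")[0]


def spans(tree, labels):
    """Return (label, text) pairs for every subtree whose base label is in labels."""
    return [
        (base_label(st), " ".join(st.leaves()))
        for st in tree.subtrees(lambda st: base_label(st) in labels)
    ]


clauses = spans(t, CLAUSE_LABELS)
phrases = spans(t, PHRASE_LABELS)

for label, text in clauses + phrases:
    print(label, "->", text)
```

Because `subtrees()` walks the tree top-down, nested phrases are reported as well as the phrases that contain them; filter further if you only want the outermost or innermost ones.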

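The treebank corpus is in addition to the chunking resources you have already found. However, if you mean that you want to parse text that you supply yourself, there are also options like this: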

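>>> import nltk
>>> # grammar1: a hand-written CFG, e.g. the one defined in chapter 8 of the NLTK book
>>> sr_parse = nltk.ShiftReduceParser(grammar1)
>>> sent = 'Mary saw a dog'.split()
>>> for tree in sr_parse.parse(sent):  # in NLTK 3, parse() yields trees
...     print(tree)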
(S (NP Mary) (VP (V saw) (NP (Det a) (N dog))))
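The session above assumes that `grammar1` already exists. A minimal sketch of one way to define it follows, assuming a toy context-free grammar modeled on the one in chapter 8 of the NLTK book; the rules are illustrative only and cover just a handful of words.

```python
import nltk

# A small hand-written CFG (an assumption for this sketch, modeled on the
# toy grammar in chapter 8 of the NLTK book).
grammar1 = nltk.CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
""")

# The constructor may print warnings about rules a shift-reduce
# strategy can never reach; they do not stop the parse below.
sr_parse = nltk.ShiftReduceParser(grammar1)
sent = "Mary saw a dog".split()

# parse() returns an iterator of trees, so collect them into a list.
trees = list(sr_parse.parse(sent))
for tree in trees:
    print(tree)
```

A shift-reduce parser finds at most one parse and can miss valid ones, so `nltk.ChartParser(grammar1)` may be a better fit if you need every analysis the grammar allows.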

But this relies on grammar1, which has to be written by hand in advance. Chunking is easier than parsing.
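To illustrate the chunking route, here is a hedged sketch of a cascaded chunker built with `nltk.RegexpParser`, modeled on the example in chapter 7 of the NLTK book. The chunk grammar and the hand-tagged sentence are assumptions chosen for illustration (in practice the tags would come from a tagger such as `nltk.pos_tag`), and `chunk_texts` is a helper name made up here. It shows noun phrases, verb phrases and clauses coming out of a single grammar.

```python
import nltk

# Each stage may use the chunks produced by the stages before it.
chunk_grammar = r"""
  NP: {<DT|JJ|NN.*>+}            # determiners, adjectives, nouns
  PP: {<IN><NP>}                 # preposition followed by an NP
  VP: {<VB.*><NP|PP|CLAUSE>+$}   # verb and its arguments
  CLAUSE: {<NP><VP>}             # NP followed by VP
"""
chunker = nltk.RegexpParser(chunk_grammar)

# Hand-tagged input so the sketch needs no tagger model.
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
            ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]

chunked = chunker.parse(sentence)
print(chunked)


def chunk_texts(tree, label):
    """Return the text of every chunk in the tree carrying the given label."""
    return [
        " ".join(word for word, tag in st.leaves())
        for st in tree.subtrees(lambda st: st.label() == label)
    ]


clauses = chunk_texts(chunked, "CLAUSE")
noun_phrases = chunk_texts(chunked, "NP")
verb_phrases = chunk_texts(chunked, "VP")
```

A single pass leaves "saw" outside any verb phrase; passing `loop=2` to `RegexpParser` reapplies the stages and can pick up such nested structure, at the cost of a grammar that is harder to reason about.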
