python - Using NLTK in Python

Question

I have been working through some of the examples in the Python NLTK Book. For instance, Chapter 7 discusses chinking with this example:

import nltk

grammar = r"""
  NP:
    {<.*>+}          # Chunk everything
    }<VBD|IN>+{      # Chink sequences of VBD and IN
  """
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)

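As I understand it, this should remove "barked" from the result, but it doesn't. I'm new to both Python and NLTK, so what am I missing? Is there something obvious here that needs to be updated? Thanks.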

score 0 · Accepted Answer

Chunking creates chunks, while chinking breaks those chunks up.

That is exactly what Jacob Perkins' "Python Text Processing with NLTK 2.0 Cookbook" says (I recommend reading it, since you are new to NLTK).

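In other words, {} creates chunks, and }{ splits those chunks into smaller ones by excluding the matched tokens from them. Nothing is deleted: the chinked tokens ("barked" and "at" in your example) stay in the result tree, just outside any NP chunk.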
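To make this concrete, here is a minimal sketch that runs the question's grammar and separates the NP chunks from the chinked tokens. The variable names `np_chunks`, `chinked`, and `chunked_only` are my own, and it assumes NLTK 3, where the node label is read with `label()`.

```python
import nltk

grammar = r"""
  NP:
    {<.*>+}          # Chunk everything
    }<VBD|IN>+{      # Chink sequences of VBD and IN
"""
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"),
            ("the", "DT"), ("cat", "NN")]

result = nltk.RegexpParser(grammar).parse(sentence)

# Chinking removes nothing: every token is still a leaf of the tree.
all_tokens = result.leaves()

# Children of the root that are subtrees are the NP chunks ...
np_chunks = [child.leaves() for child in result
             if isinstance(child, nltk.Tree) and child.label() == "NP"]

# ... and children that are plain (word, tag) tuples are the chinked tokens.
chinked = [child for child in result if not isinstance(child, nltk.Tree)]

# If you really want the chinked tokens gone, filter them out yourself.
chunked_only = [token for chunk in np_chunks for token in chunk]

print(np_chunks)
print(chinked)
```

If the goal is a result without "barked" and "at", `chunked_only` is one way to get it. The parser itself will not drop them.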

Using your example, look at what this displays:

result.draw()

Or run:

from nltk.tree import Tree

Tree('S', [Tree('NP', [('the', 'DT'), ('little', 'JJ'), ('yellow', 'JJ'), ('dog', 'NN')]), ('barked', 'VBD'), ('at', 'IN'), Tree('NP', [('the', 'DT'), ('cat', 'NN')])]).draw()

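(Both snippets display the same tree. The difference is that the first requires you to have run your original example, while the second is self-contained.)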
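Since `draw()` opens a Tkinter window, it will not work on a machine without a display. As a rough alternative, the same hand-built tree can be inspected as text. The names `tree` and `top_level` are only illustrative.

```python
from nltk.tree import Tree

# The same tree as above, built by hand so no parser run is needed.
tree = Tree('S', [
    Tree('NP', [('the', 'DT'), ('little', 'JJ'), ('yellow', 'JJ'), ('dog', 'NN')]),
    ('barked', 'VBD'),
    ('at', 'IN'),
    Tree('NP', [('the', 'DT'), ('cat', 'NN')]),
])

# Bracketed text form of the tree; no GUI required.
print(tree)

# Summarise the root's children: a chunk label for subtrees,
# the bare word for tokens left outside any chunk.
top_level = [child.label() if isinstance(child, Tree) else child[0]
             for child in tree]
print(top_level)
```

The two NP subtrees sit on either side of "barked" and "at", which remain direct children of the S node.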
