python - Can't get Counter() to work in python

Question

I'm trying to make a counter that uses a list of POS trigrams to check a large number of trigrams and find their frequencies. My code so far is:

from nltk import trigrams
from nltk.tokenize import wordpunct_tokenize
from nltk import bigrams
from collections import Counter
import nltk
text= ["This is an example sentence."]
trigram_top= ['PRP', 'MD', 'VB']

for words in text:
    tokens = wordpunct_tokenize(words)
    tags = nltk.pos_tag(tokens)
    trigram_list = trigrams(tags)
    list_tri = Counter(t for t in trigram_list if t in trigram_top)
    print(list_tri)

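I get an empty Counter back. How can I fix this? In an earlier version I did get data back, but it kept counting on every iteration (in the actual program, text is a collection of different files). Does anyone have any ideas?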

score 2 · Accepted Answer

Let's put some prints in there to debug:

from nltk import trigrams
from nltk.tokenize import wordpunct_tokenize
from nltk import bigrams
from collections import Counter
import nltk
text= ["This is an example sentence."]
trigram_top= ['PRP', 'MD', 'VB']

for words in text:
    tokens = wordpunct_tokenize(words)
    print(tokens)
    tags = nltk.pos_tag(tokens)
    print(tags)
    list_tri = Counter(t[0] for t in tags if t[1] in trigram_top)
    print(list_tri)

#['This', 'is', 'an', 'example', 'sentence', '.']
#[('This', 'DT'), ('is', 'VBZ'), ('an', 'DT'), ('example', 'NN'), ('sentence', 'NN'), ('.', '.')]
#Counter()

Note that the list= part is redundant, and I've changed the generator to count only the words rather than the POS tags.
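Tracing the question's original check the same way shows why it returned an empty Counter: each item produced by `trigrams(tags)` is a tuple of three `(word, tag)` pairs, so it can never equal one of the tag strings in `trigram_top`. A minimal sketch, assuming the tagged output printed above and using the standard-library `zip` to build consecutive triples of the same shape `nltk.trigrams` yields, so it runs without NLTK or its tagger data:

```python
from collections import Counter

# Tagged output copied from the debug run above (assumed; no NLTK needed here)
tags = [('This', 'DT'), ('is', 'VBZ'), ('an', 'DT'),
        ('example', 'NN'), ('sentence', 'NN'), ('.', '.')]
trigram_top = ['PRP', 'MD', 'VB']

# Consecutive triples, same shape as the items from nltk.trigrams(tags)
trigram_list = list(zip(tags, tags[1:], tags[2:]))
print(trigram_list[0])
# (('This', 'DT'), ('is', 'VBZ'), ('an', 'DT'))

# Each t is a tuple of (word, tag) pairs, never a bare tag string,
# so "t in trigram_top" is always False and the Counter stays empty
list_tri = Counter(t for t in trigram_list if t in trigram_top)
print(list_tri)
# Counter()
```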

We can see that none of the POS tags directly matches your trigram_top, so you may need to adjust the comparison to accommodate VB/VBZ...

One possibility is to change this line:

list_tri=Counter (t[0] for t in tags if t[1].startswith(tuple(trigram_top)))
# Counter({'is': 1})
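The line above still counts single words. If the goal in the question is to count whole POS trigrams that fit the pattern in `trigram_top`, one option is to compare the three tags of each trigram position by position using the same `startswith` prefix idea, and to create a single Counter outside the loop so counts add up across all texts (files). This is a sketch under those assumptions: the helper `matches_pattern` is hypothetical, and the second tagged sentence is hand-written for illustration rather than real tagger output.

```python
from collections import Counter

trigram_top = ('PRP', 'MD', 'VB')

def matches_pattern(trigram, pattern):
    # Hypothetical helper: True if each tag starts with the corresponding
    # pattern element, so 'VB' also accepts 'VBZ', 'VBD', ...
    return all(tag.startswith(p) for (_, tag), p in zip(trigram, pattern))

# Pre-tagged texts standing in for nltk.pos_tag output on several files.
# The first is the tagger output shown above; the second is hand-written.
tagged_texts = [
    [('This', 'DT'), ('is', 'VBZ'), ('an', 'DT'),
     ('example', 'NN'), ('sentence', 'NN'), ('.', '.')],
    [('I', 'PRP'), ('can', 'MD'), ('go', 'VB'), ('.', '.'),
     ('She', 'PRP'), ('will', 'MD'), ('go', 'VB'), ('.', '.')],
]

total = Counter()  # created once, outside the loop, so counts accumulate
for tags in tagged_texts:
    tag_trigrams = zip(tags, tags[1:], tags[2:])
    total.update(
        tuple(tag for _, tag in t)  # key on the tag sequence only
        for t in tag_trigrams
        if matches_pattern(t, trigram_top)
    )

print(total)
# Counter({('PRP', 'MD', 'VB'): 2})
```

Keying on the tag sequence groups every matching trigram under one entry; using `t` itself as the key would instead count each distinct word trigram separately.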
