python - Wrong results from frequency counting

Question

I am trying to find the occurrences of one list of words in another list of words. My code looks like this:

for cat, text2 in posts:
    words = wordpunct_tokenize(text2)
    for word in words:
        if word in top:
            counter[word] += 1

    print counter

The words look like this: [("Post1", "post1", "post1"), ("post2", "post2"), ("post3")], and top looks like this: "Post1, Post2, Post3". The expected result is:

{post1: 3}
{post2: 2}
{post3: 1}

But the output I am getting now is:

{'post1': 3})
{'post2': 2, 'post1': 3})
{'post3': 1, 'post2': 2, 'post1': 3})

It looks like the program carries the words from the previous line over into the next line. Does anyone know how I can fix this?

score 2 · Accepted Answer

A hint: Python already has a class that does what you want. It's called Counter, and it's in the collections module:

from collections import Counter
from nltk.tokenize import wordpunct_tokenize

c = Counter()
for cat, text2 in posts:
    # count only the tokens that also appear in top
    c.update(word for word in wordpunct_tokenize(text2) if word in top)

At the end, the c variable will contain the frequency count of the words found.
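
Note that a single Counter shared across the loop accumulates over all posts, which is the same carry-over the question describes. If separate counts per post are wanted, one option is to create a fresh Counter inside the loop. The following is only a minimal sketch under some assumptions: the sample `posts` data is made up, `top` is assumed to be a set of words (if `top` is a plain string, `word in top` does substring matching instead), and `tokenize` is a hypothetical regex stand-in for `wordpunct_tokenize` so the snippet runs without NLTK.

```python
import re
from collections import Counter

# Hypothetical sample data; the real `posts` and `top` come from the question.
posts = [
    ("cat1", "post1 post1 post1"),
    ("cat2", "post2 post2"),
    ("cat3", "post3"),
]
top = {"post1", "post2", "post3"}  # assumed to be a set of exact words


def tokenize(text):
    # Stand-in for nltk's wordpunct_tokenize (an assumption, not the NLTK call).
    return re.findall(r"\w+|[^\w\s]+", text)


per_post = []      # one dict of counts per post
total = Counter()  # running total across all posts, for comparison
for cat, text2 in posts:
    # A fresh Counter per post, so earlier posts do not leak into later ones.
    c = Counter(word for word in tokenize(text2) if word in top)
    per_post.append(dict(c))
    total.update(c)
    print(dict(c))
```

Here `per_post` holds the counts of each post on its own, while `total` still gives the overall frequencies, so both views are available from one pass.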
