python - Find the most frequent words in Python

Question

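I want to read a file and find the most frequently used words. The code is below. I assume I made a mistake when reading the file. Any suggestions would be much appreciated.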

txt_file = open('result.txt', 'r')

for line in txt_file:
    for word in line.strip().split():
        word = word.strip(punctuation).lower()

    all_words = nltk.FreqDist(word for word in word.words())
    top_words = set(all_words.keys()[:300])
    print top_words

Input file result.txt:

Musik to shiyuki miyama opa samba japan obi Musik Musik Musik 
Antiques    antique 1900 s sewing pattern pictorial review size Musik 36 bust 1910 s ladies waist bust

score 1 · Accepted Answer

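I'm not sure what your error is, and I don't know how to do this with NLTK, but you can keep your approach of looping over the lines and words, and use a simple Python dictionary to keep track of the counts: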

from string import punctuation

wordFreq = {}
# Iterate over the file directly; calling readlines() first would exhaust it
with open("filename", "r") as txt_file:
    for line in txt_file:
        for word in line.strip().split():
            word = word.strip(punctuation).lower()
            # If word is already in dict, increase count
            if word in wordFreq:
                wordFreq[word] += 1
            else:    # Otherwise, add word to dict and initialize count to 1
                wordFreq[word] = 1

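To query the result, use the word of interest as the key into the dict, e.g. wordFreq['musik']. The keys are lowercased by the loop, so look up 'musik' rather than 'Musik'.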
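The dictionary only stores the counts; it does not say which word is the most frequent. One way to get that is to sort the items by count. The following is a minimal sketch, not part of the original answer: the helper names `count_words` and `top_n` and the in-memory sample lines are assumptions made for illustration.

```python
from string import punctuation


def count_words(lines):
    """Count lowercased, punctuation-stripped words with a plain dict."""
    word_freq = {}
    for line in lines:
        for word in line.split():
            word = word.strip(punctuation).lower()
            if not word:
                continue  # token consisted only of punctuation
            word_freq[word] = word_freq.get(word, 0) + 1
    return word_freq


def top_n(word_freq, n):
    """Return the n (word, count) pairs with the highest counts."""
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(word_freq.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical sample input standing in for the lines of result.txt.
text = [
    "Musik to shiyuki miyama opa samba japan obi Musik Musik Musik",
    "Antiques antique 1900 s sewing pattern Musik 36 bust 1910 s ladies waist bust",
]

freq = count_words(text)
print(top_n(freq, 3))  # [('musik', 5), ('bust', 2), ('s', 2)]
```

With a real file, the open file object can be passed to `count_words` in place of the list, since iterating over a file yields its lines.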

score 1 · Accepted Answer

from collections import Counter

# Flatten the file into one list of whitespace-separated words
with open('result.txt', 'r') as txt_file:
    words = [word for line in txt_file for word in line.strip().split()]
print(Counter(words).most_common(1))

Instead of 1 in most_common, you can pass any number n, and the n most common words will be shown. For example

print(Counter(words).most_common(1))

results in

[('Musik', 5)]

whereas

print(Counter(words).most_common(5))

gives

[('Musik', 5), ('bust', 2), ('s', 2), ('antique', 1), ('ladies', 1)]

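The number is actually an optional argument; if you omit it, most_common returns the frequency of every word in descending order of count. The order among words with equal counts can differ between Python versions.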
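Note that the list comprehension above counts words exactly as they appear, so 'Musik' and 'musik' would be tallied separately and trailing punctuation stays attached. If case-insensitive counts are wanted, Counter can be combined with the normalization from the question. This is only a sketch under that assumption; the function name `word_counts` and the sample lines are made up for illustration.

```python
from collections import Counter
from string import punctuation


def word_counts(lines):
    """Build a Counter of lowercased words with edge punctuation stripped."""
    counts = Counter()
    for line in lines:
        cleaned = (word.strip(punctuation).lower() for word in line.split())
        # Skip tokens that became empty after stripping punctuation.
        counts.update(word for word in cleaned if word)
    return counts


# Hypothetical sample lines; with a real file, pass the open file object instead.
sample = ["Musik, musik! MUSIK.", "antique bust; Bust"]

counts = word_counts(sample)
print(counts.most_common(1))  # only the single most frequent word
print(counts.most_common())   # every word, highest count first
```

Updating the Counter line by line avoids building one large list of all words first, which may matter for big files.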
