
I'm trying to write a program that counts the 5 most common words in a txt file.

Here's what I have so far:

file = open('alice.txt')
wordcount = {}

for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1

for k, v in wordcount.items():
    print(k, v)

As it stands, the program counts every word in the .txt file.

My question is how to make it report only the 5 most common words in the file, showing each word next to its count.

One catch: I can't use a dictionary... whatever that means.


3 Answers


It's simple: you just need to find the 5 most common words in the file.

So you can do something like this:

wordcount = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)

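After this, wordcount is no longer a dictionary: it is a list of (word, count) tuples sorted by count, highest first (remember that sorted returns a list).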
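To see what that line produces, you can run it on a small dictionary. The words and counts below are made up for illustration (they are not from alice.txt), and the sketch assumes Python 3.7+, where dicts keep insertion order:

```python
# Made-up counts, standing in for the real wordcount built from alice.txt
wordcount = {"the": 5, "alice": 3, "rabbit": 1, "queen": 3}

# Sort the (word, count) pairs by count (x[1]), highest first.
ranked = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)

print(ranked)  # → [('the', 5), ('alice', 3), ('queen', 3), ('rabbit', 1)]
print(type(ranked).__name__)  # → list
```

Because sorted is stable (even with reverse=True), 'alice' and 'queen', which both have a count of 3, keep the order they had in the dictionary.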

You can then get the 5 most common words with:

for k, v in wordcount[:5]:
    print(k, v)

So the complete code is:

wordcount = {}

with open('alice.txt') as file:  # the with block closes the file automatically
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1

wordcount = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)

for k, v in wordcount[:5]:
    print(k, v)
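The question says a dictionary can't be used. If that rules out dict entirely (an assumption about what the restriction means), one possible sketch is to sort the words so identical ones sit next to each other, then count each run with itertools.groupby and keep the counts in a list of tuples. The function name top_words is hypothetical:

```python
from itertools import groupby


def top_words(text, n=5):
    """Return the n most common words as (word, count) tuples, without a dict.

    Sorting puts identical words next to each other, so groupby can
    count each run of equal words.
    """
    words = sorted(text.split())
    counts = [(word, len(list(group))) for word, group in groupby(words)]
    # Highest count first; ties stay in alphabetical order because sort is stable.
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts[:n]


sample = "the cat and the hat and the bat"
print(top_words(sample, 3))  # → [('the', 3), ('and', 2), ('bat', 1)]
```

To use it on the file, pass the whole text in: with open('alice.txt') as file: print(top_words(file.read())). Sorting all the words first costs O(m log m) for m words, which is acceptable for a book-sized text.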

Also, here's a simpler way using collections.Counter:

from collections import Counter
with open('alice.txt') as file:  # the with block closes the file automatically
    wordcount = Counter(file.read().split())

for k, v in wordcount.most_common(5):
    print(k, v)

The output is the same as with the first solution.
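You can check that the two approaches agree on a short in-memory string (a made-up sample, not alice.txt). This sketch assumes Python 3.7+: there, Counter.most_common orders equal counts by first occurrence, and sorted is stable, so even tied words come out in the same order:

```python
from collections import Counter

text = "to be or not to be that is the question to be"
words = text.split()

# Approach 1: count into a dict, then sort the items by count
wordcount = {}
for word in words:
    wordcount[word] = wordcount.get(word, 0) + 1
by_sorted = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)[:5]

# Approach 2: let Counter do the counting and ranking
by_counter = Counter(words).most_common(5)

print(by_counter)  # → [('to', 3), ('be', 3), ('or', 1), ('not', 1), ('that', 1)]
print(by_sorted == by_counter)  # → True
```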

answered 2015-10-28T07:16:26.543
File_Name = 'file.txt'

counterDict = {}

with open(File_Name,'r') as fh:
    # Read all lines into a list.
    data = fh.readlines()

for line in data:
    # Remove some punctuation ('.' and ',')
    # and convert everything to lower case.
    line = line.lower().replace(',','').replace('.','')
    # Splitting all words into list elements.
    words = line.split()
    for word in words:
        # If the word is not in counterDict yet, add it
        # with a count of 1.
        if word not in counterDict:
            counterDict[word] = 1
        # If the word is already in counterDict, increase its count by one.
        else:
            counterDict[word] = counterDict[word] + 1

# Sort by word count, highest first.
# Each item x is a (word, count) tuple: x[0] is the word, x[1] is the count.
sorted_counterDict = sorted(counterDict.items(), key=lambda x: x[1], reverse=True)

# sorted_counterDict[0:5] holds the five most common words; print them.
for key,val in sorted_counterDict[0:5]:
    print(key,val)
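The replace calls above only strip '.' and ','. If other punctuation (quotes, '!', '?', ';' and so on) should also be ignored, one alternative is str.translate with string.punctuation. This technique is swapped in here and is not part of the original answer, and the helper name normalize is hypothetical:

```python
import string


def normalize(line):
    """Lower-case a line and delete every ASCII punctuation character.

    str.maketrans with a third argument maps each listed character to
    None, so translate() removes all of them in a single pass.
    """
    return line.lower().translate(str.maketrans("", "", string.punctuation))


print(normalize("Alice said: 'Off with their heads!' Oh, dear."))
# → alice said off with their heads oh dear
```

In the loop above, line = normalize(line) would replace the chained replace calls. Two caveats: string.punctuation covers ASCII only, so curly quotes and other non-ASCII marks survive, and contractions such as "don't" become "dont".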
answered 2017-02-20T01:59:34.570

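There is a built-in function, sorted, that can sort a dictionary. By default it sorts the keys themselves, so pass key=wordcount.get to order the words by their counts instead:

sorted(wordcount, key=wordcount.get, reverse=True)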

Now it's up to you to work out how to get/print only the first five elements ;)

Note: sorted can, of course, sort other collections too.
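Passing key=wordcount.get makes sorted compare each word's count instead of the word itself, and slicing the result gives the first five. The dictionary below is made up for illustration. The last line shows the contrast with plain sorted(wordcount, reverse=True), which orders the keys reverse-alphabetically rather than by frequency:

```python
# Made-up counts for illustration
wordcount = {"the": 5, "alice": 3, "rabbit": 1, "queen": 3, "said": 2, "hatter": 4}

# Rank the keys by their values (the counts), highest first, and keep five.
top_five = sorted(wordcount, key=wordcount.get, reverse=True)[:5]

for word in top_five:
    print(word, wordcount[word])
# the 5
# hatter 4
# alice 3
# queen 3
# said 2

# Without a key, the words themselves are compared (reverse alphabetical).
alphabetical = sorted(wordcount, reverse=True)[:5]
print(alphabetical)  # → ['the', 'said', 'rabbit', 'queen', 'hatter']
```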

answered 2015-10-28T07:01:10.567