python - How to iterate over all keys of a dictionary in Python?

Question

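I need to compute the frequency of every key of the dictionary "d" across all the files in the folder "individual-articles". The folder holds about 20,000 txt files named 1, 2, 3, 4, ... For example, given d[Britain]=[5,76,289], I must return the number of times Britain appears in the files 5.txt, 76.txt and 289.txt of "individual-articles", and I also need its frequency across all files in the same folder.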

import collections
import sys
import os
import re
sys.stdout=open('dictionary.txt','w')
from collections import Counter
from glob import glob


folderpath='d:/individual-articles'
counter=Counter()


filepaths = glob(os.path.join(folderpath,'*.txt'))

def words_generator(fileobj):
    for line in fileobj:
        for word in line.split():
            yield word
word_count_dict = {}
for file in filepaths:
    f = open(file,"r")
    words = words_generator(f)
    for word in words:
        if word not in word_count_dict:
              word_count_dict[word] = {"total":0}
        if file not in word_count_dict[word]:
              word_count_dict[word][file] = 0
        word_count_dict[word][file] += 1              
        word_count_dict[word]["total"] += 1        
for k in word_count_dict.keys():
    for filename in word_count_dict[k]:
        if filename == 'total': continue
        counter.update(filename)

for k in word_count_dict.keys():
    for count in counter.most_common():
        print('{}  {}'.format(word_count_dict[k],count))

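How do I find the frequency of Britain only in the files listed in that key's value in the dictionary?

For the same example, I need to store these values in another dictionary d2, which must contain

(Britain,26,1200) (Spain,52,6795) (France,45,568)

where 26 is the frequency of the word "Britain" in the files 5.txt, 76.txt and 289.txt, and 1200 is the frequency of the word "Britain" across all files. The same goes for Spain and France.

I am using a Counter here, and I think that is the flaw, because everything works fine so far except my last loop!

I am a Python newbie and I have tried little! Please help!!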

score 0 · Accepted Answer

word_count_dict["Britain"] is a regular dictionary. Just loop over it:

for filename in word_count_dict["Britain"]:
    if filename == 'total': continue
    print("Britain appears in {} {} times".format(filename, word_count_dict["Britain"][filename]))

Or retrieve all the keys:

word_count_dict["Britain"].keys()

Note that you have a special key, total, in that dictionary.
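As a small illustration of skipping that special key, here is a sketch on made-up data shaped like word_count_dict["Britain"]. The name britain_counts and the numbers in it are hypothetical, not taken from your files:

```python
# Hypothetical sample data in the same shape as word_count_dict["Britain"].
britain_counts = {"total": 9, "5.txt": 4, "76.txt": 2, "289.txt": 3}

# Iterating a dict yields its keys; keep everything except the
# bookkeeping "total" entry so only real filenames remain.
per_file = {name: n for name, n in britain_counts.items() if name != "total"}

print(sorted(per_file))        # filenames only
print(sum(per_file.values()))  # should match the stored total
```

If the per-file counts are right, their sum equals the value stored under total, which makes for a cheap sanity check.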

It may just be that your indentation is off, but it looks like you are not counting the file entries correctly:

if file not in word_count_dict[word]:
    word_count_dict[word][file] = 0
    word_count_dict[word][file] += 1              
    word_count_dict[word]["total"] += 1

That only counts (+= 1) a word when file has not been seen before in that word's dictionary; correct it to:

if file not in word_count_dict[word]:
    word_count_dict[word][file] = 0
word_count_dict[word][file] += 1              
word_count_dict[word]["total"] += 1
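If you would rather not do the "if key not in dict" bookkeeping by hand, the same tally can probably be kept with collections.defaultdict and collections.Counter. This is only a sketch, not your code: count_words and the in-memory docs mapping are hypothetical stand-ins for reading the real files, and there is no "total" key because the total is computed on demand.

```python
from collections import Counter, defaultdict

def count_words(documents):
    """Return {word: Counter({filename: count})} for a {filename: text} mapping."""
    counts = defaultdict(Counter)
    for filename, text in documents.items():
        for word in text.split():
            # Missing words and missing filenames both start at 0 automatically.
            counts[word][filename] += 1
    return counts

# Hypothetical stand-in for the contents of two of the txt files.
docs = {"5.txt": "Britain and Spain and Britain", "76.txt": "Britain France"}
counts = count_words(docs)

print(counts["Britain"]["5.txt"])       # occurrences in one file
print(sum(counts["Britain"].values()))  # occurrences across all files
```

With this layout the inner dictionary holds filenames only, so there is nothing to skip when you loop over it.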

To extend this to arbitrary words, loop over the outer word_count_dict:

# .items() works on both Python 2 and 3 (iteritems() exists only on Python 2)
for word, counts in word_count_dict.items():
    # two placeholders, so the total is actually printed
    print('Total counts for word {}: {}'.format(word, counts['total']))
    for filename, count in counts.items():
        if filename == 'total': continue
        print("{} appears in {} {} times".format(word, filename, count))
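For the d2 you describe, something along these lines might work. It is a sketch under a few assumptions: d maps each word to a list of file numbers, the keys of word_count_dict[word] are file paths ending in "<number>.txt" plus the special "total" key, and build_d2 is a hypothetical helper name. The sample data is invented.

```python
import os

def build_d2(word_count_dict, d):
    """Return [(word, count in the files listed in d[word], count in all files)]."""
    d2 = []
    for word, file_numbers in d.items():
        # A word that never occurred has no entry; treat it as all zeros.
        counts = word_count_dict.get(word, {"total": 0})
        wanted = {"{}.txt".format(n) for n in file_numbers}
        in_listed = sum(
            count for path, count in counts.items()
            if path != "total" and os.path.basename(path) in wanted
        )
        d2.append((word, in_listed, counts["total"]))
    return d2

# Invented sample in the shape produced by the counting loop.
word_count_dict = {
    "Britain": {"total": 7,
                "d:/individual-articles/5.txt": 3,
                "d:/individual-articles/76.txt": 2,
                "d:/individual-articles/12.txt": 2},
}
d = {"Britain": [5, 76, 289]}
print(build_d2(word_count_dict, d))
```

Comparing on os.path.basename means the folder prefix that glob puts in front of each filename does not matter, and a listed file in which the word never occurs (289.txt here) simply contributes nothing.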
