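I have a dictionary 'd' that holds a list of paths to several text files:
'd:/individual-articles/9.txt', 'd:/individual-articles/11.txt', 'd:/individual-articles/12.txt',...
and so on...
Now I need to read every file in the dictionary and keep a count of how many times each word occurs across all of them.
My output should be in the following form: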
the-500
a-78
in-56
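and so on..
where 500 is the number of times the word "the" occurs across all files in the dictionary, and so on.
I need to do this for all words.
I am a Python newbie, please help!
My code below does not work; it shows no output! There must be an error in my logic, please correct it!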
import collections
import itertools
import os
from glob import glob
from collections import Counter

folderpaths='d:/individual-articles'
counter=Counter()
filepaths = glob(os.path.join(folderpaths,'*.txt'))
folderpath='d:/individual-articles/'

# i am creating my dictionary here, can be ignored
d = collections.defaultdict(list)
with open('topics.txt') as f:
    for line in f:
        value, *keys = line.strip().split('~')
        for key in filter(None, keys):
            if key=='earn':
                d[key].append(folderpath+value+".txt")

for key, value in d.items() :
    print(value)

word_count_dict={}
for file in d.values():
    with open(file,"r") as f:
        words = re.findall(r'\w+', f.read().lower())
        counter = counter + Counter(words)
        for word in words:
            word_count_dict[word].append(counter)

for word, counts in word_count_dict.values():
    print(word, counts)
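
A few things in the counting loop stand out when reading the code: `re` is used but never imported; `d.values()` yields lists of paths, so `open(file, "r")` receives a list rather than a single path; `word_count_dict` is a plain dict, so calling `.append` on a key that does not exist yet raises `KeyError`; and the last loop iterates over `.values()` while unpacking into two names. Below is a minimal sketch of one way the counting step could look, assuming `d` maps a topic to a list of file paths as built above. The names `count_words` and `format_counts` are hypothetical, and reading the files as UTF-8 while ignoring undecodable bytes is an assumption about the data, not something stated in the question.

```python
import re
from collections import Counter
from itertools import chain


def count_words(paths_by_topic):
    """Return one Counter covering every file listed in the dict's values."""
    totals = Counter()
    # Each dict value is a list of paths, so flatten the lists before opening.
    for path in chain.from_iterable(paths_by_topic.values()):
        # Assumption: files are (mostly) UTF-8; undecodable bytes are skipped.
        with open(path, "r", encoding="utf-8", errors="ignore") as f:
            # update() adds this file's word counts to the running totals.
            totals.update(re.findall(r"\w+", f.read().lower()))
    return totals


def format_counts(totals):
    """Build 'word-count' lines, most frequent word first."""
    return [f"{word}-{count}" for word, count in totals.most_common()]
```

With the dictionary from the question, `print("\n".join(format_counts(count_words(d))))` would print lines in the requested `the-500` style. A single `Counter` already holds the total per word, so the separate `word_count_dict` is probably not needed.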