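I've read a number of posts from people doing essentially what I'm doing, but I still can't tell why my output isn't what I want. I'm trying to increment a dictionary entry each time a word appears in my Excel file, but my code currently treats every instance of a word as a new word. For example, "the" appears about 50 times in my file, but the output lists "the" on many separate lines, each with a count of 1. What I actually want is for "the" to be listed once, with a count of 50. Any clarification is much appreciated! Here is my code: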
import csv
import string

filename = "input.csv"
output = "output1.txt"

def add_word(counts, word):
    word = word.lower()
    #the problem is here, the following line never runs
    if counts.has_key(word):
        counts[word] +=1
    #instead, we always go to the else statement...
    else:
        counts[word] = 1
    return counts

def count_words(text):
    word = text.lower()
    counts = {}
    add_word(counts, word)
    return counts

def main():
    infile = open(filename, "r")
    input_fields = ('name', 'country')
    reader = csv.DictReader(infile, fieldnames = input_fields)
    next(reader)
    first_row = next(reader)
    outfile = open(output, "w")
    outfile.write("%-18s%s\n" %("Word", "Count"))
    for next_row in reader:
        full_name = first_row['name']
        word = full_name.split(' ',1)[0]
        counts = count_words(word)
        counts_list = counts.items()
        counts_list.sort()
        for word in counts_list:
            outfile.write("%-18s%d\n" %(word[0], word[1]))
        first_row = next_row

if __name__=="__main__":
    main()
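
Judging from the code above, one likely cause is that `count_words` creates a fresh `counts = {}` on every call. Each row therefore starts from an empty dictionary, `add_word` always takes the `else` branch, and every word is written out with a count of 1. Below is a minimal Python 3 sketch of one possible fix: keep a single counter alive across all rows and write the report only after the loop finishes. The helper names `count_first_words` and `format_report` and the sample rows are hypothetical, the sketch assumes the CSV has a header row, and it swaps the hand-written `has_key` check for `collections.Counter` (`dict.has_key` no longer exists in Python 3).

```python
import csv
import io
from collections import Counter


def count_first_words(rows, field="name"):
    """Count the first word of `field` across all rows, case-insensitively."""
    counts = Counter()  # created once, outside the loop, so counts accumulate
    for row in rows:
        value = (row.get(field) or "").strip()
        if not value:
            continue  # skip rows where the field is empty or missing
        first_word = value.split(None, 1)[0].lower()
        counts[first_word] += 1
    return counts


def format_report(counts):
    """Build the report lines: a header, then one line per word, sorted by word."""
    lines = ["%-18s%s" % ("Word", "Count")]
    for word, count in sorted(counts.items()):
        lines.append("%-18s%d" % (word, count))
    return lines


# Hypothetical sample data standing in for input.csv (header row, then data).
sample = io.StringIO("name,country\nThe Who,UK\nthe band,US\nQueen,UK\n")
reader = csv.DictReader(sample)  # the header row supplies the field names
print("\n".join(format_report(count_first_words(reader))))
```

With a real file, the same functions could be fed from `csv.DictReader(open(filename, newline=""))` and the lines written to the output file instead of printed. The key design point is only that the counter outlives the loop; a plain `dict` with `counts[word] = counts.get(word, 0) + 1` would work equally well.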