3

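I am writing a function that takes in_file, counts how often each letter occurs in that file, and writes the result to out_file in the format letter:frequency. This is what I have so far. Can anyone help?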

def count_letters(in_file,out_file):
    in_file = open(in_file,"r")
    out_file = open(out_file,"w")
    for line in in_file:
        words = line.split()
        for word in words:
            for letter in word:
                print(letter,':',line.count(letter),file=out_file,end="\n")
4

2 Answers

4

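There is no need to split the text into words at all; passing a string directly to a Counter updates the count for each character. You also need to collect all the counts first and only then write them to the output file: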

from collections import Counter

def count_letters(in_filename, out_filename):
    counts = Counter()
    with open(in_filename, "r") as in_file:
        for chunk in iter(lambda: in_file.read(8192), ''):  # 8192 = 8 KB chunks
            counts.update(chunk)
    with open(out_filename, "w") as out_file:
        for letter, count in counts.items():  # items() works on Python 2 and 3
            out_file.write('{}:{}\n'.format(letter, count))

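Note that the input file is processed in 8 KB chunks rather than read all at once; you can adjust the chunk size (preferably a power of 2) to maximize throughput.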
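
As a rough illustration of the chunked reading described above, the sketch below checks that counting in small chunks gives the same totals as counting the whole text at once. The helper name count_chars and the use of io.StringIO in place of a real file are assumptions made for the demonstration only.

```python
from collections import Counter
import io

def count_chars(stream, chunk_size=8192):
    """Count characters from a text stream, reading chunk_size characters at a time."""
    counts = Counter()
    # iter(callable, sentinel) keeps calling read() until it returns '' (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), ''):
        counts.update(chunk)  # update() with a string adds one per character
    return counts

text = "hello world\n" * 3
chunked = count_chars(io.StringIO(text), chunk_size=4)  # tiny chunks on purpose
whole = Counter(text)
print(chunked == whole)
```

Because Counter.update only adds to the running totals, the chunk boundaries do not affect the result; the chunk size changes memory use and speed, not the counts.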

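If you want the output file sorted by frequency (descending), you can call .most_common() on the counter instead of iterating over its items in the write loop.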
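
A minimal sketch of that sorted variant follows. The helper format_counts is a hypothetical name used only here; it builds the letter:frequency lines in memory so the ordering is easy to see, and the same loop could write to out_file instead.

```python
from collections import Counter

def format_counts(counts):
    """Return 'letter:count' strings ordered from most to least frequent."""
    # most_common() with no argument returns every (element, count) pair,
    # sorted by count in descending order.
    return ['{}:{}'.format(letter, count) for letter, count in counts.most_common()]

lines = format_counts(Counter("banana"))
print("\n".join(lines))
```

For the input "banana" this yields a:3, then n:2, then b:1.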

answered 2013-08-10T18:31:48.780
0

This should do the trick. It counts only letters and skips every other character:

def count_letters(in_file,out_file):
    from collections import Counter
    letter_counts = Counter()
    # Open each path only once; the with blocks close the files.
    with open(in_file, 'r') as in_handle:
        for line in in_handle:
            line = line.strip()
            for letter in line:
                # Count only letters.
                if not letter.isalpha():
                    continue
                letter_counts[letter] += 1

    with open(out_file, 'w') as out_handle:
        # items() works on both Python 2 and 3.
        for letter, count in letter_counts.items():
            out_handle.write('{}:{}\n'.format(letter, count))
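
The letter filter can also be written as a generator expression passed straight to Counter. This is only a sketch of the same idea on an in-memory string; count_only_letters is a hypothetical helper name, and note that upper and lower case are counted separately.

```python
from collections import Counter

def count_only_letters(text):
    """Count alphabetic characters in text, ignoring digits, spaces, and punctuation."""
    return Counter(ch for ch in text if ch.isalpha())

counts = count_only_letters("Hi, hi! 42")
print(sorted(counts.items()))
```

Here 'H' and 'h' each appear once and 'i' appears twice; lowercasing the text first would merge the two cases if that is what you want.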
answered 2013-08-10T18:31:23.500