
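I made a simple word count program in Python that reads a text file, counts word frequencies and writes the results to another file. The problem is that when a word is repeated, the program writes every intermediate count for that word as well as the final one. For example, if the word "hello" appears 3 times, the program writes 3 lines for hello to the output: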

word - frequency count

hello - 1

hello - 2

hello - 3

The code is:

counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
    outfile.write(w + ',' + str(counts[w]) + '\n')   # runs on every iteration

Any help would be appreciated. I'm very new to Python.


3 Answers


The standard way to solve this is to use collections.Counter, like this:

>>> from collections import Counter
>>> words = ['b','b','the','the','the','c']
>>> Counter(words).most_common()
[('the', 3), ('b', 2), ('c', 1)]

Another way to solve it is to use a defaultdict, which works much like the Counter example above:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for word in words:
...    d[word] += 1
...
>>> d
defaultdict(<class 'int'>, {'b': 2, 'the': 3, 'c': 1})

No matter how you count the words, write to the file only after all of them have been counted. Otherwise you write one line for every increment of a count, so any word that appears more than once shows up several times in your output.

So, first collect the counts, then write them out.
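Here is a minimal sketch of the whole program with Counter that puts those two steps together. It assumes words are whitespace-separated tokens (text.split()), which the question does not actually show. The helper name format_word_counts and the file names are hypothetical. Counting finishes before any output is produced, and the output file receives a single write:

```python
from collections import Counter


def format_word_counts(text):
    """Return one "word,count" line per distinct word in `text`.

    Assumption: words are whitespace-separated tokens (str.split()).
    """
    counts = Counter(text.split())  # counting pass: nothing is written yet
    # Formatting pass: exactly one line per distinct word, most frequent first.
    return ''.join(f'{word},{count}\n' for word, count in counts.most_common())


# Hypothetical file names, for illustration only:
# with open('input.txt') as infile, open('counts.txt', 'w') as outfile:
#     outfile.write(format_word_counts(infile.read()))
```

Using most_common() also orders the output from most to least frequent, which iterating over a plain dict would not do.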

answered 2014-02-19T07:21:08.103

Here is a way to make your code work:

counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1   # first loop: only count

for w in counts:                        # second loop: one line per distinct word
    outfile.write(w + ',' + str(counts[w]) + '\n')

But I think Burhan Khalid's suggestion to use Counter is a better way to solve the problem.
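Building each line as w + ',' + str(counts[w]) produces a malformed row if a word itself contains a comma. That can happen, for example, with "hello," when words come from a plain whitespace split. The following is a minimal sketch, assuming comma-separated output is what you want. It keeps the dict counting above and lets the standard csv module handle quoting. The function name counts_to_csv is hypothetical, and it returns a string so the whole result can be written to the file in one go:

```python
import csv
import io


def counts_to_csv(words):
    """Count `words` with a plain dict, then serialize all counts as CSV at once."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1  # counting only; no output here

    buf = io.StringIO()
    # csv.writer's default line terminator is '\r\n'; use '\n' to match the original.
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerows(counts.items())  # one row per distinct word, quoted if needed
    return buf.getvalue()
```

When writing straight to a file instead, the csv documentation recommends opening it with newline=''.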

answered 2014-02-19T07:26:44.853

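Have you considered storing the frequency counts in your program first and then writing them all out at the end? That is certainly simpler than writing to the output file again for every count.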

answered 2014-02-19T07:16:19.333