Python has a nice class for this in its standard library: collections.Counter.
In [8]: from collections import Counter
In [9]: with open('Makefile', 'r') as f:
   ...:     raw = Counter(f.read())
   ...:
In [10]: raw
Out[10]: Counter({' ': 61, 'e': 46, 'p': 38, 'a': 29, '\n': 27, 'c': 27, 'n': 27, 'l': 26, 'd': 25, '-': 22, 's': 22, 'y': 22, 't': 20, 'i': 18, 'o': 18, 'r': 17, '.': 16, 'u': 13, '\t': 12, 'm': 12, 'b': 11, 'x': 10, 'h': 9, '/': 8, ':': 8, '_': 7, "'": 6, ';': 5, '\\': 5, 'f': 5, '*': 3, 'v': 3, '{': 3, '}': 3, 'k': 2, 'H': 1, 'O': 1, 'N': 1, 'P': 1, 'Y': 1, 'g': 1})
This is the Makefile from the pandas library, by the way. To sort the characters by frequency in descending order, do the following:
In [22]: raw.most_common()
Out[22]:
[(' ', 61),
('e', 46),
('p', 38),
('a', 29),
('\n', 27),
('c', 27),
('n', 27),
('l', 26),
('d', 25),
('-', 22),
('s', 22),
('y', 22),
('t', 20),
('i', 18),
('o', 18),
('r', 17),
('.', 16),
('u', 13),
('\t', 12),
('m', 12),
('b', 11),
('x', 10),
('h', 9),
('/', 8),
(':', 8),
('_', 7),
("'", 6),
(';', 5),
('\\', 5),
('f', 5),
('*', 3),
('v', 3),
('{', 3),
('}', 3),
('k', 2),
('H', 1),
('O', 1),
('N', 1),
('P', 1),
('Y', 1),
('g', 1)]
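For anyone who wants to try this without a file on disk, here is a minimal sketch of the same idea applied to an in-memory string. The helper name `char_frequencies` and the sample text are made up for illustration and are not part of the session above; only `Counter` and `most_common` come from the standard library.

```python
from collections import Counter


def char_frequencies(text):
    """Return (character, count) pairs sorted by descending count."""
    # Counter accepts any iterable; iterating a str yields single characters.
    return Counter(text).most_common()


# Illustrative input only; replace it with the contents of your own file.
sample = "hello world"
freqs = char_frequencies(sample)

# most_common(n) limits the result to the n most frequent entries.
print(Counter(sample).most_common(2))  # [('l', 3), ('o', 2)]
```

Passing an integer to `most_common` is handy when only the top few characters matter. Calling it with no argument returns every entry, as in the output above.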
I deliberately did not use your exact data, so that you can try adapting my solution to your problem.