python-2.7 - Sorting and counting occurrences in a 12 GB text file

Question

I want to sort a large text file (about 12 GB) by counting the occurrences of each line. For that, I used:

sort file.txt | uniq -c > sorted

But it takes forever because of how the running time blows up at this size. Any ideas?

score 0 · Accepted Answer

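from collections import defaultdict

# Missing keys start at 0, so each line can be counted without a membership check
d = defaultdict(int)

# Iterate over the file lazily, one line at a time, instead of loading all of it
with open("file.txt") as f:
    for line in f:
        d[line] += 1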

d is now a dictionary in which each key is a unique line and each value is that line's count.
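The question also asks for the result ordered by count, so here is a minimal sketch of that last step. It assumes the set of distinct lines fits in memory (the counting approach needs that anyway). The helper names `count_lines` and `write_sorted` and the output format are illustrative choices, not part of the original answer. `collections.Counter` is available from Python 2.7 onward.

```python
from collections import Counter


def count_lines(path):
    """Count how often each line occurs in the file at `path` (hypothetical helper)."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            # Strip the trailing newline so a final line without one
            # is counted together with its earlier copies
            counts[line.rstrip("\n")] += 1
    return counts


def write_sorted(counts, out_path):
    """Write "count line" pairs to `out_path`, most frequent line first."""
    with open(out_path, "w") as out:
        # most_common() with no argument returns every (line, count) pair,
        # ordered from the highest count to the lowest
        for line, n in counts.most_common():
            out.write("%d %s\n" % (n, line))
```

Calling `write_sorted(count_lines("file.txt"), "sorted")` would produce a file similar in spirit to the `uniq -c` output, but ordered by frequency. If the file consists mostly of distinct lines, the dictionary may grow close to the size of the file itself, and an on-disk approach would likely be needed instead.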
