I want to sort a large text file (about 12 GB) by counting the occurrences of each line. For that I used:
sort file.txt | uniq -c > sorted
But because the file is so large, it takes forever. Any ideas?
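For context, `uniq -c` only collapses runs of adjacent identical lines, which is why `sort` has to run over the entire file first; on a 12 GB input that sort is the expensive step. A minimal Python sketch of what the pipeline computes, assuming the lines fit in memory (the function name `sort_uniq_c` is hypothetical, and Python orders strings by code point, whereas `sort` may use the locale's collation order):

```python
from itertools import groupby

def sort_uniq_c(lines):
    """Mimic `sort | uniq -c`: sort the lines, then count each run of equal lines."""
    # groupby, like uniq, only groups *adjacent* equal items,
    # so the input has to be sorted first.
    return [(sum(1 for _ in group), line) for line, group in groupby(sorted(lines))]

print(sort_uniq_c(["b", "a", "b", "c", "b"]))
# → [(1, 'a'), (3, 'b'), (1, 'c')]
```

Counting with a dictionary avoids sorting the whole file; only the distinct lines need to be kept.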
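from collections import defaultdict

d = defaultdict(int)  # missing keys start at 0
with open('file.txt') as f:  # the file name must be a quoted string
    for line in f:  # reads the file lazily, one line at a time
        d[line] += 1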
d is now a dictionary in which each key is a unique line and each value is the number of times that line occurs.
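To get what the question asks for (lines ordered by how often they occur), the counts still have to be sorted and written out. A sketch, assuming the distinct lines fit in memory; it swaps `defaultdict(int)` for `collections.Counter`, whose `most_common()` returns items ordered by count (ties stay in first-seen order). The helper names are hypothetical, and the output format only approximates `uniq -c`:

```python
from collections import Counter

def count_lines(lines):
    """Count occurrences of each line; `lines` may be an open file, read lazily."""
    counts = Counter()
    for line in lines:
        # Strip the newline so a final line without "\n" matches its duplicates.
        counts[line.rstrip("\n")] += 1
    return counts

def format_by_count(counts):
    """Yield lines ordered by count (highest first), roughly like `uniq -c` output."""
    for line, n in counts.most_common():
        yield f"{n:7d} {line}\n"

def sort_file_by_count(in_path, out_path):
    """Read in_path once, then write its distinct lines ordered by frequency to out_path."""
    with open(in_path) as f:
        counts = count_lines(f)
    with open(out_path, "w") as out:
        out.writelines(format_by_count(counts))
```

Usage, with the file names from the question: `sort_file_by_count("file.txt", "sorted")`. The file is read once, and only the distinct lines are sorted, not all 12 GB.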