If order doesn't matter
If you really want to do this in Python (as opposed to the `sort filepath | uniq -c` that Jean suggested), then I would do this:
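import collections

# Count occurrences of each line, ignoring surrounding whitespace and newlines
with open('path/to/file') as f:
    counts = collections.Counter(line.strip() for line in f)

with open('path/to/outfile', 'w') as outfile:
    for line, occ in counts.items():  # use .iteritems() on Python 2
        outfile.write("%s repeat %d\n" % (line, occ))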
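To see what the counting step produces, here is a minimal sketch that applies the same `collections.Counter` idea to an in-memory list instead of a file. The helper name `count_lines` and the sample data are made up for illustration; they are not part of the original answer.

```python
import collections

def count_lines(lines):
    """Return a Counter mapping each stripped line to its number of occurrences."""
    return collections.Counter(line.strip() for line in lines)

# Hypothetical sample data, standing in for the lines of a file
sample = ["v1\n", "v1\n", "v1\n", "v2\n", "v2\n", "v3\n", "v1\n"]
counts = count_lines(sample)

for line, occ in counts.items():
    print("%s repeat %d" % (line, occ))
```

Note that `v1` is reported once with a total of 4: occurrences are merged no matter where they appear in the input, which is why this approach only fits when order doesn't matter.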
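If order matters
If order matters (that is, if entry i appears before entry j in the input file, then entry i must appear before entry j in the output file), then what you need is a modified run-length encoder. Note, however, that if you have the following input file: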
v1
v1
v1
v2
v2
v3
v1
then your output file will look like this:
v1 repeat 3
v2 repeat 2
v3
v1
with open('infilepath') as infile, open('outfilepath', 'w') as outfile:
    curr = infile.readline().strip()
    count = 1
    for line in infile:
        if line.strip() == curr:
            count += 1
        else:
            # The current run has ended: write it out, then start a new run
            outfile.write(curr)
            if count > 1:
                outfile.write(" repeat %d\n" % count)
            else:
                outfile.write("\n")
            curr = line.strip()
            count = 1
    # Write out the final run, always ending it with a newline
    outfile.write(curr)
    if count > 1:
        outfile.write(" repeat %d\n" % count)
    else:
        outfile.write("\n")
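The same run-length idea can also be sketched with `itertools.groupby`, which groups adjacent equal items. This is an alternative to the explicit loop, not the method used above; the function name `run_length_lines` and the sample data are hypothetical.

```python
import itertools

def run_length_lines(lines):
    """Yield one output line per run of identical adjacent input lines."""
    stripped = (line.strip() for line in lines)
    for value, run in itertools.groupby(stripped):
        count = sum(1 for _ in run)  # length of this run
        if count > 1:
            yield "%s repeat %d" % (value, count)
        else:
            yield value

# Hypothetical sample data matching the example input file
sample = ["v1\n", "v1\n", "v1\n", "v2\n", "v2\n", "v3\n", "v1\n"]
result = list(run_length_lines(sample))
print("\n".join(result))
```

Because `groupby` only merges neighbouring lines, the trailing `v1` stays a separate entry, just as in the example output. To process a real file, you could pass the open file object as `lines`.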
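Of course, `uniq -c infilepath > outfilepath` does much the same thing, although its output format differs (each line is prefixed with its count).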
Hope this helps