I have a file that looks like this:
732772 scaffold-3 G G A
732772 scaffold-2 G G A
742825 scaffold-3 A A G
776546 scaffold-3 G A G
776546 scaffold-6 G A G
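I want to use column 2 as my key and print each unique key once, followed by the values associated with it.
In other words, if a name in column 2 appears more than once, it should be printed only once, so the output should be: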
scaffold-3
732772 G G A
742825 A A G
776546 G A G
scaffold-2
732772 G G A
scaffold-6
776546 G A G
Here is what I wrote:
res = open('00test', 'r')
out = open('00testresult', 'w')
d = {}
for line in res:
    if not line.startswith('#'):          # skip comment lines
        line = line.strip().split()
        pos = line[0]
        name = line[1]                    # column 2 is the grouping key
        call = line[2]
        father = line[3]
        mother = line[4]
        if name not in d:
            d[name] = []                  # first time this key is seen
        d[name].append({'pos': pos, 'call': call, 'father': father, 'mother': mother})
But I don't know how to write it out in the format described above.
Any help would be appreciated.
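The missing piece is the output step: loop over the dictionary once, write each key on its own line, then write one line per record stored under that key. A minimal sketch, assuming the grouped data has the same shape as `d` above; `write_grouped` is a hypothetical helper name, and tab separators are a choice (spaces would work equally well).

```python
import io
from typing import Dict, List, TextIO


def write_grouped(d: Dict[str, List[Dict[str, str]]], out: TextIO) -> None:
    """Write each key once, then one tab-separated line per record under it.

    Assumes each record is a dict with string values for the keys
    'pos', 'call', 'father' and 'mother', as built in the question.
    """
    for name, records in d.items():
        out.write(name + '\n')  # the key appears only once
        for r in records:
            out.write('\t'.join([r['pos'], r['call'], r['father'], r['mother']]) + '\n')


if __name__ == '__main__':
    # Small hand-built example in the same shape as the question's `d`.
    d = {
        'scaffold-3': [
            {'pos': '732772', 'call': 'G', 'father': 'G', 'mother': 'A'},
            {'pos': '742825', 'call': 'A', 'father': 'A', 'mother': 'G'},
        ],
        'scaffold-2': [
            {'pos': '732772', 'call': 'G', 'father': 'G', 'mother': 'A'},
        ],
    }
    buf = io.StringIO()  # stands in for the real output file
    write_grouped(d, buf)
    print(buf.getvalue(), end='')
```

Passing a file object opened with `open('00testresult', 'w')` instead of the `StringIO` buffer writes the same text to disk.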
Edit:
Here is the complete working code that solves the problem:
d = {}
# Group rows by scaffold name (column 2).
with open('00test', 'r') as res:
    for line in res:
        if line.startswith('#') or not line.strip():
            continue  # skip comment lines and blank lines
        pos, name, call, father, mother = line.split()[:5]
        if name not in d:
            d[name] = []
        d[name].append({'pos': pos, 'call': call, 'father': father, 'mother': mother})

# Write each key once, followed by its records (tab-separated).
# Keys come out in first-seen order on Python 3.7+, where dicts keep insertion order.
with open('00testresult', 'w') as out:
    for k, v in d.items():
        out.write(k + '\n')
        for i in v:
            out.write('\t'.join([i['pos'], i['call'], i['father'], i['mother']]) + '\n')
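For comparison, the same grouping can be written more compactly with `collections.defaultdict`, which removes the explicit "create an empty list if the key is new" check. This is a sketch under stated assumptions: columns are whitespace-separated in the order pos, name, call, father, mother; lines starting with `#` and blank lines are skipped; and each record is stored as a tuple rather than a dict. The function names `group_by_name` and `format_groups` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str, str]  # (pos, call, father, mother)


def group_by_name(lines: Iterable[str]) -> Dict[str, List[Record]]:
    """Group rows by column 2; assumes at least 5 whitespace-separated fields."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for line in lines:
        if line.startswith('#') or not line.strip():
            continue  # skip comments and blank lines
        pos, name, call, father, mother = line.split()[:5]
        groups[name].append((pos, call, father, mother))  # new keys get [] automatically
    return dict(groups)


def format_groups(groups: Dict[str, List[Record]]) -> str:
    """Render each key once, followed by its records joined with single spaces."""
    out: List[str] = []
    for name, records in groups.items():
        out.append(name)
        out.extend(' '.join(r) for r in records)
    return '\n'.join(out) + '\n' if out else ''


if __name__ == '__main__':
    sample = [
        '732772 scaffold-3 G G A',
        '732772 scaffold-2 G G A',
        '742825 scaffold-3 A A G',
        '776546 scaffold-3 G A G',
        '776546 scaffold-6 G A G',
    ]
    print(format_groups(group_by_name(sample)), end='')
```

On Python 3.7+ dicts preserve insertion order, so the scaffolds are printed in the order they first appear in the input (scaffold-3, scaffold-2, scaffold-6), matching the expected output in the question. On older versions the key order is not guaranteed.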