
I have a file that looks like this:

732772  scaffold-3  G   G   A
732772  scaffold-2  G   G   A
742825  scaffold-3  A   A   G
776546  scaffold-3  G   A   G
776546  scaffold-6  G   A   G

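I'd like to use column 2 as my key and write the output so that each key appears only once, followed by the values associated with it.

In other words, if a name in column 2 appears more than once, it should be output only once, so the output should be: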

scaffold-3
732772   G  G   A
742825   A  A   G
776546   G  A   G
scaffold-2
732772   G  G   A
scaffold-6
776546   G  A   G
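The grouping described above can be sketched as a pure function over a list of lines, which makes it easy to check without touching any files. This is a minimal sketch, assuming every data line has exactly five whitespace-separated columns; the name `group_by_scaffold` is hypothetical. It relies on plain dicts keeping insertion order (guaranteed from Python 3.7), which is what yields scaffold-3, scaffold-2, scaffold-6 in first-seen order as in the expected output.

```python
def group_by_scaffold(lines):
    """Group whitespace-separated rows by their second column.

    Hypothetical helper. Returns a dict mapping each scaffold name to a
    list of (pos, call, father, mother) tuples. Assumes exactly five
    columns per data line. Plain dicts preserve insertion order in
    Python 3.7+, so scaffolds come out in the order first seen.
    """
    groups = {}
    for line in lines:
        if not line.strip() or line.startswith('#'):
            continue  # skip blank lines and comment lines
        pos, name, call, father, mother = line.split()
        # setdefault creates the list the first time a name is seen
        groups.setdefault(name, []).append((pos, call, father, mother))
    return groups


sample = """\
732772  scaffold-3  G   G   A
732772  scaffold-2  G   G   A
742825  scaffold-3  A   A   G
776546  scaffold-3  G   A   G
776546  scaffold-6  G   A   G
"""

groups = group_by_scaffold(sample.splitlines())
for name, rows in groups.items():
    print(name)
    for row in rows:
        print('\t'.join(row))
```

With a real file, passing the open file object instead of `sample.splitlines()` works the same way, since `split()` discards the trailing newline.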

This is what I wrote:

res = open('00test','r')
out = open('00testresult','w')

d = {}
for line in res:
    if not line.startswith('#'):
        line = line.strip().split()
        pos = line[0]
        name = line[1]
        call = line[2]
        father = line[3]
        mother = line[4]

        if not (name in d):
            d[name] = []
        d[name].append({'pos':pos,'call':call,'father':father,'mother':mother})

But I don't know how to output it in the way I described above.

Any help would be great.

EDIT:

Here is the fully working code that solved the problem:

res = open('00test','r')
out = open('00testresult','w')

d = {}
for line in res:
    if not line.startswith('#'):
        line = line.strip().split()
        pos = line[0]
        name = line[1]
        call = line[2]
        father = line[3]
        mother = line[4]

        if not (name in d):
            d[name] = []
        d[name].append({'pos':pos,'call':call,'father':father,'mother':mother})

for k,v in d.items():
    out.write(str(k)+'\n')
    for i in v:
        out.write(str(i['pos'])+'\t'+str(i['call'])+'\t'+str(i['father'])+'\t'+str(i['mother'])+'\n')

out.close()

1 Answer


Now that you have the dictionary, loop over its items and write them to a file:

keys = ('pos', 'call', 'father', 'mother')

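# outputfilename is a placeholder: substitute your own output path
with open(outputfilename, 'w') as output:
    for name in d:
        output.write(name + '\n')
        # look up the current key (name), not the literal string 'name'
        for entry in d[name]:
            output.write(' '.join([entry[k] for k in keys]) + '\n')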
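The write loop can also be wrapped in a small function that accepts any object with a `write()` method, so it can be checked against an in-memory `io.StringIO` buffer before writing a real file. This is a sketch under stated assumptions: the function name `write_groups` and the tab separator (instead of the single space used in the loop) are illustrative choices, and `d` is assumed to have the name-to-list-of-dicts shape built by the question's code.

```python
import io

KEYS = ('pos', 'call', 'father', 'mother')


def write_groups(d, output, sep='\t'):
    """Write each name on its own line, then one line per entry.

    Hypothetical helper. `d` maps a name to a list of dicts holding the
    keys in KEYS; `output` is anything with a write() method, such as
    an open file or io.StringIO.
    """
    for name, entries in d.items():
        output.write(name + '\n')
        for entry in entries:
            output.write(sep.join(entry[k] for k in KEYS) + '\n')


d = {
    'scaffold-3': [
        {'pos': '732772', 'call': 'G', 'father': 'G', 'mother': 'A'},
        {'pos': '742825', 'call': 'A', 'father': 'A', 'mother': 'G'},
    ],
    'scaffold-2': [
        {'pos': '732772', 'call': 'G', 'father': 'G', 'mother': 'A'},
    ],
}

buf = io.StringIO()
write_groups(d, buf)
print(buf.getvalue(), end='')
```

To write to disk instead, pass a real file: `with open('00testresult', 'w') as out: write_groups(d, out)`.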

You may want to use a collections.defaultdict() object instead of a regular dict for d:

from collections import defaultdict

d = defaultdict(list)

and drop the `if not (name in d): d[name] = []` lines entirely.
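Put together, the question's parsing loop with a `defaultdict` might look like the sketch below. It assumes five whitespace-separated columns per data line, and the function name `parse` is hypothetical. `defaultdict(list)` calls `list()` to create an empty list the first time a missing key is accessed, which is why the membership check is no longer needed.

```python
from collections import defaultdict


def parse(lines):
    """Build a name -> list-of-entries mapping using defaultdict.

    Hypothetical helper. Assumes five whitespace-separated columns per
    data line; comment lines starting with '#' and blank lines are
    skipped.
    """
    d = defaultdict(list)
    for line in lines:
        if line.startswith('#') or not line.strip():
            continue  # skip comments and blank lines
        pos, name, call, father, mother = line.split()
        # d[name] is created as [] automatically on first access
        d[name].append({'pos': pos, 'call': call,
                        'father': father, 'mother': mother})
    return d


rows = [
    '732772  scaffold-3  G   G   A',
    '# a comment line',
    '742825  scaffold-3  A   A   G',
    '776546  scaffold-6  G   A   G',
]
d = parse(rows)
print(dict(d))
```

With a file, `with open('00test') as res: d = parse(res)` works the same way. One side effect worth knowing: reading a mistyped key from a `defaultdict` silently inserts an empty list instead of raising `KeyError`.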

answered 2013-08-14T14:17:53.843