
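I have another newbie Python question. I have a file like the one shown below, and I need to convert it into a vector form and a fingerprint-like form. My problem is how to combine the records so that I end up with a matrix whose rows are the cmps and whose columns are the vals; if a cmp lacks a val, that entry should be zero. The vals differ from one cmp to another, and the overlap is not very large. Can you suggest the best way to go about this? A Python dictionary? Any ideas are helpful. Thanks!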

cmp1    0.277   val_1
cmp1    0.097   val_2
cmp1    0.795   val_3
cmp1    0.809   val_4
cmp1    0.127   val_5
cmp2    0.839   val_3
cmp2    0.909   val_4
cmp2    0.148   val_5
cmp2    0.938   val_6
cmp2    0.599   val_7

The result I need to get:

Vector version

name    val_1   val_2   val_3   val_4   val_5   val_6   val_7
cmp1    0.277   0.097   0.795   0.809   0.127   0   0
cmp2    0   0   0.839   0.909   0.148   0.938   0.599   

Binary version

name    val_1   val_2   val_3   val_4   val_5   val_6   val_7
cmp1    0   0   1   1   0   0   0
cmp2    0   0   1   1   0   1   1

Current code

import csv

fi = open("data.txt", "rb")
fo = open("data_out.txt", "wb")
reader = csv.reader(fi,delimiter='\t')
writer = csv.writer(fo,delimiter='\t')

# making unique lists
targets = set()
ligands = set()

for row in reader:
    ligands.add(row[0])
    targets.add(row[2])

data = []
for row in reader:
    if row[0] in ligands and row[2] in targets:
    else: 

1 Answer


You can use collections.defaultdict here:

from collections import defaultdict
with open('abc') as f:
    dic = defaultdict(dict)
    for line in f:
        cmp, val, col = line.split()
        dic[cmp][col] = val
print dic
# defaultdict(<type 'dict'>,
 #{'cmp1': {'val_5': '0.127', 'val_4': '0.809', 'val_1': '0.277', 'val_3': '0.795', 'val_2': '0.097'},
 # 'cmp2': {'val_5': '0.148', 'val_4': '0.909', 'val_7': '0.599', 'val_6': '0.938', 'val_3': '0.839'}})

# get a sorted list of all val_i names from dic
vals = sorted(set(y for x in dic.itervalues() for y in x))

keys = sorted(dic)
print "name    {}".format("\t".join(vals))
for key in keys:
    print "{}    {}".format(key, "\t".join(dic[key].get(v,'0')  for v in vals)  )

Output:

name    val_1   val_2   val_3   val_4   val_5   val_6   val_7
cmp1    0.277   0.097   0.795   0.809   0.127   0   0
cmp2    0   0   0.839   0.909   0.148   0.938   0.599
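
The code above is written for Python 2 (`print` statement, `dict.itervalues`). As a rough sketch only, the same nested-dictionary approach might look like this in Python 3. The function names `build_matrix` and `format_table`, the `missing` parameter, and the inline `sample` list (copied from the question's data) are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict


def build_matrix(lines):
    """Group 'name value column' records into {name: {column: value}}."""
    table = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, value, column = line.split()
        table[name][column] = value
    return table


def format_table(table, missing="0"):
    """Return tab-separated text rows, filling absent columns with `missing`."""
    # Union of all column names seen in any row, in sorted order.
    columns = sorted({col for row in table.values() for col in row})
    rows = ["\t".join(["name"] + columns)]
    for name in sorted(table):
        cells = [table[name].get(col, missing) for col in columns]
        rows.append("\t".join([name] + cells))
    return rows


# Sample records taken from the question (hypothetical stand-in for the file).
sample = [
    "cmp1    0.277   val_1",
    "cmp1    0.097   val_2",
    "cmp1    0.795   val_3",
    "cmp1    0.809   val_4",
    "cmp1    0.127   val_5",
    "cmp2    0.839   val_3",
    "cmp2    0.909   val_4",
    "cmp2    0.148   val_5",
    "cmp2    0.938   val_6",
    "cmp2    0.599   val_7",
]

for row in format_table(build_matrix(sample)):
    print(row)
```

One thing to keep in mind: plain `sorted` orders the column names lexicographically, so with ten or more columns `val_10` would come before `val_2` unless a numeric sort key is supplied.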

For the binary version, you can try:

print "name    {}".format("\t".join(vals))
for key in keys:
    strs = "\t".join(str(int(round(float(dic[key][v])))) if v in dic[key] else '0'  for v in vals)
    print "{}    {}".format(key, strs )

Output:

name    val_1   val_2   val_3   val_4   val_5   val_6   val_7
cmp1    0   0   1   1   0   0   0
cmp2    0   0   1   1   0   1   1
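
The snippet above rounds each value, so a value below 0.5 becomes 0 even though it is present in the file. If a different cutoff, or a pure presence/absence fingerprint, is wanted instead, one possible sketch in Python 3 follows. The helper name `binarize` and its `threshold` parameter are hypothetical, and the `table` literal simply repeats the question's data in the nested-dictionary shape used by the answer.

```python
def binarize(table, columns, threshold=0.5):
    """Return {name: [0 or 1, ...]}: 1 where a value exists and is >= threshold."""
    result = {}
    for name, row in table.items():
        result[name] = [
            1 if col in row and float(row[col]) >= threshold else 0
            for col in columns
        ]
    return result


# Same data as in the question, already grouped by compound.
table = {
    "cmp1": {"val_1": "0.277", "val_2": "0.097", "val_3": "0.795",
             "val_4": "0.809", "val_5": "0.127"},
    "cmp2": {"val_3": "0.839", "val_4": "0.909", "val_5": "0.148",
             "val_6": "0.938", "val_7": "0.599"},
}
columns = ["val_%d" % i for i in range(1, 8)]

# A threshold of 0.5 gives the same bits as rounding for this data.
bits = binarize(table, columns)

# A threshold of 0.0 marks every value that is present at all.
presence = binarize(table, columns, threshold=0.0)

for name in sorted(bits):
    print(name, bits[name], presence[name])
```

Comparing against an explicit threshold also sidesteps the fact that `round` treats exact halves differently in Python 2 and Python 3.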
Answered 2013-07-04T12:55:02.227