
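OK, I accept that the title is vague about my question; I could not phrase it in a more understandable way. I am new to programming and my technical vocabulary is still developing.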

I have two files. File A looks like:

CHROM   POS ID  AGM12   AGM14   AGM15   AGM18 ..
1   14930   rs150145850     0/0 1/1 0/0  0/0 ..
1   14933   rs138566748 0/0 0/0 0/0  0/0 ..
1   63671   rs116440577 0/1 0/0 0/0  0/0 ..
2   808922  rs6594027   0/0 0/0 0/0  0/1 ..
2   753474  rs2073814   1/0 0/0 0/1  0/0 ..
3   753405  rs61770173  0/0 1/1 0/0  1/0 ..
...
...
...

File B looks like:

CHROM   POS rsID    Sample_ID
1   14930   rs150145850 AGM15
2   808922  rs6594027   AGM18
3   753405  rs61770173  AGM12
...
...
...

Using the POS field (column 2) of File B, I want to replace the value in the corresponding Sample_ID column of File A with NA.

For example, the output should look like:

CHROM   POS ID  AGM12   AGM14   AGM15   AGM18
1   14930   rs150145850     0/0 1/1 NA   0/0
1   14933   rs138566748 0/0 0/0 0/0  0/0
1   63671   rs116440577 0/1 0/0 0/0  0/0
2   808922  rs6594027   0/0 0/0 0/0  NA
2   753474  rs2073814   1/0 0/0 0/1  0/0
3   753405  rs61770173  NA  1/1 0/0  1/0

How can I do this in Python or Unix?


3 Answers

1

Here is a version using the csv module (I am assuming your columns are tab-separated).

import csv
import collections

a = 'path/to/a'
b = 'path/to/b'
output = 'output/path'

pos = collections.defaultdict(list)

with open(b) as csvin:
    reader = csv.DictReader(csvin, delimiter='\t')
    for line in reader:
        pos[line['POS']].append(line['Sample_ID'])

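# Python 3: the csv module needs text-mode files opened with newline='' (not 'wb')
with open(a, newline='') as csvin, open(output, 'w', newline='') as csvout: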
    reader = csv.DictReader(csvin, delimiter='\t')
    writer = csv.DictWriter(csvout, fieldnames=reader.fieldnames, delimiter='\t')
    writer.writeheader()
    for line in reader:
        fields = pos.get(line['POS'], [])
        for field in fields:
            line[field] = 'NA'
        writer.writerow(line)
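One caveat with keying on POS alone: a position can in principle occur on more than one chromosome, in which case the wrong row would also be blanked. Below is a minimal sketch of the same idea keyed on the (CHROM, POS) pair. The function name `mask_samples` and the choice to pass open file objects are my own, made up for illustration, and the sketch still assumes tab-separated input with the headers shown in the question.

```python
import collections
import csv
import io


def mask_samples(a_file, b_file, out_file):
    """Copy file A to out_file, writing 'NA' for each sample that file B lists.

    All three arguments are open text-mode file objects (a hypothetical
    interface, chosen so the sketch can be tried without touching disk).
    """
    # (CHROM, POS) -> names of the sample columns to blank out on that row
    to_mask = collections.defaultdict(list)
    for row in csv.DictReader(b_file, delimiter='\t'):
        to_mask[(row['CHROM'], row['POS'])].append(row['Sample_ID'])

    reader = csv.DictReader(a_file, delimiter='\t')
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames,
                            delimiter='\t', lineterminator='\n')
    writer.writeheader()
    for row in reader:
        for sample in to_mask.get((row['CHROM'], row['POS']), []):
            if sample in row:  # ignore sample IDs that are not columns of A
                row[sample] = 'NA'
        writer.writerow(row)


if __name__ == '__main__':
    # Toy data; the second row is invented to show a POS repeated on another chromosome.
    a = io.StringIO("CHROM\tPOS\tID\tAGM12\tAGM15\n"
                    "1\t14930\trs150145850\t0/0\t0/0\n"
                    "2\t14930\trs0000001\t0/1\t1/1\n")
    b = io.StringIO("CHROM\tPOS\trsID\tSample_ID\n"
                    "1\t14930\trs150145850\tAGM15\n")
    out = io.StringIO()
    mask_samples(a, b, out)
    print(out.getvalue())
```

With real files you would pass handles opened with `newline=''`, as the csv documentation recommends.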
answered 2012-11-22T15:13:36.330
0

Try this.

def method(file1, file2, fileout):
    # three separate dicts ("d1, d2, headers = {}" cannot unpack an empty dict)
    d1, d2, headers = {}, {}, {}
    with open(file1) as f1:
        for line in f1:
            fields = line.rstrip('\n').split('\t')  # I am assuming tab-separated
            # key on POS; keep CHROM plus every column after POS
            d1[fields[1]] = [fields[0]] + fields[2:]
    with open(file2) as f2:
        for line in f2:
            fields = line.rstrip('\n').split('\t')
            d2[fields[1]] = fields[3]  # Sample_ID is the fourth column of file B
    # map each column name to its index in the lists stored in d1
    for i, header in enumerate(d1['POS']):
        headers[header] = i
    with open(fileout, 'w') as fo:
        fo.write("%s\tPOS\t%s\n" % (d1['POS'][0], "\t".join(d1['POS'][1:])))
        del d1['POS']
        for key, values in d1.items():
            if key in d2:
                values[headers[d2[key]]] = "NA"
            fo.write("%s\t%s\t%s\n" % (values[0], key, "\t".join(values[1:])))
answered 2012-11-22T14:20:38.713
0

If you don't mind installing a package, you can do this neatly with pandas:

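import pandas

# DataFrame.from_csv was removed in pandas 1.0; read_csv is its replacement
A = pandas.read_csv("A.txt", sep="\t", index_col=[0, 1])
B = pandas.read_csv("B.txt", sep="\t", index_col=[0, 1])

A.join(B) # the resulting dataset: A's rows with B's rsID and Sample_ID columns attached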

Of course, you have to pick up pandas to be able to do this.
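Note that a join only lines the two tables up on (CHROM, POS); it does not by itself write NA into the genotype columns. Here is a minimal sketch of that last step, assuming tab-separated files with the headers shown in the question. The function name `mask_with_pandas` is made up for illustration, and the row-by-row loop favours clarity over speed.

```python
import io

import pandas as pd


def mask_with_pandas(a_source, b_source):
    """Return file A as a DataFrame with 'NA' written where file B names a sample."""
    # dtype=str keeps genotypes such as "0/0" and the CHROM/POS keys as plain text
    a = pd.read_csv(a_source, sep="\t", dtype=str)
    b = pd.read_csv(b_source, sep="\t", dtype=str)
    for chrom, pos, sample in zip(b["CHROM"], b["POS"], b["Sample_ID"]):
        if sample in a.columns:  # skip sample IDs that are not columns of A
            rows = (a["CHROM"] == chrom) & (a["POS"] == pos)
            a.loc[rows, sample] = "NA"
    return a


if __name__ == "__main__":
    # Toy input with fewer sample columns than the question's files
    a_txt = ("CHROM\tPOS\tID\tAGM12\tAGM15\n"
             "1\t14930\trs150145850\t0/0\t0/0\n"
             "2\t808922\trs6594027\t0/0\t0/0\n")
    b_txt = ("CHROM\tPOS\trsID\tSample_ID\n"
             "1\t14930\trs150145850\tAGM15\n")
    result = mask_with_pandas(io.StringIO(a_txt), io.StringIO(b_txt))
    print(result.to_csv(sep="\t", index=False))
```

To write the result to disk, something like `result.to_csv("output.txt", sep="\t", index=False)` should do; the output path here is only a placeholder.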

answered 2012-11-22T14:29:18.433