我从使用列表到使用字典。然而,经过多次试验,我对我的代码有点困惑,我决定在这里发布一个问题。
我有一个看起来像这样的文件:[文件名 x50]:
10403 D chr1 38
10616 D chr1 22
14465 I chr1 17
15171 I chr1 17
16586 I chr1 17
17237 I chr1 17
26352 I chr1 17
26752 I chr1 17
27669 I chr1 17
29097 I chr1 23
29171 I chr1 17
29902 D chr1 11
32023 I chr1 65
和一个文件(分子):
16 13829325 I chrY 10
13 49157314 I chr4 10
8 13867496 I chrY 16
7 49102622 I chr4 10
7 224204410 I chr1 10
7 10800122 I chr21 15
6 49157651 I chr4 10
6 33866712 I chr16 10
6 224204343 I chr1 10
6 13869965 I chrY 10
6 13838848 I chrY 15
6 13829325 I chrY 15
5 89876424 I chr2 10
5 61758798 I chr7 23
5 58977653 I chrY 10
5 49157099 I chr4 10
5 38806712 D chr10 10
5 29829794 I chr20 10
我的代码看起来像这样:
def round_down(num):
return num - (num%100000)
x50 = open('path','r')
moleculo = open('path','r')
out = open('path','w')
x50_region = {}
ct=0
for region in x50:
#print region
(pos,types,chrm,leng) = region.strip().split()
start = int(pos)-20
end = int(pos)+20
rounded_start=round_down(start)
if not (chrm in x50_region):
x50_region[chrm]={}
if not (types in x50_region[chrm]):
x50_region[chrm][types]={}
if not (rounded_start in x50_region[chrm][types]):
x50_region[chrm][types][rounded_start]= []
x50_region[chrm][types][rounded_start].append({'start':start,'end':end})
c=0
for moleculo_line in moleculo:
if (c % 1000 == 0):print "c ",c
c=c+1
moleculo_data = moleculo_line.strip().split()
moleculo_chrom = moleculo_data[3]
moleculo_types = moleculo_data[2]
moleculo_len = int(moleculo_data[4])
moleculo_start_pos = int(moleculo_data[1])
moleculo_end_pos = int(moleculo_data[1]) + int(moleculo_data[4])
rounded_moleculo_start_position = round_down(moleculo_start_pos)
rounded_moleculo_end_position = round_down(moleculo_end_pos)
moleculo_len = int(moleculo_data[4])
if moleculo_chrom in x50_region and rounded_moleculo_start_position in x50_region[moleculo_chrom][moleculo_types]:
for region in x50_region[moleculo_chrom][moleculo_types][rounded_moleculo_start_position]:
#print i,moleculo_len
if region['start'] <= moleculo_start_pos and moleculo_start_pos <= region['end']:
out.write(str(moleculo_line))
out.close()
这个想法是过滤文件。
现在我想添加另一个条件。我有一个名为moleculo_len 的变量。它在第二个文件中包含我的事件的长度。x50 文件中与此对应的值称为 leng (pos,types,chrm,leng) = region.strip().split()
我应该如何在我的过滤中添加另一个步骤,以便当 Molecolo_len 处于 >= 50% 的长度值时写入该行?
我希望问题说清楚