0

我从使用列表到使用字典。然而,经过多次试验,我对我的代码有点困惑,我决定在这里发布一个问题。

我有一个看起来像这样的文件:[文件名 x50]:

10403   D   chr1    38
10616   D   chr1    22
14465   I   chr1    17
15171   I   chr1    17
16586   I   chr1    17
17237   I   chr1    17
26352   I   chr1    17
26752   I   chr1    17
27669   I   chr1    17
29097   I   chr1    23
29171   I   chr1    17
29902   D   chr1    11
32023   I   chr1    65

和一个文件(分子):

     16 13829325    I   chrY    10
     13 49157314    I   chr4    10
      8 13867496    I   chrY    16
      7 49102622    I   chr4    10
      7 224204410   I   chr1    10
      7 10800122    I   chr21   15
      6 49157651    I   chr4    10
      6 33866712    I   chr16   10
      6 224204343   I   chr1    10
      6 13869965    I   chrY    10
      6 13838848    I   chrY    15
      6 13829325    I   chrY    15
      5 89876424    I   chr2    10
      5 61758798    I   chr7    23
      5 58977653    I   chrY    10
      5 49157099    I   chr4    10
      5 38806712    D   chr10   10
      5 29829794    I   chr20   10

我的代码看起来像这样:

def round_down(num):
    return num - (num%100000)


x50 = open('path','r')
moleculo = open('path','r')
out = open('path','w')


x50_region = {}
ct=0
for region in x50:
    #print region

    (pos,types,chrm,leng) = region.strip().split()

    start = int(pos)-20
    end = int(pos)+20
    rounded_start=round_down(start)

    if not (chrm in x50_region):
        x50_region[chrm]={}
    if not (types in x50_region[chrm]):
        x50_region[chrm][types]={}
    if not (rounded_start in x50_region[chrm][types]):
        x50_region[chrm][types][rounded_start]= []

    x50_region[chrm][types][rounded_start].append({'start':start,'end':end})


c=0
for moleculo_line in moleculo:
    if (c % 1000 == 0):print "c ",c
    c=c+1
    moleculo_data = moleculo_line.strip().split()
    moleculo_chrom = moleculo_data[3]
    moleculo_types = moleculo_data[2]
    moleculo_len = int(moleculo_data[4])
    moleculo_start_pos = int(moleculo_data[1])
    moleculo_end_pos = int(moleculo_data[1]) + int(moleculo_data[4])
    rounded_moleculo_start_position = round_down(moleculo_start_pos)
    rounded_moleculo_end_position = round_down(moleculo_end_pos)
    moleculo_len = int(moleculo_data[4])

    if moleculo_chrom in x50_region and rounded_moleculo_start_position in x50_region[moleculo_chrom][moleculo_types]:
        for region in x50_region[moleculo_chrom][moleculo_types][rounded_moleculo_start_position]:
            #print i,moleculo_len
            if region['start'] <= moleculo_start_pos and moleculo_start_pos <= region['end']:
                out.write(str(moleculo_line))
out.close()

这个想法是过滤文件。

现在我想添加另一个条件。我有一个名为moleculo_len 的变量。它在第二个文件中包含我的事件的长度。x50 文件中与此对应的值称为 leng (pos,types,chrm,leng) = region.strip().split()

我应该如何在我的过滤中添加另一个步骤,以便当 Molecolo_len 处于 >= 50% 的长度值时写入该行?

我希望问题说清楚

4

0 回答 0