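I built a nested dictionary from my file to group events into classes. I want to count, per key, how many classes I have and how many final values they hold in total. This is my code so far: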

infile = open('ALL','r')

def round_down(num):
    return num - (num%100)

count = 0
a = []
split_region = {}
lengths = []
for region in infile:
    #print region

    (cov,chrm,pos,end,leng) = region.split()
    start = int(pos)#-1#-int(leng) ## loosen conditions about break points
    end = int(end)
    lengths = int(leng)
    coverage=int(cov)
    rounded_start=round_down(start)
    rounded_length=round_down(lengths)
    if not (chrm in split_region):
        split_region[chrm]={}
    if not (rounded_start in split_region[chrm]):
        split_region[chrm][rounded_start]={}
    if not (rounded_length in split_region[chrm][rounded_start]):
        split_region[chrm][rounded_start][rounded_length]= []
    split_region[chrm][rounded_start][rounded_length].append({'start':start,'length':lengths,'cov':coverage})

    for k,v in split_region[chrm][rounded_start].items():
        print len(v),k,v
        a.append(len(v))
        count +=1
print count
print sum(a)

The file format is as follows:

5732    chrM    1   16572   16571
804 chr6    58773612    58780166    6554
722 chr1    142535435   142538993   3558
448 chrY    13447747    13451695    3948
372 chr9    68422753    68423813    1060
327 chr2    133017433   133018716   1283
302 chr18   107858  109884  2026
256 chr20   29638813    29641416    2603
206 chr6    57423087    57429121    6034
204 chr1    142537237   142538991   1754

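So it basically classifies each line by rounding the numbers down to the nearest 100 and using the results as keys in my dictionary. It is nested because I group first by the rounded start and then by the rounded length.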
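To illustrate the bucketing described above, here is a minimal sketch that applies the same rounding to two of the chr6 lines from the sample file. The `step` parameter and the `bucket_key` helper are hypothetical additions for illustration only; the code in the question hard-codes 100.

```python
def round_down(num, step=100):
    """Round num down to the nearest multiple of step (step is an assumed parameter)."""
    return num - (num % step)


def bucket_key(chrm, start, length):
    """Hypothetical helper: the three keys used to nest a record."""
    return (chrm, round_down(start), round_down(length))


# Two chr6 lines from the sample data: (start, length)
key_a = bucket_key('chr6', 58773612, 6554)
key_b = bucket_key('chr6', 57423087, 6034)

print(key_a)  # ('chr6', 58773600, 6500)
print(key_b)  # ('chr6', 57423000, 6000)
```

Records only land in the same class when all three keys match, so these two lines end up in different classes.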

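At the end of the code I try to count how many classes there are and how many values I have in total. However, this gives incorrect output: there are more classes than lines in the input file. Any ideas how to fix this?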

1 Answer

It is not clear to me which total you want, but perhaps one of the following is what you are looking for:

rounded_start_count = 0
rounded_length_count = 0
rounded_length_value_count = 0

# Walk the nested dict: chrm -> rounded start -> rounded length -> list of records
for k1, v1 in split_region.items():
    print k1 + ": " + str(len(v1))
    rounded_start_count += len(v1)
    for k2, v2 in v1.items():
        rounded_length_count += len(v2)
        # Count the records stored in each list, not the number of lists
        rounded_length_value_count += sum(len(v3) for v3 in v2.values())

print ""

print "chrm count:                 ", len(split_region)
print "Rounded start count:        ", rounded_start_count
print "Rounded length count:       ", rounded_length_count
print "Rounded length value count: ", rounded_length_value_count

Place this after, and outside of, your for loop. For your sample data it prints the following output:

chr6: 2
chr2: 1
chr1: 2
chr9: 1
chrY: 1
chr20: 1
chrM: 1
chr18: 1

chrm count:                  8
Rounded start count:         10
Rounded length count:        10
Rounded length value count:  10
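
A likely reason the original count comes out too high is that the counting loop sits inside the loop over input lines, so a class is counted again every time a later line lands under the same chromosome and rounded start. As a rough, self-contained sketch of counting once after the dictionary is built: `group_regions`, `count_groups` and the inline `SAMPLE` string are names made up for this example, and it uses `collections.defaultdict` instead of the explicit `if not ... in` checks. The single-argument `print(...)` call is written so that it also runs under Python 2.

```python
from collections import defaultdict

# The ten sample lines from the question: cov, chrm, pos, end, leng
SAMPLE = """\
5732    chrM    1   16572   16571
804 chr6    58773612    58780166    6554
722 chr1    142535435   142538993   3558
448 chrY    13447747    13451695    3948
372 chr9    68422753    68423813    1060
327 chr2    133017433   133018716   1283
302 chr18   107858  109884  2026
256 chr20   29638813    29641416    2603
206 chr6    57423087    57429121    6034
204 chr1    142537237   142538991   1754
"""


def round_down(num):
    """Round num down to the nearest multiple of 100, as in the question."""
    return num - (num % 100)


def group_regions(lines):
    """Build chrm -> rounded start -> rounded length -> list of records."""
    split_region = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cov, chrm, pos, end, leng = line.split()
        start, length = int(pos), int(leng)
        split_region[chrm][round_down(start)][round_down(length)].append(
            {'start': start, 'length': length, 'cov': int(cov)})
    return split_region


def count_groups(split_region):
    """Return (number of classes, total number of stored records)."""
    classes = 0
    values = 0
    for starts in split_region.values():
        for lengths in starts.values():
            classes += len(lengths)
            values += sum(len(records) for records in lengths.values())
    return classes, values


regions = group_regions(SAMPLE.splitlines())
classes, values = count_groups(regions)
print("classes: %d, values: %d" % (classes, values))  # classes: 10, values: 10
```

Counted this way, the number of values always equals the number of non-blank input lines, and the number of classes can never exceed it.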
Answered 2013-09-23T00:19:43.930