I'm trying to generate a frequency spectrum. To do that, I'm sorting the values in minorallelefreq, a list I filled by appending data from a column, into bins. The code isn't working: I get a value of 0 for all 5 bins.

Here's the code:

minprop = 0
minprop1 = 0
minprop2 = 0
minprop3 = 0
minprop4 = 0 

for x in range(1,100):

    if minorallelefreq[x] <= 0.1:
        minprop = minprop + 1 
    if minorallelefreq[x] > 0.1 and minorallelefreq[x] <= 0.2: 
        minprop1 = minprop1 + 1 
    if minorallelefreq[x] > 0.2 and minorallelefreq[x] <= 0.3: 
        minprop2 = minprop2 + 1
    if minorallelefreq[x] > 0.3 and minorallelefreq[x] <= 0.4: 
        minprop3 = minprop3 + 1
    if minorallelefreq[x] > 0.4 and minorallelefreq[x] <= 0.5:
        minprop4 = minprop4 + 1



bin1 = minprop/float(counter)
bin2 = minprop1/float(counter)
bin3 = minprop2/float(counter)
bin4 = minprop3/float(counter)
bin5 = minprop4/float(counter)  
print "Bin1 (0-0.1)=", bin1, "\t", "Bin2 (0.1-0.2)=", bin2, "\t", "Bin3 (0.2-0.3)=", bin3, "\t", "Bin4 (0.3-0.4)=", bin4, "\t", "Bin5 (0.4-0.5)=", bin5

It turned out the comparisons weren't working because Python wasn't reading my values (which are all decimals) as numbers; presumably they were still strings from the input file. Once I changed each comparison to use float(minorallelefreq[x]), it worked.
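For illustration, here is a minimal sketch of the fix described above: convert each value with float() before comparing it to the bin edges. In Python 2, comparing a string with a float does not raise an error; the string simply counts as larger than any number, which would explain why every bin stayed at 0. The function name bin_fractions, the EDGES list, and the sample data are hypothetical, and the sketch assumes the fractions are taken over the total number of values (the original divides by counter, which is not shown).

```python
import bisect

# Upper edges of the five bins: <=0.1, (0.1, 0.2], (0.2, 0.3], (0.3, 0.4], (0.4, 0.5]
EDGES = [0.1, 0.2, 0.3, 0.4, 0.5]

def bin_fractions(raw_values, edges=EDGES):
    """Return the fraction of values falling in each bin (hypothetical helper)."""
    counts = [0] * len(edges)
    total = 0
    for raw in raw_values:
        value = float(raw)  # "0.25" -> 0.25, so the comparison is numeric
        total += 1
        # Index of the first edge that is >= value, i.e. the bin it belongs to
        index = bisect.bisect_left(edges, value)
        if index < len(edges):  # values above the last edge land in no bin
            counts[index] += 1
    if total == 0:
        return [0.0] * len(edges)
    # Assumption: fractions are relative to the total number of values read
    return [count / float(total) for count in counts]

# Sample data as strings, the way values typically arrive from a text file
sample = ["0.05", "0.1", "0.15", "0.25", "0.45"]
fractions = bin_fractions(sample)
```

Iterating over the values directly also avoids the hard-coded range(1,100), which skips the first element and assumes exactly 100 entries.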

2 Answers

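There are several possible errors in this code:

  1. int(0.1) => 0, so minprop will always be 0 unless some values are zero or negative
  2. minprop4 is not indented, and it will never be set because a value cannot be both > 0.4 and <= 0.4
  3. You are assuming there are 100 elements, and that they all fall between 0 and 0.4

I suggest bucketing automatically based on the actual values rather than the expected ones: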

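import collections

# Count the values in each 0.1-wide bucket, keyed by the bucket's lower edge
buckets = collections.Counter()

for value in minorallelefreq:
    # float() guards against values that were read in as strings
    bucket = int(float(value) * 10) / 10.0
    buckets[bucket] += 1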
answered 2013-10-25T21:28:24.483

if minorallelefreq[x] <= int(0.1):
    minprop = minprop + 1 
if minorallelefreq[x] > 0.1 and minorallelefreq[x] <= 0.2: 
    minprop1 = minprop1 + 1 

means that there will be no action taken for values > 0 and <= 0.1. What happens if you remove the int() call?

Also, nothing will happen for values > 0.4:

if minorallelefreq[x] > 0.4 and minorallelefreq[x] <= 0.4:  # never True
    minprop4 = minprop4 + 1                                 # indentation??

Generally, you should look into chained comparisons:

if 0.1 < minorallelefreq[x] <= 0.2:   # much easier to read, and faster.
    # etc.
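
As a rough sketch of how the whole loop could look with chained comparisons, the version below also uses elif so that each value is tested only until it matches a bin. The function name count_bins and the sample list are hypothetical, and the float() call is an assumption for the case where the values arrive as strings.

```python
def count_bins(values):
    """Count values per bin using chained comparisons (hypothetical helper)."""
    counts = [0, 0, 0, 0, 0]
    for raw in values:
        v = float(raw)  # assumption: values may be strings
        if v <= 0.1:
            counts[0] += 1
        elif 0.1 < v <= 0.2:
            counts[1] += 1
        elif 0.2 < v <= 0.3:
            counts[2] += 1
        elif 0.3 < v <= 0.4:
            counts[3] += 1
        elif 0.4 < v <= 0.5:
            counts[4] += 1
        # anything above 0.5 is deliberately left uncounted
    return counts

counts = count_bins([0.05, 0.2, 0.21, 0.4, 0.5, 0.9])
```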
answered 2013-10-25T21:25:03.850