3

I have a file with data like:

  Entry   Freq.
    2     4.5
    3     3.4
    5     4.9
    8     9.1
    12    11.1
    16    13.1
    18    12.2
    22    11.2

Now the problem I am trying to solve is this: I want to group the data into ranges of 10 based on Entry and add up the frequencies that fall within each range. For example, grouping the table above should give:

    Range   SumFreq.
     0-10    21.9 (i.e. 4.5 + 3.4 + 4.9 + 9.1)
     11-20   36.4

I got as far as separating the columns with the following code, but I can't work out how to do the range grouping. My code is:

inp = open("c:/usr/ovisek/desktop/file.txt",'r').read().strip().split('\n')
for line in map(str.split,inp):
    k = int(line[0])
    l = float(line[-1])

So far so good, but how can I group the data into ranges of 10?

4

4 Answers

3

One way is to [ab]use the fact that integer division gives you the correct bin:

import collections
bin_size = 10
d = collections.defaultdict(float)
for line in map(str.split,inp):
    k = int(line[0])
    l = float(line[-1])
    d[bin_size * (k // bin_size)] += l
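
For a self-contained run, here is a minimal sketch of the same idea with the sample rows from the question inlined instead of read from the file. The `rows` list and the `bin_totals` helper are illustrative names, not part of the answer above. Note that with `k // bin_size` the bins are 0-9, 10-19 and so on, so an entry of exactly 10 would land in the second bin.

```python
import collections

bin_size = 10

# Sample rows copied from the question (Entry, Freq.); inlined here so
# the sketch runs without the input file.
rows = [(2, 4.5), (3, 3.4), (5, 4.9), (8, 9.1),
        (12, 11.1), (16, 13.1), (18, 12.2), (22, 11.2)]


def bin_totals(rows, bin_size):
    """Sum frequencies per bin, keyed by the bin's lower bound."""
    totals = collections.defaultdict(float)
    for entry, freq in rows:
        # Integer division drops the remainder: 0..9 -> 0, 10..19 -> 10, ...
        totals[bin_size * (entry // bin_size)] += freq
    return dict(totals)


totals = bin_totals(rows, bin_size)
for low in sorted(totals):
    # Label each bin by its inclusive lower and upper bounds.
    print("%d-%d\t%.1f" % (low, low + bin_size - 1, totals[low]))
```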
answered 2012-05-31T08:25:10.237
0

How about this, just adding to your code:

def group_data(bin_size):
    # Map each bin index to its range label and running frequency total.
    grouped_data = {}
    inp = open("c:/usr/ovisek/desktop/file.txt",'r').read().strip().split('\n')
    for line in map(str.split,inp):
        k = int(line[0])
        l = float(line[-1])
        range_value = k // bin_size
        if range_value in grouped_data:
            grouped_data[range_value]['freq'] += l
        else:
            label = str(range_value * bin_size) + '-' + str((range_value + 1) * bin_size)
            grouped_data[range_value] = {'freq':l, 'value':label}
    return grouped_data

This should give you a dictionary like:

{0 : {'value':'0-10', 'freq':21.9} , .... }
answered 2012-05-31T08:25:53.007
0

This should get you started; it tested fine:

inp = open("/tmp/input.txt",'r').read().strip().split('\n')
interval = 10
resultDict = {}
for line in map(str.split,inp):
    k = int(line[0])
    l = float(line[-1])
    # Shift by one so that 1-10 share a bin, 11-20 the next, and so on.
    rangeNum = (k - 1) // interval
    rangeKeyName = str(rangeNum*interval+1)+"-"+str((rangeNum+1)*interval)
    if rangeKeyName in resultDict:
        resultDict[rangeKeyName] += l
    else:
        resultDict[rangeKeyName] = l

print(str(resultDict))

This outputs:

{'21-30': 11.199999999999999, '11-20': 36.399999999999999, '1-10': 21.899999999999999}
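
The long decimals in that output are most likely just the binary floating-point representation of the sums, not a fault in the grouping. Below is a minimal sketch of formatting the totals to one decimal place when printing. The `result` dictionary is copied from the output above, and `format_totals` is a hypothetical helper name.

```python
# Totals copied from the output above.
result = {'21-30': 11.199999999999999,
          '11-20': 36.399999999999999,
          '1-10': 21.899999999999999}


def format_totals(totals):
    """Return 'range<TAB>sum' lines ordered by each range's lower bound."""
    def lower_bound(label):
        # '11-20' -> 11; assumes labels of the form 'low-high'.
        return int(label.split('-')[0])
    return ["%s\t%.1f" % (label, totals[label])
            for label in sorted(totals, key=lower_bound)]


for line in format_totals(result):
    print(line)
```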
answered 2012-05-31T08:34:15.660
-1

You can do it like this:

fr = {}
inp = open("file.txt",'r').read().strip().split('\n')
for line in map(str.split,inp):
    k = int(line[0])
    l = float(line[-1])
    # Floor division keeps the key an integer on Python 3 as well.
    key = abs(k-1) // 10 * 10

    if key in fr:
        fr[key] += l
    else:
        fr[key] = l

for k in sorted(fr.keys()):
    total = fr[k]
    print('%d-%d\t%f' % (k+1 if k else 0, k+10, total))

Output:

0-10    21.900000
11-20   36.400000
21-30   11.200000
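
The answers in this thread differ in where a boundary value such as 10 lands: `k // 10` starts a new bin at 10, while `(k - 1) // 10` keeps 10 in the first bin, which is closer to the 0-10 and 11-20 ranges asked for in the question. A short sketch comparing the two conventions follows; the function names are illustrative only.

```python
def bin_floor(k, size=10):
    """Lower bound of k's bin when bins are 0-9, 10-19, ..."""
    return (k // size) * size


def bin_shifted(k, size=10):
    """Lower bound of k's bin when bins are 1-10, 11-20, ... (for k >= 1)."""
    return ((k - 1) // size) * size + 1


# Print each entry with its bin under both conventions.
for k in (9, 10, 11, 20, 21):
    print(k, bin_floor(k), bin_shifted(k))
```

Only the boundary entries (10, 20, ...) change bins between the two conventions, so the sample data in the question gives the same sums either way.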
answered 2012-05-31T08:57:02.150