I want to create a histogram of a variable V based on the bins of a variable X. To do that, I read an Excel file that looks like this:

Column X   Column V
99.9       0
100.0      3
25.17      2
39.45      1
66.52      1
17.17      6
9.25       2
86.11      3
84.09      3

For each bin of variable X, I want to compute the average of the V values associated with it. For example:

X bin: 0-30 -> avg(V)=(2+6+2)/3=3.33
X bin: 31-80 -> avg(V)=(1+1)/2=1.00
X bin: 81-100 -> avg(V)=(3+3+0+3)/4=2.25

So I would end up with:

X bin    avg(V)
0-30     3.33
31-80    1.00
81-100   2.25

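To do this, I wrote the following block of code, in which I use several lists to collect all the V values that fall within each X bin (bin width = 10).

Edit

I have a problem with the length of my lists. For example, in a 1000-row Excel file, only one V value belongs to bin 41-50. However, len(islands_4150) gives 999. Where does the code get the other 998 values from?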

from openpyxl import load_workbook
wb = load_workbook(filename = 'myfile.xlsx')
ws=wb.active
cell_range_1 = ws['X2':'X1001']
cell_range_2 = ws['V2':'V1001']
cf_list=[] #List with X values
island_list=[] #List with V values
for row in range(2,1001): 
    for column in 'X':
        cell_name_1="{}{}".format(column, row) #X
        cf_list.append(ws[cell_name_1].value)
        x=map(lambda x: int(x) if x%1==0 else x, cf_list)
        for column in 'V':
            cell_name_2="{}{}".format(column, row) #V
            island_list.append(ws[cell_name_2].value)
            v=map(lambda x: int(x) if x%1==0 else x, island_list)
islands_010=[] #List with values from column V which corresponding values from column X are 0<=value<=10
islands_1120=[]
islands_2130=[]
islands_3140=[]
islands_4150=[]
islands_5160=[]
islands_6170=[]
islands_7180=[]
islands_8190=[]
islands_91100=[]
for i, val in enumerate(x):
    for j, elem in enumerate(v):
        if x[i]>=0 and x[i]<=10:
            islands_010.append(v[i])
        elif x[i]>=11 and x[i]<=20:
            islands_1120.append(v[i])
        elif x[i]>=21 and x[i]<=30:
            islands_2130.append(v[i])
        elif x[i]>=31 and x[i]<=40:
            islands_3140.append(v[i])
        elif x[i]>=41 and x[i]<=50:
            islands_4150.append(v[i])
        elif x[i]>=51 and x[i]<=60:
            islands_5160.append(v[i])
        elif x[i]>=61 and x[i]<=70:
            islands_6170.append(v[i])
        elif x[i]>=71 and x[i]<=80:
            islands_7180.append(v[i])
        elif x[i]>=81 and x[i]<=90:
            islands_8190.append(v[i])
        elif x[i]>=91 and x[i]<=100:
            islands_91100.append(v[i])

if len(islands_010)==0:
    print ('Avg islands 0-10: 0') 
else:
    avg010=round(reduce(lambda x, y: x + y, islands_010) / len(islands_010),3)
    print ('Avg islands 0-10: '+str(avg010))

if len(islands_1120)==0:
    print ('Avg islands 11-20: 0') 
else:
    avg1120=round(reduce(lambda x, y: x + y, islands_1120) / len(islands_1120),3)
    print ('Avg islands 11-20: '+str(avg1120))

if len(islands_2130)==0:
    print ('Avg islands 21-30: 0')
else:
    avg2130=round(reduce(lambda x, y: x + y, islands_2130) / len(islands_2130),3)
    print ('Avg islands 21-30: '+str(avg2130))

if len(islands_3140)==0:
    print ('Avg islands 31-40: 0')
else:
    avg3140=round(reduce(lambda x, y: x + y, islands_3140) / len(islands_3140),3)
    print ('Avg islands 31-40: '+str(avg3140))

if len(islands_4150)==0:
    print ('Avg islands 41-50: 0')
else:
    avg4150=round(reduce(lambda x, y: x + y, islands_4150) / len(islands_4150),3)
    print ('Avg islands 41-50: '+str(avg4150))

if len(islands_5160)==0:
    print ('Avg islands 51-60: 0')
else:
    avg5160=round(reduce(lambda x, y: x + y, islands_5160) / len(islands_5160),3)
    print ('Avg islands 51-60: '+str(avg5160))

if len(islands_6170)==0:
    print ('Avg islands 61-70: 0')
else:
    avg6170=round(reduce(lambda x, y: x + y, islands_6170) / len(islands_6170),3)
    print ('Avg islands 61-70: '+str(avg6170))

if len(islands_7180)==0:
    print ('Avg islands 71-80: 0')
else:
    avg7180=round(reduce(lambda x, y: x + y, islands_7180) / len(islands_7180),3)
    print ('Avg islands 71-80: '+str(avg7180))

if len(islands_8190)==0:
    print ('Avg islands 81-90: 0')
else:
    avg8190=round(reduce(lambda x, y: x + y, islands_8190) / len(islands_8190),3)
    print ('Avg islands 81-90: '+str(avg8190))

if len(islands_91100)==0:
    print ('Avg islands 91-100: 0')
else:
    avg91100=round(reduce(lambda x, y: x + y, islands_91100) / len(islands_91100),3)
    print ('Avg islands 91-100: '+str(avg91100))

1 Answer

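As it stands, your code is structured rather poorly, which obscures the problem.

The first problem is whitespace. You need some.

Next are the lines for column in 'X': and for column in 'V':. Each of these for loops iterates over a single-character string, so they serve no purpose and can be replaced with: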

cell_name_1 = "X{}".format(row)  # X variable
cell_name_2 = "V{}".format(row)  # V variable

In addition, I suggest fetching the cell values first and doing all the comparisons afterwards:

x_val = float(ws[cell_name_1].value)
v_val = int(ws[cell_name_2].value)

Ranges in Python include the first number and exclude the last. The range in your first loop should therefore stop at 1002, so that the last row read is 1001.

for row in range(2, 1002):
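
To see the off-by-one concretely, here is a quick check. It is only a minimal sketch, and the two variable names are made up for illustration:

```python
# range() includes its start value and excludes its stop value.
original_rows = list(range(2, 1001))   # rows visited by the question's loop
corrected_rows = list(range(2, 1002))  # rows visited once the stop is 1002

print(len(original_rows), original_rows[-1])    # 999 1000
print(len(corrected_rows), corrected_rows[-1])  # 1000 1001
```

The original loop therefore reads 999 rows and never reaches row 1001.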

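I suggest using ws = wb.get_sheet_by_name("sheet_name") to retrieve the worksheet instead of ws = wb.active, so that you always get the sheet you want.

Finally, let's tackle the actual problem. Your current approach reads from Excel directly into the bins. What you should do instead is read all the data from Excel first, and then manipulate it to produce the bins you want. The first step is to put the data into whichever Python structure makes your life easiest; I recommend a list of tuples:

islands.append((x_val, v_val))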

This will produce something like:

[(99.9, 0), (100.0, 3), (25.17, 2), (39.45, 1), (66.52, 1), (17.17, 6), (9.25, 2), (86.11, 3), (84.09, 3)]

Now we should sort the data by the column X value:

islands.sort(key=lambda x: x[0])

Producing:

[(9.25, 2), (17.17, 6), (25.17, 2), (39.45, 1), (66.52, 1), (84.09, 3), (86.11, 3), (99.9, 0), (100.0, 3)]

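Now that our data is sorted, we can easily build a dictionary of values keyed by the maximum value of each bin:

bins = [30, 80, 100]  # upper bound of each bin
binned_data = {key: [] for key in bins}
for item in islands:
    for upper in bins:
        if item[0] <= upper:
            # item goes into the first bin whose upper bound covers it
            binned_data[upper].append(item[1])
            break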

This produces a dictionary like this:

{80: [1, 1], 100: [0, 3, 3, 3], 30: [2, 6, 2]}

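From here you can simply compute the averages (an empty bin is given an average of 0 here, so that it does not cause a division by zero):

averages = {upper: sum(vals) / float(len(vals)) if vals else 0
            for upper, vals in binned_data.items()}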
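
As a sanity check, the binning and averaging steps can be run on the nine sample rows from the question. This is only a sketch under a few assumptions: the function name average_by_bin is made up for illustration, and reporting 0 for an empty bin simply mirrors what the question's code prints.

```python
def average_by_bin(pairs, upper_bounds):
    """Average the v of each (x, v) pair per bin.

    Each bin is identified by its inclusive upper bound. A pair goes into
    the first bin whose upper bound is >= x; pairs above the largest
    bound are ignored.
    """
    bounds = sorted(upper_bounds)
    binned = {upper: [] for upper in bounds}
    for x_val, v_val in pairs:
        for upper in bounds:
            if x_val <= upper:
                binned[upper].append(v_val)
                break
    # Assumption: an empty bin is reported as 0 rather than raising an error.
    return {upper: (sum(vals) / float(len(vals)) if vals else 0)
            for upper, vals in binned.items()}


sample = [(99.9, 0), (100.0, 3), (25.17, 2), (39.45, 1), (66.52, 1),
          (17.17, 6), (9.25, 2), (86.11, 3), (84.09, 3)]
averages = average_by_bin(sample, [30, 80, 100])
for upper in sorted(averages):
    print(upper, round(averages[upper], 2))
# 30 3.33
# 80 1.0
# 100 2.25
```

These match the averages worked out by hand in the question. One design point to be aware of: with upper bounds, a value such as 30.5 lands in the second bin, whereas the question's integer ranges (0-30, 31-80) would not match it at all.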

Putting it all together:

from openpyxl import load_workbook

wb = load_workbook(filename='myfile.xlsx')
ws = wb.get_sheet_by_name("sheet_name")  # newer openpyxl versions prefer wb["sheet_name"]

islands = []

# Rows 2..1001 inclusive (range excludes its stop value)
for row in range(2, 1002):
    cell_name_1 = "X{}".format(row)  # X variable
    cell_name_2 = "V{}".format(row)  # V variable

    x_val = float(ws[cell_name_1].value)
    v_val = int(ws[cell_name_2].value)

    islands.append((x_val, v_val))

islands.sort(key=lambda x: x[0])

bins = [30, 80, 100]  # upper bound of each bin
binned_data = {key: [] for key in bins}

for item in islands:
    for upper in bins:
        if item[0] <= upper:
            binned_data[upper].append(item[1])
            break

# An empty bin gets an average of 0 instead of raising ZeroDivisionError
averages = {upper: sum(vals) / float(len(vals)) if vals else 0
            for upper, vals in binned_data.items()}
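
As for where the other 998 values came from: in the question's code the bin tests sit inside a second loop, for j, elem in enumerate(v), that never uses j or elem. Every matching V value is therefore appended once per element of v, which is 999 times with range(2, 1001). A scaled-down sketch with made-up data shows the effect:

```python
x = [45, 12, 88]  # made-up X values; only the first falls in 41-50
v = [7, 3, 5]     # made-up V values

nested = []
for i, val in enumerate(x):
    for j, elem in enumerate(v):  # repeats the same test len(v) times
        if 41 <= x[i] <= 50:
            nested.append(v[i])

single = []
for x_val, v_val in zip(x, v):    # one pass, one append per matching row
    if 41 <= x_val <= 50:
        single.append(v_val)

print(nested)  # [7, 7, 7]
print(single)  # [7]
```

Dropping the inner loop, or pairing the two columns with zip as above, leaves exactly one entry per matching row.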
Answered 2016-04-05T17:03:28.883