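I'm writing a Python script that uses the xlrd library to parse an Excel file. I want to run a calculation over several columns only if a cell contains a certain value, and skip the row otherwise. The output should then be stored in a dictionary. Here is what I have tried: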

import xlrd


workbook = xlrd.open_workbook('filter_data.xlsx')
worksheet = workbook.sheet_by_name('filter_data')

num_rows = worksheet.nrows -1
num_cells = worksheet.ncols - 1

first_col = 0
scnd_col = 1
third_col = 2

# Read Data into double level dictionary
celldict = dict()
for curr_row in range(num_rows)  :

    cell0_val = int(worksheet.cell_value(curr_row+1,first_col))
    cell1_val = worksheet.cell_value(curr_row,scnd_col)
    cell2_val = worksheet.cell_value(curr_row,third_col)

    if cell1_val[:3] == 'BL1' :
        if cell2_val=='toSkip' :
        continue
    elif cell1_val[:3] == 'OUT' :
        if cell2_val == 'toSkip' :
        continue
    if not cell0_val in celldict :
        celldict[cell0_val] = dict()
# if the entry isn't in the second level dictionary then add it, with count 1
    if not cell1_val in celldict[cell0_val] :
        celldict[cell0_val][cell1_val] = 1
        # Otherwise increase the count
    else :
        celldict[cell0_val][cell1_val] += 1

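As you can see, I count the number of "cell1_val" values for each "cell0_val". However, I want to skip the values that have "toSkip" in the cell of the adjacent column before doing the sum and storing it in the dict. I'm doing something wrong here, and I feel the solution is much simpler. Any help would be appreciated. Thanks.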

Here is an example of my sheet:

cell0 cell1  cell2
12    BL1    toSkip
12    BL1    doNotSkip
12    OUT3   doNotSkip
12    OUT3   toSkip
13    BL1    doNotSkip
13    BL1    toSkip
13    OUT3   doNotSkip

2 Answers

Use collections.defaultdict together with collections.Counter for the nested dictionary.

Here it is in action:

>>> from collections import defaultdict, Counter
>>> import pprint
>>> d = defaultdict(Counter)
>>> d['red']['blue'] += 1
>>> d['green']['brown'] += 1
>>> d['red']['blue'] += 1
>>> pprint.pprint(dict(d))
{'green': Counter({'brown': 1}), 'red': Counter({'blue': 2})}

Here it is integrated into your code:

from collections import defaultdict, Counter
import xlrd

workbook = xlrd.open_workbook('filter_data.xlsx')
worksheet = workbook.sheet_by_name('filter_data')

first_col = 0
scnd_col = 1
third_col = 2

celldict = defaultdict(Counter)
for curr_row in range(1, worksheet.nrows): # start at 1 skips header row

    cell0_val = int(worksheet.cell_value(curr_row, first_col))
    cell1_val = worksheet.cell_value(curr_row, scnd_col)
    cell2_val = worksheet.cell_value(curr_row, third_col)

    if cell2_val == 'toSkip' and cell1_val[:3] in ('BL1', 'OUT'):
        continue

    celldict[cell0_val][cell1_val] += 1

I also combined your if statements and simplified the calculation of curr_row.
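To check the counting logic without an Excel file at hand, the same loop can be run over in-memory rows. This is only a sketch: the rows are copied from the sample sheet in the question, and count_cells is a hypothetical helper name, not part of xlrd. It can also be handy because xlrd 2.0 and later read only .xls files, so the open_workbook call on an .xlsx file may need an older xlrd release.

```python
from collections import defaultdict, Counter

# Rows copied from the sample sheet in the question (header row omitted).
rows = [
    (12, 'BL1', 'toSkip'),
    (12, 'BL1', 'doNotSkip'),
    (12, 'OUT3', 'doNotSkip'),
    (12, 'OUT3', 'toSkip'),
    (13, 'BL1', 'doNotSkip'),
    (13, 'BL1', 'toSkip'),
    (13, 'OUT3', 'doNotSkip'),
]


def count_cells(rows):
    """Count cell1 values per cell0 value, skipping flagged BL1/OUT rows."""
    celldict = defaultdict(Counter)
    for cell0_val, cell1_val, cell2_val in rows:
        if cell2_val == 'toSkip' and cell1_val[:3] in ('BL1', 'OUT'):
            continue
        celldict[cell0_val][cell1_val] += 1
    return celldict


result = count_cells(rows)
for key in sorted(result):
    print(key, dict(result[key]))
# 12 {'BL1': 1, 'OUT3': 1}
# 13 {'BL1': 1, 'OUT3': 1}
```

Each of the three toSkip rows is dropped, so every remaining pair is counted exactly once.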

Answered 2012-12-06T15:10:21.473

It looks like you want to skip the current row whenever cell2_val equals 'toSkip', so it would simplify the code to add if cell2_val=='toSkip' : continue directly after cell2_val is computed.
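One thing worth checking before adopting the unconditional test: the question's code only skips a flagged row when the second column starts with BL1 or OUT. The sketch below compares the two checks side by side. The function names are hypothetical, and 'IN7' is a made-up value that does not appear in the sample sheet; it is there only to show where the two checks disagree.

```python
PREFIXES = ('BL1', 'OUT')


def skip_always(cell1_val, cell2_val):
    """Skip every row flagged 'toSkip', whatever the second column holds."""
    return cell2_val == 'toSkip'


def skip_if_prefixed(cell1_val, cell2_val):
    """Skip a flagged row only when the second column starts with BL1 or OUT."""
    return cell2_val == 'toSkip' and cell1_val[:3] in PREFIXES


# 'IN7' is an invented value used purely for illustration.
samples = [('BL1', 'toSkip'), ('OUT3', 'doNotSkip'), ('IN7', 'toSkip')]
for cell1_val, cell2_val in samples:
    print(cell1_val, cell2_val,
          skip_always(cell1_val, cell2_val),
          skip_if_prefixed(cell1_val, cell2_val))
# BL1 toSkip True True
# OUT3 doNotSkip False False
# IN7 toSkip True False
```

If the sheet only ever contains BL1 and OUT values in that column, as in the sample, the two checks behave identically and the shorter one is enough.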

Also, where you have

# if the entry isn't in the second level dictionary then add it, with count 1
if not cell1_val in celldict[cell0_val] :
    celldict[cell0_val][cell1_val] = 1
    # Otherwise increase the count
else :
    celldict[cell0_val][cell1_val] += 1

the usual idiom is more like

celldict[cell0_val][cell1_val] = celldict[cell0_val].get(cell1_val, 0) + 1

That is, use a default value of 0, so that if the key cell1_val is not yet in celldict[cell0_val], get() returns 0.
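A minimal sketch of the get() idiom with plain dictionaries follows. The one-liner assumes celldict[cell0_val] already exists; here setdefault() creates the inner dictionary on first use, which is an addition for the sake of a self-contained example, and count_with_get and the pairs list are hypothetical names and data.

```python
def count_with_get(pairs):
    """Count (outer, inner) key pairs using dict.get with a default of 0."""
    celldict = {}
    for cell0_val, cell1_val in pairs:
        # Create the inner dict the first time this outer key is seen.
        inner = celldict.setdefault(cell0_val, {})
        # get() returns 0 for a key that has not been counted yet.
        inner[cell1_val] = inner.get(cell1_val, 0) + 1
    return celldict


pairs = [(12, 'BL1'), (12, 'OUT3'), (12, 'BL1'), (13, 'BL1')]
counts = count_with_get(pairs)
print(counts)
# {12: {'BL1': 2, 'OUT3': 1}, 13: {'BL1': 1}}
```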

Answered 2012-12-06T14:58:14.150