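I am writing a Python script that parses an Excel file. For each value in column 1, the script should count how many times each value in column 2 appears alongside it.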

For example, given an Excel file that looks like this:

12    abc
12    abc
12    efg
12    efg
13    hij
13    hij
13    klm

my script would return:

For cell value 12 : 2 values "abc", 2 values "efg" and for cell value 13 : 2 values "hij" and 1 value "klm".

I tried using a hash in Python; this is what I am attempting:

import xlrd
workbook = xlrd.open_workbook('myexcelfile.xls')
worksheet = workbook.sheet_by_name('myexcelsheet')
num_rows = worksheet.nrows - 1
num_cells = worksheet.ncols - 1
first_col = 0
scnd_col = 1
curr_row = 1
hash = []
while curr_row < num_rows:
    curr_row += 1
    curr_cell = -1
    print 'IN ROW', curr_row
    while curr_cell < num_cells:
        curr_cell += 1
        print 'IN CELL', curr_cell
        cell0_val = int(worksheet.cell_value(curr_row,first_col))
        cell1_val = worksheet.cell_value(curr_row,scnd_col)
        print 'CELL VALUE', cell0_val, cell1_val
        hash[cell0_val][cell1_val]+=1

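I am surely using that hash the wrong way, but I am really new to Python and could not find a good example online that matches what I want. Any help would be greatly appreciated. Thanks.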

3 Answers

You could also do it like this:

from itertools import groupby
from operator import itemgetter
from collections import Counter
import xlrd

workbook = xlrd.open_workbook('myexcelfile.xls')
sheet = workbook.sheet_by_name('myexcelsheet')

# groupby only merges adjacent rows, so sort on the grouping column (0)
as_list = sorted([sheet.row_values(rownum) for rownum in range(sheet.nrows)],
                 key=itemgetter(0))

for cell_value, vals in groupby(as_list, itemgetter(0)):
    letter_values = [v[1] for v in vals]
    occurrences = Counter(letter_values)

    print('For cell value {}:'.format(int(cell_value)))
    print(', '.join('{} values {}'.format(c, v)
                    for v, c in occurrences.items()))

and format the output as needed.
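
One detail worth spelling out: `itertools.groupby` only merges rows that sit next to each other, so the rows have to be sorted on the same column they are grouped by. The sketch below is a minimal illustration of that, not the answer's exact code. The `rows` list and the `count_by_group` helper are hypothetical; `rows` stands in for what `sheet.row_values(...)` would return, and it is deliberately shuffled.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Hypothetical in-memory rows standing in for sheet.row_values(rownum).
# xlrd returns numeric cells as floats, hence 12.0 rather than 12.
rows = [
    (12.0, 'abc'), (13.0, 'hij'), (12.0, 'efg'), (12.0, 'abc'),
    (13.0, 'klm'), (12.0, 'efg'), (13.0, 'hij'),
]


def count_by_group(rows):
    """Return {first-column value: Counter of second-column values}."""
    result = {}
    # Sort on column 0 so that equal keys become adjacent before grouping.
    ordered = sorted(rows, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        result[int(key)] = Counter(value for _, value in group)
    return result


counts = count_by_group(rows)
for key in sorted(counts):
    print('For cell value {}: {}'.format(key, dict(counts[key])))
```

Without the sort, the shuffled rows above would produce several separate groups for the same key, and a later group would overwrite an earlier one.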

answered 2012-11-26T15:01:34.113

You mean a dictionary,
perhaps with a list under each key. Start with hash = {}
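
To see why the `hash = []` in the question cannot work, here is a small sketch contrasting a list with a dictionary. The variable names `as_list`, `as_dict` and `list_error` are made up for this illustration.

```python
# A list only accepts positions that already exist, so an empty list
# cannot be indexed by an arbitrary key such as 12.
as_list = []
try:
    as_list[12] = 1
    list_error = None
except IndexError as exc:
    list_error = exc

# A dict maps arbitrary hashable keys to values, which is what the
# question calls a "hash".
as_dict = {}
as_dict[12] = ['abc']
as_dict[12].append('efg')

print(type(list_error).__name__)
print(as_dict)
```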

If there are only two columns, you do not need the second loop. Do something like this:

cell0_val = int(worksheet.cell_value(curr_row,first_col))
cell1_val = worksheet.cell_value(curr_row,scnd_col)

if cell0_val in hash:
    hash[cell0_val].append(cell1_val)
else:
    hash[cell0_val] = [cell1_val]

You should end up with something like hash = {12: ['abc', 'abc', 'efg', 'efg'], 13: ['hij', 'hij', 'klm']}
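
The lists still have to be turned into counts to get the output the question asks for. One possible way, sketched here with `collections.Counter`, is shown below. The `groups` dictionary is copied from the example result above, and `summarize` is a hypothetical helper name, not something from the original answer.

```python
from collections import Counter

# Hypothetical result of the loop above, using the question's sample data.
# Named `groups` here to avoid shadowing the built-in hash().
groups = {12: ['abc', 'abc', 'efg', 'efg'], 13: ['hij', 'hij', 'klm']}


def summarize(groups):
    """Build one summary line per first-column value."""
    lines = []
    for key in sorted(groups):
        counts = Counter(groups[key])  # value -> number of occurrences
        parts = ['{} values "{}"'.format(n, value)
                 for value, n in sorted(counts.items())]
        lines.append('For cell value {}: {}'.format(key, ', '.join(parts)))
    return lines


lines = summarize(groups)
for line in lines:
    print(line)
```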

answered 2012-11-26T14:31:22.560

I would use a two-level dictionary:

So your dictionary definition is:

celldict = dict()  # or celldict = {}

import xlrd
workbook = xlrd.open_workbook('myexcelfile.xls')
worksheet = workbook.sheet_by_name('myexcelsheet')

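# range(num_rows) below already stops before num_rows, so no "- 1" here;
# subtracting 1 would silently skip the last row
num_rows = worksheet.nrows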
num_cells = worksheet.ncols - 1

first_col = 0
scnd_col = 1


# Read Data into double level dictionary
celldict = dict()
for curr_row in range(num_rows)  :

    #print 'IN ROW',curr_row
    cell0_val = int(worksheet.cell_value(curr_row,first_col))
    cell1_val = worksheet.cell_value(curr_row,scnd_col)

    # if this cell number isn't in my cell dict, add it
    if cell0_val not in celldict:
        celldict[cell0_val] = dict()

    # if the entry isn't in the second level dictionary, add it with count 1
    if cell1_val not in celldict[cell0_val]:
        celldict[cell0_val][cell1_val] = 1
    # otherwise increase the count
    else:
        celldict[cell0_val][cell1_val] += 1

# Output the raw dictionary hierarchy
print(celldict)
# Output it more prettily
for cellval in celldict:
    print('For cell value {}:'.format(cellval))
    for cellval2 in celldict[cellval]:
        print('{} values {}'.format(celldict[cellval][cellval2], cellval2))
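
The two "is the key already there?" checks can also be delegated to the standard library. The following is a sketch of the same two-level idea using `defaultdict` and `Counter`, offered as an alternative rather than as the answer's method. The `rows` list and the `count_pairs` helper are hypothetical; `rows` stands in for the values read from the worksheet.

```python
from collections import Counter, defaultdict

# Hypothetical (column 1, column 2) pairs standing in for the worksheet rows.
rows = [
    (12.0, 'abc'), (12.0, 'abc'), (12.0, 'efg'), (12.0, 'efg'),
    (13.0, 'hij'), (13.0, 'hij'), (13.0, 'klm'),
]


def count_pairs(rows):
    """Return a two-level mapping: first value -> second value -> count."""
    # A missing outer key gets a fresh Counter; a missing inner key counts as 0.
    counts = defaultdict(Counter)
    for first, second in rows:
        counts[int(first)][second] += 1
    return counts


celldict = count_pairs(rows)
for cellval in sorted(celldict):
    print('For cell value {}:'.format(cellval))
    for cellval2, n in sorted(celldict[cellval].items()):
        print('{} values {}'.format(n, cellval2))
```

With xlrd, the pairs could plausibly come from `worksheet.row_values(i)[:2]` for each `i` in `range(worksheet.nrows)`, assuming the sheet has no header row.
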
answered 2012-11-26T14:54:18.400