
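My goals with this script are:

1. Read time-series data from an Excel file (>100,000k rows) along with its headers (labels, units).
2. Convert the Excel numeric dates into the most suitable datetime objects for a pandas DataFrame.
3. Be able to reference rows by timestamp and columns by series label.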
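
For goal 2, pandas can do the conversion itself. Excel stores a date as a serial number of days, with the fractional part giving the time of day. In the default 1900 date system the count effectively starts at 1899-12-30. In the 1904 system, which xlrd reports as `datemode == 1`, it starts at 1904-01-01. The sketch below assumes the date column has already been read as plain floats. `excel_serial_to_datetime` is a hypothetical helper name. The 1899-12-30 origin is only correct for serials of 61 (1900-03-01) and later, because Excel counts a non-existent 1900-02-29.

```python
import pandas as pd


def excel_serial_to_datetime(serials, datemode=0):
    """Convert Excel serial day numbers to a pandas DatetimeIndex.

    datemode 0 = 1900 date system (Windows default), 1 = 1904 date system,
    matching the meaning of xlrd's Book.datemode.
    """
    # 1899-12-30 (not 1899-12-31) absorbs Excel's fictitious 1900-02-29,
    # so this origin is valid for serials >= 61.
    origin = "1904-01-01" if datemode == 1 else "1899-12-30"
    return pd.DatetimeIndex(pd.to_datetime(serials, unit="D", origin=origin))


# Hypothetical serials: 41473.0 is 2013-07-18 00:00, 41473.5 is 12:00 that day
idx = excel_serial_to_datetime([41473.0, 41473.5])
```

Because `to_datetime` works on whole arrays, this is usually much faster on a sheet with many rows than converting each cell with a Python loop.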

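So far, I have used xlrd to read the Excel data into lists. I made a pandas Series from each list, using the list of times as the index. Then I combined the series with the series headers to make a Python dictionary and passed the dictionary to a pandas DataFrame. Despite my efforts, df.index seems to be set to the column headers, and I'm not sure at which point I should convert the dates to datetime objects.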

I only started using Python 3 days ago, so any advice would be great! Here is my code:

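    import xlrd
    import pandas as pd
    from pandas import Series

    #Open excel workbook and first sheet
    #(raw string so the backslashes in the Windows path are not read as escapes;
    # xlrd 2.0 and later no longer open .xlsx files, so this needs xlrd < 2.0)
    wb = xlrd.open_workbook(r"C:\GreenCSV\Calgary\CWater.xlsx")
    sh = wb.sheet_by_index(0)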

    #Read rows containing labels and units
    Labels = sh.row_values(1, start_colx=0, end_colx=None)
    Units = sh.row_values(2, start_colx=0, end_colx=None)

    #Initialize list to hold data
    Data = [None] * (sh.ncols)

    #read column by column and store in list
    for colnum in range(sh.ncols):
        Data[colnum] = sh.col_values(colnum, start_rowx=5, end_rowx=None)

    #Delete unnecessary rows and columns
    #(index 3 is deleted first so that deleting 0:2 afterwards does not shift it)
    del Labels[3], Labels[0:2], Units[3], Units[0:2], Data[3], Data[0:2]

    #Create Pandas Series
    s = [None] * (sh.ncols - 4)
    for colnum in range(sh.ncols - 4):
        s[colnum] = Series(Data[colnum+1], index=Data[0])

    #Create Dictionary of Series
    dictionary = {}
    for i in range(sh.ncols-4):
        dictionary[i]= {Labels[i] : s[i]}

    #Pass Dictionary to Pandas DataFrame
    df = pd.DataFrame.from_dict(dictionary)
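
The symptom described above appears to come from the dictionary of dictionaries. `dictionary[i] = {Labels[i]: s[i]}` makes every value a one-entry dict. `from_dict` then uses the outer keys (0, 1, ...) as columns and the inner keys (the labels) as the row index. A single flat dict that maps each label to a Series lets the shared time index become `df.index`. After the `del` step, `Labels[0]` also seems to be the time column's own label, so series `i` (built from `Data[i + 1]`) should pair with `Labels[i + 1]`. The sketch below uses made-up lists in place of what xlrd returns. The labels, values and times are hypothetical, and the times are assumed to be already converted to datetimes.

```python
import datetime as dt

import pandas as pd

# Hypothetical stand-ins for the lists left after the `del` step:
# element 0 is the time column, the remaining elements are measurement columns.
Labels = ["Time", "Flow", "Temp"]
Data = [
    [dt.datetime(2013, 7, 18, 0, 0), dt.datetime(2013, 7, 18, 0, 15)],
    [1.5, 1.7],
    [12.0, 12.4],
]

time_index = pd.DatetimeIndex(Data[0])

# One flat dict: label -> Series, all sharing the same time index.
# Series i is built from Data[i + 1], so its label is Labels[i + 1].
columns = {
    Labels[i + 1]: pd.Series(Data[i + 1], index=time_index)
    for i in range(len(Data) - 1)
}

df = pd.DataFrame(columns)
df.index.name = Labels[0]

# Rows are now addressed by timestamp and columns by label
flow_at_0015 = df.loc[pd.Timestamp("2013-07-18 00:15"), "Flow"]
```

Passing the dict straight to `pd.DataFrame` aligns all Series on their index, so any series missing a timestamp gets `NaN` in that row instead of raising an error.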



You can use pandas directly here. I usually like to create a dictionary of DataFrames, with the sheet names as keys:

    In [11]: xl = pd.ExcelFile(r"C:\GreenCSV\Calgary\CWater.xlsx")

    In [12]: xl.sheet_names  # in your example it may be different
    Out[12]: [u'Sheet1', u'Sheet2', u'Sheet3']

    In [13]: dfs = {sheet: xl.parse(sheet) for sheet in xl.sheet_names}

    In [14]: dfs['Sheet1']  # access DataFrame by sheet name

Check out the docs on `parse`, which offers more options (for example `skiprows`). These let you parse individual sheets with much more control.
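
One way to combine this approach with the question's three goals is to read the sheet without a header. `xl.parse(sheet, header=None)` returns the cells as a plain DataFrame with integer row and column positions. The label row, unit row, data rows and time column can then be picked out by position. This sketch assumes the layout implied by the question's xlrd code: labels in row 1, units in row 2, data from row 5, time in column 2, and columns 0, 1 and 3 unused. It also assumes the times arrive as Excel serial numbers (1900 date system). If pandas has already parsed them as datetimes, pass `serial_dates=False`. `tidy_sheet` is a hypothetical helper, and the small `raw` frame is made-up data standing in for a real sheet.

```python
import pandas as pd


def tidy_sheet(raw, label_row=1, unit_row=2, first_data_row=5,
               time_col=2, value_cols=None, serial_dates=True):
    """Turn a header-less sheet into a DataFrame indexed by timestamp.

    Returns (df, units): df has one column per series label, and units maps
    each label to its unit string. Layout defaults are assumptions taken
    from the question's xlrd code.
    """
    if value_cols is None:
        # Everything right of the unused column 3, as in the question
        value_cols = list(range(4, raw.shape[1]))
    labels = raw.iloc[label_row, value_cols].tolist()
    units = dict(zip(labels, raw.iloc[unit_row, value_cols].tolist()))

    data = raw.iloc[first_data_row:]
    times = data.iloc[:, time_col]
    if serial_dates:
        # Excel serial days in the 1900 date system (valid for serials >= 61)
        times = pd.to_datetime(times.astype(float), unit="D", origin="1899-12-30")
    else:
        times = pd.to_datetime(times)

    index = pd.DatetimeIndex(times.to_numpy(), name=raw.iloc[label_row, time_col])
    values = data.iloc[:, value_cols].to_numpy(dtype=float)
    return pd.DataFrame(values, index=index, columns=labels), units


# Made-up sheet contents in the assumed layout (rows 0, 3 and 4 are filler)
raw = pd.DataFrame([
    [None, None, None, None, None, None],
    ["a", "b", "Time", "x", "Flow", "Temp"],
    [None, None, None, None, "m3/s", "degC"],
    [None, None, None, None, None, None],
    [None, None, None, None, None, None],
    [0, 0, 41473.0, 0, 1.5, 12.0],
    [0, 0, 41473.5, 0, 1.7, 12.4],
])

df, units = tidy_sheet(raw)
```

With a real workbook, `raw` would come from `xl.parse(sheet, header=None)` or `pd.read_excel(path, header=None)`. Keeping the units in a separate dict leaves the columns as plain labels, so `df["Flow"]` returns a single Series.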

answered 2013-07-18T10:29:55.110