
I have an HDF5 file containing pandas Series/DataFrame tables. I need to get the (pandas) index of a table stored under a key in the HDF file, without necessarily loading the whole table.

I can think of two (effectively the same) methods of getting the index:

import pandas as pd

hdfPath = 'c:/example.h5'
hdfKey = 'dfkey'
# way 1:
with pd.HDFStore(hdfPath) as hdf:
    index = hdf[hdfKey].index

# way 2:
index = pd.read_hdf(hdfPath, hdfKey).index

However, for a pandas Series of ~2000 rows this takes 0.6 s:

%timeit pd.read_hdf(hdfPath, hdfKey).index
1 loops, best of 3: 605 ms per loop

Is there a way to get only the index of a table in an HDF file?


1 Answer


The HDFStore object has a select_column method that lets you retrieve the index. Note that it returns a Series with the index as its values.

with pd.HDFStore(hdfPath) as hdf:
    index = hdf.select_column(hdfKey, 'index').values
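
As a self-contained check of the approach above, here is a minimal sketch. It assumes the object was written in table format (`format='table'`), since `select_column` does not work on objects stored in the default fixed format, and it assumes PyTables is installed. The helper name `read_index_only`, the temporary file and the demo data are all made up for illustration.

```python
import os
import tempfile

import pandas as pd


def read_index_only(path, key):
    """Return the index stored under `key` without loading the data columns.

    Hypothetical helper; assumes the object was saved with format='table'.
    """
    with pd.HDFStore(path, mode="r") as hdf:
        # select_column returns a Series whose values are the index entries.
        return pd.Index(hdf.select_column(key, "index").to_numpy())


# Demo data (made up): a 2000-row Series with a non-trivial integer index.
series = pd.Series([float(i) for i in range(2000)], index=range(100, 2100))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.h5")
    # Table format is required for select_column to work.
    series.to_hdf(path, key="dfkey", format="table")
    index = read_index_only(path, "dfkey")

print(len(index))
```

If the existing file was written in fixed format, it would first have to be rewritten in table format for this to apply.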
answered 2016-07-17T18:13:52.003