pandas - Converting a PyTables table to a pandas DataFrame

Question

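There is plenty of information on how to read a csv into a pandas DataFrame, but what I have is a PyTables table, and I want a pandas DataFrame.

I have found out how to store my pandas DataFrame to PyTables... but then I want to read it back, at which point it will have:

"kind = v._v_attrs.pandas_type"

I could write it out as csv and read it back in, but that seems silly. That is what I am doing for now.

How should I read a PyTables object into pandas?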

score 7 · Accepted Answer

import numpy as np
import pandas as pd
import tables as pt

# the content is junk but we don't care
# (a 1-D structured array; the dtype must be a list of (name, type) tuples)
grades = np.empty(10, dtype=[('name', 'S20'), ('grade', 'u2')])

# write to a PyTables table
handle = pt.open_file('/tmp/test_pandas.h5', 'w')
handle.create_table('/', 'grades', obj=grades)
print(handle.root.grades[:].dtype)  # it is a structured array

# load back as a DataFrame and check types
df = pd.DataFrame.from_records(handle.root.grades[:])
print(df.dtypes)

handle.close()

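Note that your u2 (unsigned 2-byte integer) will end up as i8 (8-byte integer), and the strings will be objects, because pandas does not yet support the full range of dtypes available to NumPy arrays. This reflects the pandas version the answer was written against; later releases may preserve narrower integer dtypes, so check df.dtypes on your own installation.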
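The dtype behavior described above can be checked without an HDF5 file, since a PyTables table slice is just a NumPy structured array. The following is a minimal sketch under that assumption; the sample names and grades are made up for illustration, and decoding the byte strings with `.str.decode` is one possible cleanup step, not something the answer prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for handle.root.grades[:], which is a structured array.
records = np.zeros(3, dtype=[("name", "S20"), ("grade", "u2")])
records["name"] = [b"ann", b"bob", b"cy"]
records["grade"] = [90, 85, 70]

# Same conversion as in the answer.
df = pd.DataFrame.from_records(records)

# Inspect what pandas chose for each column on this installation.
print(df.dtypes)

# Fixed-width byte strings arrive as Python bytes objects;
# decode them if ordinary text strings are wanted.
df["name"] = df["name"].str.decode("utf-8")
print(df)
```

Printing `df.dtypes` rather than assuming a result is deliberate, because the integer dtype pandas picks has changed across versions.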

score 5 · Accepted Answer

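The documentation now includes an excellent section on using the HDF5 store, and some more advanced strategies are discussed in the cookbook.

It is now relatively straightforward (the session below assumes `from pandas import HDFStore, DataFrame`):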

In [1]: store = HDFStore('store.h5')

In [2]: print(store)
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
Empty

In [3]: df = DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [4]: store['df'] = df

In [5]: store
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df            frame        (shape->[2,2])

And to retrieve it from HDF5/PyTables:

In [6]: store['df']  # store.get('df') is an equivalent
Out[6]:
   A  B
0  1  2
1  3  4

You can also query within a table.
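As a rough illustration of querying within a table, the sketch below stores a frame in the queryable "table" format and selects rows with a `where` expression. The file location, the key name `df`, and the sample data are assumptions made for this example; it also assumes PyTables is installed, since `HDFStore` depends on it.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]})

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "store.h5")

with pd.HDFStore(path, mode="w") as store:
    # format="table" writes a PyTables Table, which supports queries;
    # data_columns=True makes every column usable in a where clause.
    store.put("df", df, format="table", data_columns=True)

    # Only the matching rows are read back from disk.
    subset = store.select("df", where="A > 1")

print(subset)
```

The default "fixed" format used by `store['df'] = df` is faster to write but cannot be queried this way, which is why the sketch uses `put` with `format="table"`.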
