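I think the title covers the issue, but to elaborate:
The pandas Python package has a DataFrame data type for holding tabular data in Python. It also has a convenient interface to the hdf5 file format, so pandas DataFrames (and other data) can be saved through a simple dict-like interface (assuming you have pytables installed):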
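import numpy
import pandas

# Open (or create) the HDF5 store; this needs pytables installed
d = pandas.HDFStore('data.h5')
# Save a one-column frame of random values under the key 'testdata'
d['testdata'] = pandas.DataFrame({'N': numpy.random.randn(5)})
d.close()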
So far so good. However, when I try to load the same hdf5 file into R, things turn out to be less simple:
> library(hdf5)
> hdf5load('data.h5')
NULL
> testdata
$block0_values
         [,1]      [,2]      [,3]       [,4]      [,5]
[1,] 1.498147 0.8843877 -1.081656 0.08717049 -1.302641
attr(,"CLASS")
[1] "ARRAY"
attr(,"VERSION")
[1] "2.3"
attr(,"TITLE")
[1] ""
attr(,"FLAVOR")
[1] "numpy"
$block0_items
[1] "N"
attr(,"CLASS")
[1] "ARRAY"
attr(,"VERSION")
[1] "2.3"
attr(,"TITLE")
[1] ""
attr(,"FLAVOR")
[1] "numpy"
attr(,"kind")
[1] "string"
attr(,"name")
[1] "N."
$axis1
[1] 0 1 2 3 4
attr(,"CLASS")
[1] "ARRAY"
attr(,"VERSION")
[1] "2.3"
attr(,"TITLE")
[1] ""
attr(,"FLAVOR")
[1] "numpy"
attr(,"kind")
[1] "integer"
attr(,"name")
[1] "N."
$axis0
[1] "N"
attr(,"CLASS")
[1] "ARRAY"
attr(,"VERSION")
[1] "2.3"
attr(,"TITLE")
[1] ""
attr(,"FLAVOR")
[1] "numpy"
attr(,"kind")
[1] "string"
attr(,"name")
[1] "N."
attr(,"TITLE")
[1] ""
attr(,"CLASS")
[1] "GROUP"
attr(,"VERSION")
[1] "1.0"
attr(,"ndim")
[1] 2
attr(,"axis0_variety")
[1] "regular"
attr(,"axis1_variety")
[1] "regular"
attr(,"nblocks")
[1] 1
attr(,"block0_items_variety")
[1] "regular"
attr(,"pandas_type")
[1] "frame"
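Which brings me to my question: ideally I would be able to save back and forth between R and pandas. I can obviously write a wrapper going from pandas to R (I think... although it could get trickier if I use a pandas MultiIndex), but I don't think I could then easily use that data back in pandas. Any suggestions?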
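To make the wrapper idea concrete, here is a rough sketch of the Python side, assuming the stored pieces have already been read into a plain dict under the names shown in the dump above (block0_values, block0_items, axis1, axis0). The function name frame_from_blocks is hypothetical, and the items-by-rows layout of the values is taken from the R printout rather than from any documented pandas format. It covers only a single block with a plain (non-MultiIndex) index.

```python
import numpy
import pandas


def frame_from_blocks(pieces):
    """Rebuild a DataFrame from the pieces listed in the dump above.

    Assumption: a single block whose values are laid out as
    items x rows, which is how the R printout shows them.
    """
    values = numpy.asarray(pieces['block0_values'])
    columns = [str(c) for c in pieces['block0_items']]
    index = list(pieces['axis1'])
    # Transpose so rows follow the index and columns follow the items.
    return pandas.DataFrame(values.T, index=index, columns=columns)


# The same numbers as in the R output, typed in by hand for illustration.
pieces = {
    'block0_values': [[1.498147, 0.8843877, -1.081656, 0.08717049, -1.302641]],
    'block0_items': ['N'],
    'axis1': [0, 1, 2, 3, 4],
    'axis0': ['N'],
}
df = frame_from_blocks(pieces)
print(df)
```

Going the other way (R writing something pandas.HDFStore will accept) is the part I don't see how to do, since the group attributes such as pandas_type and nblocks would presumably have to be reproduced as well.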
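Bonus: what I really want to do is use the data.table package in R together with a pandas DataFrame (the keying approaches in the two packages are suspiciously similar). Any help with that would be greatly appreciated.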