python - Transferring data from R to pandas using hdf5

Question

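I have a dataset in R that I want to use in Python. For compatibility, it seems best to save the data in hdf5 format from R and then read it in Python with pandas.io.pytables.HDFStore. Below is the code that generates the hdf5 file from R.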

library(rhdf5)
# 100 draws fill the 10 x 10 matrix exactly
mat = matrix(data = rexp(100, rate = 10), nrow = 10, ncol = 10)
colnames(mat) = c(1:10)
rownames(mat) = c('A','B','C','D','E','F','G','H','I','K')
vec = c(1:10)
xx = list(mat=mat, vec=vec)
h5save(xx, file='xx.h5')

However, when I try to load it in pandas, store.keys() is empty, and accessing the contained object directly raises an error:

from pandas.io.pytables import HDFStore
store = HDFStore('xx.h5')
store.keys() # returns []

When I try .get('xx'), the error is different from the one I get for an invalid object name:

store.get('xx')
# TypeError: cannot create a storer if the object is not existing nor a value are passed
store.get('invalid')
# KeyError: 'No object named invalid in the file'

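Loading the hdf5 file back into R works fine.

Is there a (simple) way to fix loading the file in pandas? Alternatively, I would accept any solution that lets me transfer R objects to pandas.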

Edit: file details below

$ ptdump-2.7 -av xx.h5 
/ (RootGroup) ''
  /._v_attrs (AttributeSet), 0 attributes
/xx (Group) ''
  /xx._v_attrs (AttributeSet), 0 attributes
/xx/mat (CArray(10, 10), zlib(7)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (10, 10)
  /xx/mat._v_attrs (AttributeSet), 0 attributes
/xx/vec (CArray(10,), zlib(7)) ''
  atom := Int32Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (10,)
  /xx/vec._v_attrs (AttributeSet), 0 attributes
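
The dump shows two plain arrays (/xx/mat and /xx/vec) with 0 attributes, so the file carries none of the metadata that HDFStore writes for its own objects, and the row and column names are not stored either. One possible workaround, sketched here under assumptions rather than as a confirmed fix, is to read the arrays with h5py and build the pandas objects by hand. The function name load_r_group and its parameters are hypothetical. The transpose default assumes that rhdf5 stores R's column-major matrix with its dimensions reversed, which is worth checking against a non-square matrix.

```python
import h5py
import pandas as pd


def load_r_group(path, group="xx", index=None, columns=None, transpose=True):
    """Read the plain 'mat' and 'vec' arrays under `group` into pandas objects.

    Hypothetical helper: dataset names follow the ptdump listing above.
    """
    with h5py.File(path, "r") as f:
        mat = f[f"{group}/mat"][()]  # 2-D Float64 array
        vec = f[f"{group}/vec"][()]  # 1-D Int32 array

    # Assumption: the matrix arrives transposed relative to R's view of it.
    if transpose:
        mat = mat.T

    # Labels are not in the file (0 attributes), so they are passed in.
    df = pd.DataFrame(mat, index=index, columns=columns)
    return df, pd.Series(vec)
```

For the file in the question, a call might look like load_r_group('xx.h5', index=list('ABCDEFGHIK'), columns=range(1, 11)), with the labels typed in again because they were not saved.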
