pandas.DataFrame()
Appending string values (numeric values are fine) to an HDF5 store raises an exception once the file grows past roughly 47 GiB. The minimum string size, the number of records, and the number of columns do not matter; only the file size does.
Bottom of the exception traceback:
File "..\..\hdf5-1.8.14\src\H5FDsec2.c", line 822, in H5FD_sec2_write
file write failed: time = Tue Aug 18 18:26:17 2015
, filename = 'large_file.h5', file descriptor = 4, errno = 22, error message = 'Invalid argument', buf = 0000000066A40018, total write size = 262095, bytes this sub-write = 262095, bytes actually written = 18446744073709551615, offset = 47615949533
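One clue in the trace: `bytes actually written = 18446744073709551615` is exactly 2^64 − 1, which is what −1 looks like when a signed return value from a failed low-level write is logged as an unsigned 64-bit size. Together with `errno = 22` (EINVAL), this suggests the OS-level write call itself failed rather than HDF5 corrupting data. A quick check of that interpretation:

```python
import errno

# 18446744073709551615 is (uint64)(-1): the underlying write() returned -1
failed_write = 18446744073709551615
assert failed_write == 2**64 - 1

# errno 22 in the trace corresponds to EINVAL ("Invalid argument")
assert errno.EINVAL == 22
```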
Code to reproduce:
import numpy as np
import pandas as pd

for i in range(200):
    df = pd.DataFrame(np.char.mod('random string object (%f)', np.random.rand(5000000, 3)),
                      columns=('A', 'B', 'C'))
    print('writing chunk №', i, '...', end='', flush=True)
    with pd.HDFStore('large_file.h5') as hdf:
        # Construct a unique index
        try:
            nrows = hdf.get_storer('df').nrows
        except (AttributeError, KeyError):
            nrows = 0
        df.index = pd.Series(df.index) + nrows
        # Append the dataframe to the store. The exception happens here
        hdf.append('df', df, format='table')
    print('done')
Environment: Windows 7 x64, Python 3.4.3, pandas 0.16.2, PyTables 3.2.0, HDF5 1.8.14.
The question is how to fix the problem if it lies in the Python code above, or how to work around it if it is HDF5-related. Thanks.
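Until the root cause is found, one possible workaround is to roll over to a fresh HDF5 file before the store reaches the problematic size. The sketch below is an assumption-laden illustration, not a documented fix: the 45 GiB margin, the `target_file` helper, and the `large_file_NNN.h5` naming scheme are all invented for this example.

```python
import os

# Assumed safety margin below the observed ~47 GiB failure point
MAX_BYTES = 45 * 1024**3

def target_file(basename, max_bytes=MAX_BYTES):
    """Return the first 'basename_NNN.h5' file still smaller than max_bytes.

    Missing files count as size 0, so a new file is started automatically
    once the current one crosses the threshold.
    """
    n = 0
    while True:
        name = '%s_%03d.h5' % (basename, n)
        if not os.path.exists(name) or os.path.getsize(name) < max_bytes:
            return name
        n += 1
```

In the loop above, `pd.HDFStore('large_file.h5')` would become `pd.HDFStore(target_file('large_file'))`; reads would then need to concatenate the pieces, e.g. with `pd.concat` over the generated files.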