
Appending a pandas.DataFrame containing string values (numeric values are fine) to an HDF5 store raises an exception once the file size exceeds roughly 47 GiB. The minimum string size, the number of records, and the number of columns do not matter; only the file size does.

The bottom of the exception traceback:

  File "..\..\hdf5-1.8.14\src\H5FDsec2.c", line 822, in H5FD_sec2_write
file write failed: time = Tue Aug 18 18:26:17 2015
, filename = 'large_file.h5', file descriptor = 4, errno = 22, error message = 'Invalid argument', buf = 0000000066A40018, total write size = 262095, bytes this sub-write = 262095, bytes actually written = 18446744073709551615, offset = 47615949533

Code to reproduce:

import numpy as np
import pandas as pd

for i in range(200):
    df = pd.DataFrame(np.char.mod('random string object (%f)', np.random.rand(5000000, 3)), columns=('A', 'B', 'C'))
    print('writing chunk №', i, '...', end='', flush=True)
    with pd.HDFStore('large_file.h5') as hdf:
        # Construct unique index
        try:
            nrows = hdf.get_storer('df').nrows
        except (KeyError, AttributeError):  # no 'df' node yet on the first chunk
            nrows = 0
        df.index = pd.Series(df.index) + nrows

        # Append the dataframe to the store. The exception happens here
        hdf.append('df', df, format='table')
    print('done')
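To confirm that the failure is tied to the file size rather than to the data itself, the size of the store can be printed before each append. The helper below is an illustrative addition, not part of the original repro; the file name is the same one used above.

import os

# Hypothetical helper: report the current size of the store before each append,
# so the ~47 GiB threshold at which the write starts failing can be pinpointed.
def report_size(path='large_file.h5'):
    if os.path.exists(path):
        print('current file size: %.2f GiB' % (os.path.getsize(path) / 2**30))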

Environment: Windows 7 x64 machine, Python 3.4.3, pandas 0.16.2, PyTables 3.2.0, HDF5 1.8.14.

The question is how to fix the problem if it lies in the Python code above, or how to work around it if it is related to HDF5. Thanks.
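A possible workaround (a rough sketch only, untested against this specific failure; the 45 GiB margin below is an assumption) would be to roll over to a new HDF5 file before the current one reaches the size at which writes start failing, and then read the chunks back from all the rollover files together:

import glob
import os

MAX_BYTES = 45 * 2**30  # assumed safety margin below the ~47 GiB failure point

def target_store(prefix='large_file'):
    """Return the path of the newest rollover file, starting a new one if it is too large."""
    files = sorted(glob.glob(prefix + '_*.h5'))
    if not files or os.path.getsize(files[-1]) >= MAX_BYTES:
        files.append('%s_%03d.h5' % (prefix, len(files)))
    return files[-1]

# Usage inside the loop above (the unique-index offset would then have to be
# tracked across all rollover files, not just the current one):
# with pd.HDFStore(target_store()) as hdf:
#     hdf.append('df', df, format='table')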
