python - HDFStore error when appending with pandas

Question

I am getting the following error:

    exportStore.append(key, hdfStoreLocal, index = False, data_columns = True)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 911, in append
    **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 1270, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 3605, in write
    **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 3293, in create_axes
    raise e
ValueError: invalid itemsize in generic type tuple

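Any ideas why this is happening? This is a fairly large project, so I am not sure what code I can provide, but the error occurs on the very first append. Any help would be appreciated.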

Edit:

show_versions output:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.1
nose: None
Cython: 0.20.2
numpy: 1.8.1
scipy: 0.13.3
statsmodels: None
IPython: 1.2.1
sphinx: 1.2.2
patsy: None
scikits.timeseries: None
dateutil: 1.5
pytz: 2012c
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.8
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None

info() output:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 61500 entries, 0 to 61499
Data columns (total 48 columns):
Sequential_Code_1        61500 non-null float64
Age_1                    61500 non-null float64
Sex_1                    61500 non-null object
Race_1                   61500 non-null object
Ethnicity_1              61500 non-null object
Principal_Code_1         61500 non-null object
Admitting_Code_1         61500 non-null object
Principal_Code_2         61500 non-null object
Other_Codes_1            61500 non-null object
Other_Codes_2            61500 non-null object
Other_Codes_3            61500 non-null object
Other_Codes_4            61500 non-null object
Other_Codes_5            61500 non-null object
Other_Codes_6            61500 non-null object
Other_Codes_7            61500 non-null object
Other_Codes_8            61500 non-null object
Other_Codes_9            61500 non-null object
Other_Codes_10           61500 non-null object
Other_Codes_11           61500 non-null object
Other_Codes_12           61500 non-null object
Other_Codes_13           61500 non-null object
Other_Codes_14           61500 non-null object
Other_Codes_15           61500 non-null object
Other_Codes_16           61500 non-null object
Other_Codes_17           61500 non-null object
Other_Codes_18           61500 non-null object
Other_Codes_19           61500 non-null object
Other_Codes_20           61500 non-null object
Other_Codes_21           61500 non-null object
Other_Codes_22           61500 non-null object
Other_Codes_23           61500 non-null object
Other_Codes_24           61500 non-null object
External_Code_1          61500 non-null object
Place_Code_1             61500 non-null object

head() output:

head       Sequential_Number_1  Age_1 Sex_1 Race_1  \
1128                   2.000000e+13     73             F             01   
2185                   2.000000e+13     52             M             01   
2202                   2.000000e+13     64             M             01   
2283                   2.000000e+13     72             F             01   
4471                   2.000000e+13     62             F             01

score 1 · Accepted Answer

The problem is that you need to specify a min_itemsize; see the HDFStore documentation for min_itemsize.

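This controls the size of the column for string-like columns. If none of the values in such a column has any length (every string is empty), the append fails, and the error message could probably be better. pandas needs the maximum length of the values passed in to work out the size the column requires.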

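The reason to specify it is that you may be appending multiple chunks. Chunk 2 could contain a longer string, which means the column should be at least that size, but looking only at chunk 1 does not tell you this.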
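A minimal sketch of that idea, assuming every chunk can be inspected before anything is written. The helper names `string_itemsizes` and `append_chunks`, the sample chunks, and the `code`/`age` columns are hypothetical and not from the question. `append_chunks` needs PyTables installed, so it is only defined here, not called.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def string_itemsizes(chunks, floor=1):
    """Return {column: longest encoded string length} over all chunks.

    `floor` keeps the size at 1 or more even when every value is empty.
    """
    sizes = {}
    for chunk in chunks:
        for col in chunk.columns:
            series = chunk[col]
            # Only string-like columns need an explicit item size.
            if not (is_object_dtype(series) or is_string_dtype(series)):
                continue
            values = series.dropna().astype(str)
            # Measure UTF-8 bytes, since the stored size is in bytes.
            longest = max((len(v.encode("utf-8")) for v in values), default=0)
            sizes[col] = max(sizes.get(col, floor), longest)
    return sizes


def append_chunks(path, key, chunks):
    """Append every chunk with one min_itemsize shared by all of them.

    Assumption: data_columns=True is used, as in the question, so the
    dict keys can be plain column names.
    """
    chunks = list(chunks)
    sizes = string_itemsizes(chunks)
    with pd.HDFStore(path) as store:
        for chunk in chunks:
            store.append(key, chunk, index=False, data_columns=True,
                         min_itemsize=sizes)


# Hypothetical chunks: the longest string only appears in the second one.
chunks = [
    pd.DataFrame({"code": ["A1", "B22"], "age": [73.0, 52.0]}),
    pd.DataFrame({"code": ["C3333", ""], "age": [64.0, 72.0]}),
]
sizes = string_itemsizes(chunks)
```

Calling `append_chunks("export.h5", "records", chunks)` (a made-up file name and key) would then create the table on the first append with a `code` column already wide enough for the longer string in the second chunk.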

Further, I would preprocess this data so that it has no 0-length strings and instead uses np.nan as the missing value, which HDFStore/pandas handles properly.
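One way this preprocessing might look, as a sketch only: the small frame below is made up, reusing two column names from the `info()` output, and `DataFrame.replace` is just one option for turning empty strings into `np.nan`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with zero-length strings standing in for missing data.
frame = pd.DataFrame({
    "Sex_1": ["F", "", "M"],
    "Race_1": ["01", "01", ""],
})

# Swap exact empty strings for np.nan so they count as missing values.
cleaned = frame.replace("", np.nan)

# Count the missing values per column after cleaning.
missing_per_column = cleaned.isna().sum()
```

Strings that contain only whitespace are left untouched by this exact-match replacement; whether those should also count as missing depends on the data.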
