I have a fairly large dataset of shape (2678271, 52) with a 5-level hierarchical index that takes up 6.5% of the machine's memory. When I call
df.sortlevel(k)
I get the following error:
MemoryError Traceback (most recent call last)
in ()
----> 1 df = df.sortlevel(4)
/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in sortlevel(self, level, axis, ascending)
2978 raise Exception('can only sort by level with a hierarchical index')
2979
-> 2980 new_axis, indexer = the_axis.sortlevel(level, ascending=ascending)
2981
2982 if self._data.is_mixed_dtype():
/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/core/index.pyc in sortlevel(self, level, ascending)
1856 indexer = _indexer_from_factorized((primary,) + tuple(labels),
1857 (primshp,) + tuple(shape),
-> 1858 compress=False)
1859 if not ascending:
1860 indexer = indexer[::-1]
/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _indexer_from_factorized(labels, shape, compress)
2124 max_group = np.prod(shape)
2125
-> 2126 indexer, _ = lib.groupsort_indexer(comp_ids.astype(np.int64), max_group)
2127
2128 return indexer
/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/lib.so in pandas.lib.groupsort_indexer (pandas/src/tseries.c:55052)()
MemoryError:
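Is there a hard-coded condition that raises this error? Or is it possible that the operation eats up all the remaining memory, even though the data itself uses only 6.5% of it (according to htop)?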
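The last frames of the traceback hint at where the memory could go: `max_group = np.prod(shape)` followed by `lib.groupsort_indexer(comp_ids, max_group)`. Below is a minimal sketch, assuming pandas >= 0.24 (for `MultiIndex.codes`) and assuming `groupsort_indexer` behaves like a counting sort that allocates one int64 counter per *possible* group. `estimated_max_group` and `counting_sort_indexer` are hypothetical helpers written for illustration, not pandas API. The sketch sorts by all levels in order (level 0 first), whereas `sortlevel(k)` moves level k to the front; the product of level sizes is the same either way.

```python
import numpy as np
import pandas as pd


def estimated_max_group(index: pd.MultiIndex) -> int:
    # Product of the per-level cardinalities, i.e. the number of *possible*
    # level combinations (cf. `max_group = np.prod(shape)` in the traceback).
    # Python ints are used so the product cannot overflow int64.
    total = 1
    for level in index.levels:
        total *= len(level)
    return total


def counting_sort_indexer(comp_ids: np.ndarray, ngroups: int) -> np.ndarray:
    # Hypothetical stand-in for a counting sort like lib.groupsort_indexer;
    # NOT the pandas implementation. The size of `counts` scales with
    # ngroups (possible groups), not with the number of rows.
    counts = np.bincount(comp_ids, minlength=ngroups)
    # Start offset of each group in the sorted output.
    positions = np.concatenate(([0], np.cumsum(counts)[:-1]))
    indexer = np.empty(len(comp_ids), dtype=np.int64)
    for row, group in enumerate(comp_ids):  # stable: rows keep their relative order
        indexer[positions[group]] = row
        positions[group] += 1
    return indexer


# Tiny 3-level example; the real index has 5 levels and ~2.7M rows.
idx = pd.MultiIndex.from_product([[1, 0], ["b", "a", "c"], [40, 10, 30, 20]])
# Flatten the per-level codes into one uncompressed id per row (row-major).
comp_ids = np.ravel_multi_index(
    [np.asarray(c, dtype=np.int64) for c in idx.codes], idx.levshape
)
max_group = estimated_max_group(idx)
indexer = counting_sort_indexer(comp_ids, max_group)
print(max_group, max_group * 8)  # possible groups, approx. bytes of int64 counters
```

Under these assumptions, calling `estimated_max_group(df.index)` on the real frame would show whether the product of the five level sizes is so large that an int64 array of that length cannot fit in RAM. That would produce a MemoryError even though the rows themselves use only a small fraction of memory.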