predictions
I have a pandas DataFrame made up of three columns. I created this DataFrame from three memmap arrays.
predictions = pd.DataFrame({'cell': list_1, 'tree': list_2, 'predict': list_3, 'label': list_4})
Now I want to group this DataFrame by two of its columns and take the mean of a third column, like this:
df = predictions.groupby(['tree', 'cell'])['predict'].mean()
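But it raises an error saying that a memmap array is unhashable, so the groupby cannot be performed. I really need the groupby to work; otherwise I would have to write two nested for loops, which would take forever because my data has 1,000,000 rows. Does anyone know a solution? Thanks.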
Edited
The cell and tree columns are lists of items taken from a memmap array. predict and label are just plain lists. The list of memmap array items looks like this for cell:
[memmap([415], dtype=int32),
memmap([143], dtype=int32),
memmap([96], dtype=int32),
memmap([432], dtype=int32),
memmap([104], dtype=int32),
memmap([76], dtype=int32),
memmap([312], dtype=int32),
memmap([143], dtype=int32),
memmap([312], dtype=int32),
memmap([64], dtype=int32),
memmap([296], dtype=int32)]
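Each item in the list above is a one-element array rather than a plain number, which is likely why it cannot serve as a grouping key. A minimal sketch of the difference follows; it uses a plain ndarray as a stand-in (an assumption, made because np.memmap is an ndarray subclass), and the names item, hashable and key are hypothetical:

```python
import numpy as np

# Stand-in for one item of the list above. A plain ndarray is used here
# as an assumption; np.memmap is a subclass of ndarray.
item = np.array([415], dtype=np.int32)

# Arrays do not support hash(), so they cannot be used as group keys.
try:
    hash(item)
    hashable = True
except TypeError:
    hashable = False

# Pulling the single value out gives an ordinary, hashable Python int.
key = int(item[0])

print(hashable, key)
```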
The predictions DataFrame looks like this:
     cell  label  predict  tree
0   [415]      0        1  [19]
1   [143]      1        1  [22]
2    [96]      0        1  [19]
3   [432]      1        1  [12]
4   [104]      0        1  [21]
5    [76]      0        1  [19]
6   [312]      1        1  [22]
7   [143]      1        1  [22]
8   [312]      1        1  [22]
9    [64]      0        1  [18]
10  [296]      1        1  [22]
I get the following error:
    predictions_target = predictions.groupby(['tree', 'cell']) ['predict'].mean()
  File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1015, in mean
    return self._python_agg_general(f)
  File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 826, in _python_agg_general
    return self._python_apply_general(f)
  File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 698, in _python_apply_general
    self.axis)
  File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1577, in apply
    splitter = self._get_splitter(data, axis=axis)
  File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1563, in _get_splitter
    comp_ids, _, ngroups = self.group_info
  File "pandas/src/properties.pyx", line 34, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:44222)
  File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1670, in group_info
    comp_ids, obs_group_ids = self._get_compressed_labels()
  File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1677, in _get_compressed_labels
    all_labels = [ping.labels for ping in self.groupings]
  File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 2308, in labels
    self._make_labels()
  File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 2319, in _make_labels
    labels, uniques = algos.factorize(self.grouper, sort=self.sort)
  File "/usr/venv/local/lib/python2.7/site-packages/pandas/core/algorithms.py", line 313, in factorize
    labels = table.get_labels(vals, uniques, 0, na_sentinel, True)
  File "pandas/src/hashtable_class_helper.pxi", line 843, in pandas.hashtable.PyObjectHashTable.get_labels (pandas/hashtable.c:14831)
TypeError: unhashable type: 'memmap'
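One possible workaround, sketched here under assumptions rather than offered as a definitive fix, is to flatten the one-element memmap items into plain integer columns before building the DataFrame, so that groupby sees ordinary numbers. The helper make_memmap_items, the temporary files and the sample values below are hypothetical stand-ins for the real data:

```python
import os
import tempfile

import numpy as np
import pandas as pd


def make_memmap_items(values, name):
    """Hypothetical helper: store values in a temp-file memmap and return
    a list of one-element row views, similar to the lists in the question."""
    path = os.path.join(tempfile.mkdtemp(), name)
    mm = np.memmap(path, dtype=np.int32, mode="w+", shape=(len(values), 1))
    mm[:, 0] = values
    return [mm[i] for i in range(len(values))]


# Made-up sample data, not the real 1,000,000 rows.
list_1 = make_memmap_items([415, 143, 96, 143, 312, 312], "cell.dat")
list_2 = make_memmap_items([19, 22, 19, 22, 22, 22], "tree.dat")
list_3 = [1, 0, 1, 1, 1, 0]
list_4 = [0, 1, 0, 1, 1, 1]

predictions = pd.DataFrame({
    # Stack the one-element items and flatten them into a 1-D integer column.
    "cell": np.asarray(list_1).ravel(),
    "tree": np.asarray(list_2).ravel(),
    "predict": list_3,
    "label": list_4,
})

# With plain integer keys, the grouping columns can be hashed.
result = predictions.groupby(["tree", "cell"])["predict"].mean()
print(result)
```

Converting whole columns in one step avoids looping over the rows in Python, which may matter at the data size mentioned in the question.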