python - Why does Pandas transform fail if you only have a single column

Question

After looking at this question, I did some messing around and discovered this:

import pandas as pd

df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
df['num_totals'] = df.groupby('a').transform('count')

gives ValueError:

ValueError                                Traceback (most recent call last)
<ipython-input-38-157c6339ad93> in <module>()
      3 #df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4], 'b':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
      4 df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
----> 5 df['num_totals'] = df.groupby('a').transform('count')
      6 
      7 #df['num_totals']=df.groupby('a')[['a']].transform('count')

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.pyc in __setitem__(self, key, value)
   2117         else:
   2118             # set column
-> 2119             self._set_item(key, value)
   2120 
   2121     def _setitem_slice(self, key, value):

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.pyc in _set_item(self, key, value)
   2164         """
   2165         value = self._sanitize_column(key, value)
-> 2166         NDFrame._set_item(self, key, value)
   2167 
   2168     def insert(self, loc, column, value, allow_duplicates=False):

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\generic.pyc in _set_item(self, key, value)
    677 
    678     def _set_item(self, key, value):
--> 679         self._data.set(key, value)
    680         self._clear_item_cache()
    681 

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in set(self, item, value)
   1779         except KeyError:
   1780             # insert at end
-> 1781             self.insert(len(self.items), item, value)
   1782 
   1783         self._known_consolidated = False

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in insert(self, loc, item, value, allow_duplicates)
   1793 
   1794             # new block
-> 1795             self._add_new_block(item, value, loc=loc)
   1796 
   1797         except:

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in _add_new_block(self, item, value, loc)
   1909             loc = self.items.get_loc(item)
   1910         new_block = make_block(value, self.items[loc:loc + 1].copy(),
-> 1911                                self.items, fastpath=True)
   1912         self.blocks.append(new_block)
   1913 

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in make_block(values, items, ref_items, klass, fastpath, placement)
    964             klass = ObjectBlock
    965 
--> 966     return klass(values, items, ref_items, ndim=values.ndim, fastpath=fastpath, placement=placement)
    967 
    968 # TODO: flexible with index=None and/or items=None

C:\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\internals.pyc in __init__(self, values, items, ref_items, ndim, fastpath, placement)
     42         if len(items) != len(values):
     43             raise ValueError('Wrong number of items passed %d, indices imply %d'
---> 44                              % (len(items), len(values)))
     45 
     46         self.set_ref_locs(placement)

ValueError: Wrong number of items passed 1, indices imply 0

However, if I have 2 columns, it works fine:

df = pd.DataFrame({'a':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4],'b':[1,1,1,1,2,2,3,3,3,4,4,4,4,4,4,4]})
df['num_totals'] = df.groupby('a').transform('count')
df



Out[40]:
    a  b  num_totals
0   1  1           4
1   1  1           4
2   1  1           4
3   1  1           4
4   2  2           2
5   2  2           2
6   3  3           3
7   3  3           3
8   3  3           3
9   4  4           7
10  4  4           7
11  4  4           7
12  4  4           7
13  4  4           7
14  4  4           7
15  4  4           7

Or if I do this with the single-column df:

df['num_totals']=df.groupby('a')[['a']].transform('count')

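There is a similar SO post, but it isn't clear to me why the Series should fail and the DataFrame should work in the example above, or why having 2 or more columns works.

I'm using Python 2.7 64-bit and Pandas 0.12.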

score 8 · Accepted Answer

Single column in the DF

As noted above, this returns a Series of the same size as the original:

In [32]: df.groupby('a')['a'].transform('count')
Out[32]: 
0     4
1     4
2     4
3     4
4     2
5     2
6     3
7     3
8     3
9     7
10    7
11    7
12    7
13    7
14    7
15    7
Name: a, dtype: int64

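However, this returns an empty frame, because the only column is the grouping key and nothing is left to transform: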

In [33]: df.groupby('a').transform('count')
Out[33]: 
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

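You can't assign an empty frame as a column of another frame, because that is inherently an ambiguous assignment (although you could make a case that it should "work").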
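A minimal sketch of the workaround for the single-column case, assuming a reasonably recent pandas (the question used 0.12, where behavior may differ in detail). Selecting the column explicitly makes transform return a Series, which can be assigned without ambiguity:

```python
import pandas as pd

# Same single-column frame as in the question.
df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4]})

# Select the column before transforming so the result is a Series,
# not a frame with zero columns.
counts = df.groupby('a')['a'].transform('count')

# A Series aligned on the original index assigns cleanly as a new column.
df['num_totals'] = counts
print(df)
```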

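Two columns in the starting DF

The two-column case (df2 here is the two-column frame from the question) returns a single-column DataFrame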

In [42]: df2.groupby('a').transform('count')
Out[42]: 
    b
0   4
1   4
2   4
3   4
4   2
5   2
6   3
7   3
8   3
9   7
10  7
11  7
12  7
13  7
14  7
15  7

In [43]: type(df2.groupby('a').transform('count'))
Out[43]: pandas.core.frame.DataFrame

Or a series

In [45]: df2.groupby('a')['a'].transform('count')
Out[45]: 
0     4
1     4
2     4
3     4
4     2
5     2
6     3
7     3
8     3
9     7
10    7
11    7
12    7
13    7
14    7
15    7
Name: a, dtype: int64

In [46]: type(df.groupby('a')['a'].transform('count'))
Out[46]: pandas.core.series.Series

This "works" because pandas does allow assigning a single-column frame, since it will take the underlying Series.

So pandas is actually trying to be helpful. That said, I find this an unclear error message for an attempt to assign an empty frame.
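If you would rather not rely on pandas unwrapping a single-column frame for you, a sketch along these lines makes that step explicit. It assumes a reasonably recent pandas. The names `values`, `counts_frame` and `counts_series` are only illustrative:

```python
import pandas as pd

values = [1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4]
df2 = pd.DataFrame({'a': values, 'b': values})

# transform drops the grouping column, so only 'b' remains in the result.
counts_frame = df2.groupby('a').transform('count')

# Pull out the one remaining column as a Series before assigning it.
counts_series = counts_frame.iloc[:, 0]
df2['num_totals'] = counts_series
print(df2)
```

Going through the Series also means the single-column and multi-column cases are handled the same way, since the assignment never depends on how many columns the transformed frame has.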
