I am trying to apply a function to a grouped dataset. I have this Pandas DataFrame:

import numpy as np
import pandas as pd
from scipy import stats

test_df = pd.DataFrame({
    'A': list('aabdee'),
    'AA': ['2020-03-22', '2020-03-22', '2020-03-29', '2020-03-22', '2020-03-22', '2020-03-29'],
    'B': [1, 0.5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 1, 7, 1, 1],
    'E': [5, 3, 6, 9, 2, 4]
})

I want to apply a z-score to each numeric column, grouped by the variables A and AA. So I did:

numeric_columns = test_df.select_dtypes(np.number)
test_df.groupby(['A', 'AA'])[numeric_columns.columns].apply(stats.zscore)

But then I get a lot of errors, like this one:

Series.name must be a hashable type

and this one:

RuntimeWarning: invalid value encountered in true_divide
  return (a - mns) / sstd

1 Answer

GroupBy.transform works for me:

numeric_columns = test_df.select_dtypes(np.number)

c = numeric_columns.columns
test_df[c] = test_df.groupby(['A', 'AA'])[c].transform(stats.zscore)

print (test_df)
   A          AA    B    C    D    E
0  a  2020-03-22  1.0 -1.0 -1.0  1.0
1  a  2020-03-22 -1.0  1.0  1.0 -1.0
2  b  2020-03-29  NaN  NaN  NaN  NaN
3  d  2020-03-22  NaN  NaN  NaN  NaN
4  e  2020-03-22  NaN  NaN  NaN  NaN
5  e  2020-03-29  NaN  NaN  NaN  NaN
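
If the NaN rows are unwanted, one option is a small helper that maps groups with no spread to 0.0. This is only a sketch under that assumption about the desired behavior; `safe_zscore` and `result` are hypothetical names, and the z-score is computed by hand with `ddof=0` (the default used by `scipy.stats.zscore`) rather than by calling SciPy:

```python
import numpy as np
import pandas as pd

test_df = pd.DataFrame({
    'A': list('aabdee'),
    'AA': ['2020-03-22', '2020-03-22', '2020-03-29',
           '2020-03-22', '2020-03-22', '2020-03-29'],
    'B': [1, 0.5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 1, 7, 1, 1],
    'E': [5, 3, 6, 9, 2, 4],
})


def safe_zscore(s):
    # Hypothetical helper: population z-score (ddof=0).
    # A group with zero standard deviation yields 0/0 = NaN, which is
    # replaced by 0.0 here. Caveat: fillna also replaces NaN values that
    # were already present in the input.
    z = (s - s.mean()) / s.std(ddof=0)
    return z.fillna(0.0)


c = list(test_df.select_dtypes(np.number).columns)
result = test_df.copy()
result[c] = test_df.groupby(['A', 'AA'])[c].transform(safe_zscore)
print(result)
```

Whether 0.0 is the right value for a single-row group depends on the analysis; leaving NaN in place is equally defensible.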

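Edit: The NaN values appear because every group except ('a', '2020-03-22') contains a single row, so its standard deviation is 0 and the z-score becomes 0/0. Printing the groups shows this: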

c = numeric_columns.columns
for g, df in test_df.groupby(['A', 'AA']):
    print(df)

   A          AA    B  C  D  E
0  a  2020-03-22  1.0  7  1  5
1  a  2020-03-22  0.5  8  3  3
   A          AA    B  C  D  E
2  b  2020-03-29  4.0  9  1  6
   A          AA    B  C  D  E
3  d  2020-03-22  5.0  4  7  9
   A          AA    B  C  D  E
4  e  2020-03-22  5.0  2  1  2
   A          AA    B  C  D  E
5  e  2020-03-29  4.0  3  1  4
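
The group sizes can also be checked without printing each group, for example with `GroupBy.size()`. A minimal sketch that rebuilds only the columns it needs from the question's frame; `sizes` and `single_row_groups` are names chosen here for illustration:

```python
import pandas as pd

test_df = pd.DataFrame({
    'A': list('aabdee'),
    'AA': ['2020-03-22', '2020-03-22', '2020-03-29',
           '2020-03-22', '2020-03-22', '2020-03-29'],
    'B': [1, 0.5, 4, 5, 5, 4],
})

# Number of rows in each (A, AA) group, indexed by the group keys.
sizes = test_df.groupby(['A', 'AA']).size()
print(sizes)

# Groups with one row have zero standard deviation, so their z-score is NaN.
single_row_groups = sizes[sizes == 1]
print(single_row_groups.index.tolist())
```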
answered 2020-04-17T11:58:05.187