1

I'm trying to use python's pandas groupby, apply, where and quantile to replace values that fall below a 50% quantile with NaN by 'date' group however it seems to be returning lists in the cells. How can I get these results in a new column after the column 'value'.

This is the code I have (any other approaches are welcome). It returns lists in cells:

In[0]: df.groupby('date')['value'].apply(lambda x: np.where(x<x.quantile(0.5),np.nan,x))  
Out[0]:                            
date                            value     
2019-12-23  [nan, nan, 3.0, 4.0, 5.0]
2014-08-13  [nan, nan, 3.0, 4.0, 5.0]

If I create a new column it returns NaN in new column:

In[1]: df['new_value']= df.groupby('date')['value'].apply(lambda x: np.where(x<x.quantile(0.5),np.nan,x))
Out[1]: 
        date  value    new_value
0 2019-12-23      1.0       NaN
1 2019-12-23      2.0       NaN
2 2019-12-23      3.0       NaN
3 2019-12-23      4.0       NaN
4 2019-12-23      5.0       NaN
5 2014-08-13      1.0       NaN
6 2014-08-13      2.0       NaN
7 2014-08-13      3.0       NaN
8 2014-08-13      4.0       NaN
9 2014-08-13      5.0       NaN

I would like to get to this:

        date     value    new_value
0 2019-12-23      1.0        NaN
1 2019-12-23      2.0        NaN
2 2019-12-23      3.0        3.0
3 2019-12-23      4.0        4.0
4 2019-12-23      5.0        5.0
5 2014-08-13      1.0        NaN
6 2014-08-13      2.0        NaN
7 2014-08-13      3.0        3.0
8 2014-08-13      4.0        4.0
9 2014-08-13      5.0        5.0
4

1 回答 1

1

而不是apply你可以使用transform

df["new_value"] = df.groupby("date")["value"].transform(
    lambda x: np.where(x < x.quantile(0.5), np.nan, x)
)


    date    value   new_value
0   2019-12-23  1.0     NaN
1   2019-12-23  2.0     NaN
2   2019-12-23  3.0     3.0
3   2019-12-23  4.0     4.0
4   2019-12-23  5.0     5.0
5   2014-08-13  1.0     NaN
6   2014-08-13  2.0     NaN
7   2014-08-13  3.0     3.0
8   2014-08-13  4.0     4.0
9   2014-08-13  5.0     5.0
于 2020-05-22T18:05:43.923 回答