
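Each customer appears on several rows when they have more than one plan. I want to set a status per customer:

If all of their products have canceled_at filled in, the customer's status is canceled. If only some of their products have canceled_at filled in, the status is downgrade, because the customer lost one product.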

customer|canceled_at|status
x       |3/27/2018  |
x       |           |
y       |2/2/2018   |
y       |2/2/2018   |
z       |1/1/2018   |
a       |           |      

I already have the canceled status. Now I only need downgrade:

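df['status'] = (df.groupby('customer')['canceled_at']
                  .transform(lambda x: x.notna().all())  # True if every row of the customer is filled
                  .map({True: 'canceled'})               # False becomes NaN
                  .fillna(df['status']))                 # keep the existing status elsewhere

Desired output: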
customer|canceled_at|status
x       |3/27/2018  |downgrade
x       |           |downgrade
y       |2/2/2018   |canceled
y       |2/2/2018   |canceled
z       |1/1/2018   |canceled
a       |           |      
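The rule described above can be written as a small per-customer function, which makes the three cases explicit. This is a minimal sketch: classify is a hypothetical helper name, and None stands in for an empty canceled_at cell.

```python
from typing import Optional, Sequence


def classify(canceled_dates: Sequence[Optional[str]]) -> str:
    """Return the status for one customer given all of their canceled_at values."""
    filled = sum(d is not None for d in canceled_dates)
    if filled == 0:
        return ''            # nothing canceled: status stays empty
    if filled == len(canceled_dates):
        return 'canceled'    # every product canceled
    return 'downgrade'       # some, but not all, products canceled


# One entry per customer, mirroring the sample table
rows = {'x': ['3/27/2018', None],
        'y': ['2/2/2018', '2/2/2018'],
        'z': ['1/1/2018'],
        'a': [None]}
print({c: classify(v) for c, v in rows.items()})
```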

2 Answers


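Test the column for non-missing values, group that boolean Series by customer, and use GroupBy.transform with GroupBy.all to check whether all values are True (no missing values) and with GroupBy.any to check whether at least one value is True (at least one non-missing value). Then pass both masks to numpy.select: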

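import numpy as np

g = df['canceled_at'].notna().groupby(df['customer'])
m1 = g.transform('all')   # True for customers whose rows all have canceled_at
m2 = g.transform('any')   # True for customers with at least one canceled_at

# With string choices the np.nan default ends up as the string 'nan' (see output)
df['status'] = np.select([m1, m2],['canceled','downgrade'], np.nan)
print(df)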
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x         NaN  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a         NaN        nan

Or, with an empty string as the default:

df['status'] = np.select([m1, m2],['canceled','downgrade'], '')
print (df)
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x         NaN  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a         NaN         
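numpy.select returns, for each row, the choice of the first condition that is true, so the order of the masks matters: every customer for whom all holds also satisfies any, so m1 has to come before m2. A self-contained sketch that rebuilds the sample frame (the constructor is an assumption about how the data looks) and shows what happens if the masks are swapped; the swapped column is only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'customer': ['x', 'x', 'y', 'y', 'z', 'a'],
                   'canceled_at': ['3/27/2018', None, '2/2/2018', '2/2/2018', '1/1/2018', None]})

g = df['canceled_at'].notna().groupby(df['customer'])
m1 = g.transform('all')   # every row of the customer has canceled_at
m2 = g.transform('any')   # at least one row has canceled_at

# Correct order: the stricter "all" condition is checked first
df['status'] = np.select([m1, m2], ['canceled', 'downgrade'], '')

# Swapped order: "any" matches first, so fully canceled customers become "downgrade"
df['swapped'] = np.select([m2, m1], ['downgrade', 'canceled'], '')
print(df)
```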

If groups that contain only NaNs should also be set to downgrade:

mask = df['canceled_at'].notna().groupby(df['customer']).transform('all')
df['status'] = np.where(mask,'canceled','downgrade')
print (df)
  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x         NaN  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a         NaN  downgrade  
answered 2019-03-27T11:10:53.313

Here is one way to do it:

import pandas as pd

def select_status(canceled):
    c = canceled.count()
    if c == 0:
        status = ''
    elif c == len(canceled):
        status = 'canceled'
    else:
        status = 'downgrade'
    return pd.Series(status, index=canceled.index)

df = pd.DataFrame({'customer': ['x', 'x', 'y', 'y', 'z', 'a'],
                   'canceled_at': ['3/27/2018', None, '2/2/2018', '2/2/2018', '1/1/2018', None]})
# group_keys=False keeps the original row index, so the result aligns with df
df['status'] = df.groupby('customer', group_keys=False)['canceled_at'].apply(select_status)
print(df)

Output:

  customer canceled_at     status
0        x   3/27/2018  downgrade
1        x        None  downgrade
2        y    2/2/2018   canceled
3        y    2/2/2018   canceled
4        z    1/1/2018   canceled
5        a        None
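Since pandas 2.0, apply with a function that returns a Series indexed like its group prepends the group keys to the result unless group_keys=False is passed to groupby, which can make the assignment to df['status'] fail. GroupBy.transform always aligns its result with the original rows and broadcasts a scalar return value to every row of the group, so the helper can simply return a string. A minimal sketch; status_for is a hypothetical name:

```python
import pandas as pd


def status_for(canceled: pd.Series) -> str:
    """Status for one customer; count() ignores missing values."""
    filled = canceled.count()
    if filled == 0:
        return ''
    if filled == len(canceled):
        return 'canceled'
    return 'downgrade'


df = pd.DataFrame({'customer': ['x', 'x', 'y', 'y', 'z', 'a'],
                   'canceled_at': ['3/27/2018', None, '2/2/2018', '2/2/2018', '1/1/2018', None]})
# transform broadcasts the returned string to every row of the group
df['status'] = df.groupby('customer')['canceled_at'].transform(status_for)
print(df)
```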
answered 2019-03-27T11:26:44.913