2

我有不同状态的客户重复,因为每个客户订阅/产品都有一行。我想new_status为客户生成一个“取消”,每个订阅状态都必须一起“取消”。

我用了:

df['duplicated'] = df.groupby('customer', as_index=False)['customer'].cumcount()

分隔索引中的每个重复项以指示重复值

Customer | Status | new_status | duplicated
 X       |canceled|            | 0
 X       |canceled|            | 1
 X       |active  |            | 2
 Y       |canceled|            | 0
 A       |canceled|            | 0
 A       |canceled|            | 1
 B       |active  |            | 0
 B       |canceled|            | 1

因此,我想使用 .apply 和/或 .loc 来生成:

Customer | Status | new_status | duplicated
 X       |canceled|            | 0
 X       |canceled|            | 1
 X       |active  |            | 2
 Y       |canceled|            | 0
 A       |canceled| canceled   | 0
 A       |canceled| canceled   | 1
 B       |active  |            | 0
 B       |canceled|            | 1
4

2 回答 2

2

Series.eq通过for比较列==并使用GroupBy.transformwith GroupBy.allfor 检查是否所有值都是True每个组的 s,然后Customer通过Series.duplicatedwith keep=Falsefor 比较返回所有欺骗。AND最后按位( )链接在一起,&并通过 设置值numpy.where

m1 = df['Status'].eq('canceled').groupby(df['Customer']).transform('all')
m2 = df['Customer'].duplicated(keep=False)

df['new_status'] = np.where(m1 & m2, 'cancelled', '')
print (df)
  Customer    Status new_status  duplicated
0        X  canceled                      0
1        X  canceled                      1
2        X    active                      2
3        Y  canceled                      0
4        A  canceled  cancelled           0
5        A  canceled  cancelled           1
6        B    active                      0
7        B  canceled                      1
于 2019-03-19T11:04:02.573 回答
1

据我了解,您可以尝试这样做:

df['new_status']=(df.groupby('Customer')['Status'].
  transform(lambda x: x.eq('canceled').all()).map({True:'cancelled'})).fillna(df.new_status)
print(df)

    Customer    Status new_status  duplicated
0   X         canceled             0         
1   X         canceled             1         
2   X         active               2         
3   Y         canceled  cancelled  0         
4   A         canceled  cancelled  0         
5   A         canceled  cancelled  1         
6   B         active               0         
7   B         canceled             1   

编辑,因为预期的 o/p 已更改:

df['new_status']=(df.groupby('Customer')['Status'].
             transform(lambda x: x.duplicated(keep=False)&(x.eq('canceled').all()))
                         .map({True:'cancelled',False:''}))
print(df)

  Customer    Status new_status  duplicated
0   X         canceled             0         
1   X         canceled             1         
2   X         active               2         
3   Y         canceled             0         
4   A         canceled  cancelled  0         
5   A         canceled  cancelled  1         
6   B         active               0         
7   B         canceled             1   
于 2019-03-19T10:59:32.963 回答