python - How do I find churned customers by month? Python pandas

Question

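I have a large customer dataset with Customer ID, Service ID, Product, and so on. There are two ways we can measure churn: at the Customer ID level, where the whole customer leaves, and at the Service ID level, where perhaps they cancel 2 of their 5 services.

The data looks like this, and we can see that:

Alligators stopped being a customer at the end of Jan, because they have no rows in Feb (CustomerChurn)
Aunties stopped being a customer at the end of Jan, because they have no rows in Feb (CustomerChurn)
Bricks continued with Apples and Bananas in Jan and Feb (ServiceContinue)
Bricks remained a customer but cancelled two services at the end of Jan (ServiceChurn)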

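I am trying to write some code that creates the 'Churn' column. I have tried:

Manually building sets of the CustomerIDs and ServiceIDs from October 2019 and comparing them with November 2019 to find the ones that churned. This is not too slow, but it does not seem very Pythonic.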
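
The set comparison described above might look roughly like the following sketch. It is an illustration under assumptions, not the exact code that was tried: the helper name `ids_in_month` is hypothetical, the frame is a cut-down version of the sample data, and the months compared are Jan and Feb 2021 rather than October and November 2019.

```python
import pandas as pd

# Small frame in the same shape as the sample data (values are illustrative).
df = pd.DataFrame({
    'CustomerName': ['Alligators', 'Bricks', 'Bricks', 'Bricks'],
    'ServiceID': [1009, 1001, 1003, 1001],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb'],
    'Year': [2021, 2021, 2021, 2021],
})

def ids_in_month(frame, column, month, year):
    """Return the set of distinct values of `column` seen in one month."""
    mask = (frame['Month'] == month) & (frame['Year'] == year)
    return set(frame.loc[mask, column])

# Customers present in Jan but absent in Feb have churned.
churned_customers = (ids_in_month(df, 'CustomerName', 'Jan', 2021)
                     - ids_in_month(df, 'CustomerName', 'Feb', 2021))

# The same comparison at the service level.
churned_services = (ids_in_month(df, 'ServiceID', 'Jan', 2021)
                    - ids_in_month(df, 'ServiceID', 'Feb', 2021))
```

This works for one pair of months at a time, which is why it has to be repeated by hand for every month.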

Thanks!

import pandas as pd

# Sample data; the 'Churn' column holds the labels I would like to generate.
data = {'CustomerName': ['Alligators', 'Aunties', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks', 'Bricks'],
        'ServiceID': [1009, 1008, 1001, 1002, 1003, 1004, 1001, 1002, 1001, 1002],
        'Product': ['Apples', 'Apples', 'Apples', 'Bananas', 'Oranges', 'Watermelon', 'Apples', 'Bananas', 'Apples', 'Bananas'],
        'Month': ['Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
        'Year': [2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021],
        'Churn': ['CustomerChurn', 'CustomerChurn', 'ServiceContinue', 'ServiceContinue', 'ServiceChurn', 'ServiceChurn', 'ServiceContinue', 'ServiceContinue', 'NA', 'NA']}
df = pd.DataFrame(data)
df

score 1 · Accepted Answer

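I think this is close to what you want, except for the NA in the last two rows; if you really need those NAs, you can filter by date and change the values.

Because you are really testing two different groupings, I send the first grouping, by CustomerName, through one function and, depending on what it finds, send a finer grouping through a second function. It seems to work for this dataset.

I created an actual date column and made sure everything was sorted before grouping. The logic inside the functions tests a group's maximum date to see whether it is earlier than a cutoff date. It looks like you are treating March as the current month.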
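
The max-date test described in the previous paragraph can be shown in isolation. The sketch below is a vectorized variant that uses `groupby(...).transform('max')` and `numpy.select` instead of the apply-based functions in this answer; the cut-down frame and the names `current_month`, `last_customer` and `last_service` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Cut-down, illustrative version of the sample data.
df = pd.DataFrame({
    'CustomerName': ['Alligators', 'Bricks', 'Bricks', 'Bricks', 'Bricks'],
    'Product': ['Apples', 'Apples', 'Oranges', 'Apples', 'Apples'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Mar'],
    'Year': [2021, 2021, 2021, 2021, 2021],
})

# Build a real datetime column from the month abbreviation and the year.
df['testdate'] = pd.to_datetime(df['Month'] + '-' + df['Year'].astype(str),
                                format='%b-%Y')

current_month = pd.Timestamp(2021, 3, 1)  # assumed "current" month

# Latest month in which each customer, and each customer/product pair, appears.
last_customer = df.groupby('CustomerName')['testdate'].transform('max')
last_service = df.groupby(['CustomerName', 'Product'])['testdate'].transform('max')

# Conditions are checked in order, so customer churn takes priority.
df['Churn'] = np.select(
    [last_customer < current_month, last_service < current_month],
    ['CustomerChurn', 'ServiceChurn'],
    default='ServiceContinue',
)
```

Checking the customer-level condition first mirrors the order of the two functions: a service is only tested on its own once the customer is known to be active.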

You should be able to adapt it to your own needs.

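import datetime

import pandas as pd

# Build a real datetime column from Month and Year, then sort chronologically.
df['testdate'] = pd.to_datetime(df['Month'] + '-' + df['Year'].astype(str), format='%b-%Y')
df = df.sort_values('testdate')
df1 = df.drop('Churn', axis=1)

def get_servicechurn(x, tdate):
    # x holds the rows for one customer/product pair.
    if x.testdate.max() < tdate:
        return x.assign(Churn='ServiceChurn')
    return x.assign(Churn='ServiceContinue')

def get_customerchurn(x, tdate):
    # x holds all rows for one customer.
    if x.testdate.max() < tdate:
        # No rows in the current month: the whole customer has left.
        return x.assign(Churn='CustomerChurn')
    # The customer is still active, so test each product separately,
    # using the same cutoff date that was passed in.
    return x.groupby(['CustomerName', 'Product'], group_keys=False).apply(
        lambda g: get_servicechurn(g, tdate))

# group_keys=False keeps the original row index instead of adding the group keys.
df2 = df1.groupby(['CustomerName'], group_keys=False).apply(
    lambda x: get_customerchurn(x, datetime.datetime(2021, 3, 1))).sort_index()
df2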

Output:

  CustomerName  ServiceID     Product Month  Year   testdate            Churn
0   Alligators       1009      Apples   Jan  2021 2021-01-01    CustomerChurn
1      Aunties       1008      Apples   Jan  2021 2021-01-01    CustomerChurn
2       Bricks       1001      Apples   Jan  2021 2021-01-01  ServiceContinue
3       Bricks       1002     Bananas   Jan  2021 2021-01-01  ServiceContinue
4       Bricks       1003     Oranges   Jan  2021 2021-01-01     ServiceChurn
5       Bricks       1004  Watermelon   Jan  2021 2021-01-01     ServiceChurn
6       Bricks       1001      Apples   Feb  2021 2021-02-01  ServiceContinue
7       Bricks       1002     Bananas   Feb  2021 2021-02-01  ServiceContinue
8       Bricks       1001      Apples   Mar  2021 2021-03-01  ServiceContinue
9       Bricks       1002     Bananas   Mar  2021 2021-03-01  ServiceContinue
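
If the NA labels from the question are needed for the most recent month, filtering by date and overwriting the values could look like the sketch below. It assumes a frame that already has the `testdate` and `Churn` columns; the three-row frame and the name `latest` are only illustrative.

```python
import pandas as pd

# Illustrative frame: assumes `testdate` and `Churn` columns already exist.
df2 = pd.DataFrame({
    'CustomerName': ['Bricks', 'Bricks', 'Bricks'],
    'ServiceID': [1001, 1001, 1001],
    'testdate': pd.to_datetime(['2021-01-01', '2021-02-01', '2021-03-01']),
    'Churn': ['ServiceContinue', 'ServiceContinue', 'ServiceContinue'],
})

# Rows in the most recent month cannot be judged yet, so blank them out.
latest = df2['testdate'].max()
df2.loc[df2['testdate'] == latest, 'Churn'] = pd.NA
```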
