
I am working on a telecom churn problem, and this is my dataset:

http://www.sgi.com/tech/mlc/db/churn.data

Column names: http://www.sgi.com/tech/mlc/db/churn.names

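I am new to survival analysis. Given the training data, my idea is to build a survival model that estimates survival time and, based on the independent factors, predicts churn/non-churn for the test data. Could anyone help me with code or pointers on how to approach this?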

To be precise, suppose my training data has each customer's call usage details, plan details, account tenure, etc., and whether he churned.

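With an ordinary classification model I can predict churn for the test data. Now, using survival analysis, I want to predict the survival time for the test data.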

Thanks, Maddy




If you're still interested (or for the benefit of those coming later), I've written a few guides specifically for conducting survival analysis on customer churn data using R. They cover a bunch of different analytical techniques, all with sample data and R code.

Basic survival analysis: http://daynebatten.com/2015/02/customer-churn-survival-analysis/

Basic Cox regression: http://daynebatten.com/2015/02/customer-churn-cox-regression/

Time-dependent covariates in Cox regression: http://daynebatten.com/2015/12/survival-analysis-customer-churn-time-varying-covariates/

Time-dependent coefficients in Cox regression: http://daynebatten.com/2016/01/customer-churn-time-dependent-coefficients/

Restricted mean survival time (quantify the impact of churn in dollar terms): http://daynebatten.com/2015/03/customer-churn-restricted-mean-survival-time/

Pseudo-observations (quantify dollar gain/loss associated with the churn effects of variables): http://daynebatten.com/2015/03/customer-churn-pseudo-observations/

Please forgive the goofy images.
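For readers who want the idea behind the restricted mean survival time link without leaving the page: RMST is the area under the survival curve up to a chosen horizon tau, i.e. the integral of S(t) from 0 to tau. The sketch below integrates a Kaplan-Meier step function directly; the helper name `rmst_km`, the toy data and the horizon are illustrative assumptions, not code from the linked posts. (The survival package can also report a restricted mean through the `rmean` argument of `print()` on a `survfit` object.)

```r
library(survival)

# Hypothetical helper: area under the Kaplan-Meier curve from 0 to tau
rmst_km <- function(time, status, tau) {
  km <- survfit(Surv(time, status) ~ 1)
  keep <- km$time <= tau
  t <- km$time[keep]
  s <- km$surv[keep]
  # S(t) is a right-continuous step function: 1 on [0, t1), s1 on [t1, t2), ...
  widths <- diff(c(0, t, tau))
  sum(widths * c(1, s))
}

# Toy example: four customers who churned at months 1, 2, 3 and 4
rmst_km(c(1, 2, 3, 4), c(1, 1, 1, 1), tau = 4)  # 2.5
```

With no censoring and tau at the last event time, the RMST equals the ordinary mean lifetime, which is a quick sanity check for the helper.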

answered 2015-03-18T18:24:55.943

Here is some code to get you started.

First, read the data:

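# Column names: the text before ":" on each line of churn.names, after skipping the header lines
nm <- read.csv("http://www.sgi.com/tech/mlc/db/churn.names", 
               skip=4, colClasses=c("character", "NULL"), header=FALSE, sep=":")[[1]]
# Data: one row per customer; the last column is the churn label
dat <- read.csv("http://www.sgi.com/tech/mlc/db/churn.data", header=FALSE, col.names=c(nm, "Churn"))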

Use Surv() to set up the survival object for modeling:

library(survival)

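# Churn holds " True." / " False."; grepl() turns it into a logical event indicator.
# as.numeric(Churn) only works if Churn is a factor, which read.csv() no longer creates by default (R >= 4.0.0).
s <- with(dat, Surv(account.length, grepl("True", Churn)))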
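To see how `Surv()` encodes censoring, here is a minimal sketch on three made-up customers (hypothetical values, not rows of churn.data). The second customer has not churned, so their account length is only a lower bound on their lifetime and is marked as censored.

```r
library(survival)

# Hypothetical toy data: account length and whether the customer churned
toy_length <- c(5, 10, 3)
toy_churned <- c(TRUE, FALSE, TRUE)  # FALSE = still a customer (censored)

toy_s <- Surv(toy_length, toy_churned)
print(toy_s)  # censored times are printed with a trailing "+"

# Internally a Surv object is a two-column matrix: time and status (1 = event)
toy_s[, "time"]
toy_s[, "status"]
```

This is why treating non-churners' account length as an exact lifetime (e.g. in a plain regression) would bias the estimates downward, while the Cox model uses them only as "survived at least this long".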

Fit a Cox proportional hazards model and plot the results:

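# dat[, -4] drops column 4 (phone number), an identifier rather than a predictor
model <- coxph(s ~ total.day.charge + number.customer.service.calls, data=dat[, -4])
summary(model)
# Without newdata, survfit() gives the curve for a customer with average covariate values
plot(survfit(model))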
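The question asks for predictions on test data. One way to sketch that is to fit the model on training rows only and pass the held-out rows to `survfit()` through its `newdata` argument, which returns one predicted survival curve per customer. To keep the sketch runnable without the download, it simulates a hypothetical stand-in for `dat` with the same covariate names; the simulation, the 70/30 split, the horizon of 100 and the names `sim`, `train`, `test`, `fit`, `sf` are all illustrative assumptions.

```r
library(survival)

set.seed(1)
# Hypothetical stand-in for `dat` (simulated, not the real churn data)
n <- 200
sim <- data.frame(
  total.day.charge = round(runif(n, 10, 60), 2),
  number.customer.service.calls = rpois(n, 1.5)
)
# Simulated lifetimes: higher charges and more service calls shorten them
rate <- exp(-6 + 0.03 * sim$total.day.charge + 0.3 * sim$number.customer.service.calls)
lifetime <- rexp(n, rate)
censor_at <- runif(n, 50, 250)        # end of each customer's observation window
sim$account.length <- pmin(lifetime, censor_at)
sim$churned <- lifetime <= censor_at  # FALSE = censored (still a customer)

# Train/test split (70/30, an arbitrary illustrative choice)
idx <- sample(n, size = 0.7 * n)
train <- sim[idx, ]
test <- sim[-idx, ]

fit <- coxph(Surv(account.length, churned) ~ total.day.charge +
               number.customer.service.calls, data = train)

# One predicted survival curve per test customer
sf <- survfit(fit, newdata = test)

# Predicted probability of still being a customer at t = 100
p100 <- as.vector(summary(sf, times = 100)$surv)

# Predicted median survival time (NA where a curve never drops below 0.5)
med <- summary(sf)$table[, "median"]

# Relative risk exp(linear predictor); higher means expected to churn sooner
risk <- predict(fit, newdata = test, type = "risk")
```

Thresholding `p100` (or ranking by `risk`) then gives a churn/non-churn call comparable to a classifier's, while `med` answers the "how long" part of the question.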


Add strata:

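# strata(): a separate baseline hazard for customers with <= 3 vs > 3 service calls
model <- coxph(s ~ total.day.charge + strata(number.customer.service.calls <= 3), data=dat[, -4])
summary(model)
plot(survfit(model), col=c("blue", "red"))  # one curve per stratum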


answered 2014-11-23T11:25:23.500