
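We want to use R to analyze how long newly acquired customers remain customers. The data set is right-censored at 730 days, and we have 10 independent variables.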

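The model is ln(Duration) = X'B + S*e, where X is the matrix of the 10 independent variables, B is the coefficient vector, S is the scale parameter, and e is the error term.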
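Written out, and assuming e follows the standard extreme-value (Gumbel minimum) distribution, which is what a Weibull specification means in both `survreg` and PROC LIFEREG, the model fixes the survival function and every quantile of Duration for customer i (with x_i the i-th row of X):

$$
\ln(\mathrm{Duration}_i) = x_i^\top B + S\,e_i, \qquad P(e_i > w) = \exp\!\left(-\exp(w)\right)
$$

$$
P(\mathrm{Duration}_i > t) = \exp\!\left[-\exp\!\left(\frac{\ln t - x_i^\top B}{S}\right)\right],
\qquad
t_p(x_i) = \exp\!\left(x_i^\top B + S\,\ln\!\left[-\ln(1-p)\right]\right)
$$

Because the logarithm is already part of the model, `survreg(..., dist = "weibull")` is given the untransformed `Duration`. With p = 0.5 the quantile is the median, the same expression as `pred_duration` in the SAS data step, where 0.138 presumably plays the role of S.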

The data set we use is available at: http://www.drvkumar.com/books/25/Statistical-Methods-in-Customer-Relationship-Management

We use the survival package and its survreg function with the following code:

```r
library(survival)

Dur <- survreg(Surv(Duration, Censor) ~ Acq_Expense + Acq_Expense_SQ + Ret_Expense +
                 Ret_Expense_SQ + Crossbuy + Frequency + Frequency_SQ + Industry +
                 Revenue + Employees,
               dist = 'weibull', data = daten[daten$Acquisition == 1, ])
summary(Dur)
```

However, the results are wrong: running the SAS code produces different output, which has been confirmed to be correct.
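One difference that may explain the mismatch: in the SAS statement `model duration*censor(1) = ...`, the `censor(1)` part declares that observations with `censor = 1` are censored, whereas `Surv(time, event)` in R reads its second argument as an event indicator (1 or `TRUE` means the event, here churn, was observed; 0 means censored). The sketch below assumes `Censor` is coded the SAS way (1 = still a customer at 730 days) and passes `Censor == 0` as the event indicator. It runs on simulated data with a single hypothetical covariate `x`, not on the book's data set, so only the coding pattern carries over.

```r
library(survival)

## Hypothetical simulated data (NOT the book's data set): one covariate x and
## Weibull durations in days, right-censored at 730 days.
set.seed(42)
n <- 300
x <- rnorm(n)
true_time <- rweibull(n, shape = 2, scale = exp(6 + 0.3 * x))
toy <- data.frame(
  Duration = ceiling(pmin(true_time, 730)),  # observed days, capped at 730
  Censor   = as.integer(true_time > 730),    # SAS-style flag: 1 = censored
  x        = x
)

# Surv(time, event) needs TRUE/1 where churn was observed, so the SAS-style
# censoring flag is inverted. Passing Censor directly would treat the
# censored customers as the ones who churned.
fit <- survreg(Surv(Duration, Censor == 0) ~ x, data = toy, dist = "weibull")
summary(fit)

# On this simulated data the estimates should land near the true values:
# coefficient of x about 0.3 and scale about 1 / shape = 0.5.
coef(fit)
fit$scale
```

Applied to the question's model, this would mean writing `Surv(Duration, Censor == 0)` instead of `Surv(Duration, Censor)`, again assuming that coding of `Censor`.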

We then tried creating a log-transformed version of Duration, logDur, and using it in place of Duration in the model described above:

```r
logDur <- log(daten$Duration)
Dur <- survreg(Surv(logDur, Censor) ~ Acq_Expense + Acq_Expense_SQ + Ret_Expense +
                 Ret_Expense_SQ + Crossbuy + Frequency + Frequency_SQ + Industry +
                 Revenue + Employees,
               dist = 'weibull', data = daten[daten$Acquisition == 1, ])
summary(Dur)
```

But this fails with the following error message: `Error in Surv(logDur, Censor) : Time and status are different lengths`
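The error comes from where `Surv()` looks up its arguments: `logDur` is not a column of `daten`, so it is taken from the global environment with one entry per row of the full data set, while `Censor` comes from the `Acquisition == 1` subset. Separately, `dist = "weibull"` already models the logarithm of the time, so passing `logDur` to it would fit a model for log(log(Duration)). The sketch below reproduces the error on simulated data (the columns `Acquisition` and `x` are hypothetical) and checks that fitting `log(Duration)` with `dist = "extreme"` gives the same coefficients and scale as fitting `Duration` with `dist = "weibull"`, which is how `survreg` defines its Weibull model.

```r
library(survival)

## Hypothetical simulated data (NOT the book's data set); Acquisition and x
## are made-up columns used only for illustration.
set.seed(42)
n <- 300
x <- rnorm(n)
true_time <- rweibull(n, shape = 2, scale = exp(6 + 0.3 * x))
toy <- data.frame(
  Duration    = ceiling(pmin(true_time, 730)),  # observed days, capped at 730
  Censor      = as.integer(true_time > 730),    # SAS-style flag: 1 = censored
  x           = x,
  Acquisition = rbinom(n, 1, 0.7)
)
sub <- toy[toy$Acquisition == 1, ]

# logDur is not a column of sub, so Surv() picks it up from the global
# environment with nrow(toy) entries, while Censor has nrow(sub) entries.
logDur <- log(toy$Duration)
msg <- tryCatch(
  survreg(Surv(logDur, Censor == 0) ~ x, data = sub, dist = "extreme"),
  error = function(e) conditionMessage(e)
)
print(msg)

# survreg's Weibull model is the extreme-value model on log(time), so the
# log does not need to be taken by hand: both fits agree.
fit_weib <- survreg(Surv(Duration, Censor == 0) ~ x, data = sub, dist = "weibull")
fit_ext  <- survreg(Surv(log(Duration), Censor == 0) ~ x, data = sub,
                    dist = "extreme")
print(cbind(weibull = coef(fit_weib), extreme = coef(fit_ext)))
print(c(weibull = fit_weib$scale, extreme = fit_ext$scale))
```

If a log-scale variable is wanted anyway, storing it as a column (for example `daten$logDur <- log(daten$Duration)`) keeps it aligned with the subset, and `dist = "extreme"` is then the matching distribution.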

In case it helps, here is the SAS code:

```sas
/* No DIST= option: PROC LIFEREG fits its default Weibull model. */
/* censor(1) marks observations with censor = 1 as right-censored. */
proc lifereg data = statcrm.customer_acquisition;
  model duration*censor(1) = acq_expense acq_expense_sq ret_expense ret_expense_sq
        crossbuy frequency frequency_sq industry revenue employees;
  where acquisition = 1;
  output out = statcrm.duration xbeta = xb p = pred sres = resid;
run; quit;

data statcrm.duration1;
  set statcrm.duration;
  /* Weibull median: exp(xb + scale*log(-log(1 - 0.5))); 0.138 is presumably the fitted scale */
  pred_duration = exp(xb+0.138*(log(-log(1-0.5))));
  ad = abs(duration - pred_duration);
  ad1 = abs(duration - 333.3165);
run; quit;

/* Mean observed duration of uncensored acquired customers */
proc sql; select mean(duration) from statcrm.duration1 where acquisition = 1 and censor = 0; quit;

/* MAD and MAPE of the model versus a constant benchmark */
proc sql; select mean(ad) as mad, (mean(ad/duration)) as mape,
  mean(ad1) as random_mad, (mean(ad1/duration)) as mape1
  from statcrm.duration1 where acquisition = 1 and censor = 0; quit;
```
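For comparison with the SAS evaluation steps, here is a sketch of the same calculation in R, again on hypothetical simulated data rather than the book's data set. The data step's `pred_duration` is the Weibull median, `exp(xb + scale * log(-log(1 - 0.5)))`, which `predict(..., type = "quantile", p = 0.5)` also returns. The sketch uses the fitted `fit$scale` instead of the hard-coded 0.138, and as the benchmark it uses the mean observed duration of uncensored customers, which is presumably what the constant 333.3165 holds.

```r
library(survival)

## Hypothetical simulated data (NOT the book's data set).
set.seed(42)
n <- 300
x <- rnorm(n)
true_time <- rweibull(n, shape = 2, scale = exp(6 + 0.3 * x))
toy <- data.frame(
  Duration = ceiling(pmin(true_time, 730)),  # observed days, capped at 730
  Censor   = as.integer(true_time > 730),    # SAS-style flag: 1 = censored
  x        = x
)
fit <- survreg(Surv(Duration, Censor == 0) ~ x, data = toy, dist = "weibull")

# Linear predictor, the analogue of SAS xbeta = xb.
xb <- predict(fit, type = "lp")

# The SAS data-step formula, with the fitted scale instead of 0.138 ...
pred_manual <- exp(xb + fit$scale * log(-log(1 - 0.5)))
# ... and the same median taken directly from predict().
toy$pred_duration <- as.numeric(predict(fit, type = "quantile", p = 0.5))

# Evaluate only customers whose churn was observed (censor = 0 in SAS).
obs <- toy[toy$Censor == 0, ]
benchmark <- mean(obs$Duration)  # analogue of the constant 333.3165

ad  <- abs(obs$Duration - obs$pred_duration)
ad1 <- abs(obs$Duration - benchmark)
metrics <- c(
  mad        = mean(ad),
  mape       = mean(ad / obs$Duration),
  random_mad = mean(ad1),
  mape1      = mean(ad1 / obs$Duration)
)
print(metrics)
```

Computing the median with `predict()` avoids copying a scale estimate by hand, so the prediction stays consistent if the model is refitted.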