I have the following sample data:
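data have;
    /* betdate is read as a datetime; the colon modifier allows list input */
    input username $ amount betdate : datetime.;
    /* calendar date of the bet, without the time of day */
    dateOnly = datepart(betdate);
    format betdate datetime.;
    format dateOnly ddmmyy8.;
    datalines;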
player1 90 12NOV2008:12:04:01
player1 -100 04NOV2008:09:03:44
player2 120 07NOV2008:14:03:33
player1 -50 05NOV2008:09:00:00
player1 -30 05NOV2008:09:05:00
player1 20 05NOV2008:09:00:05
player2 10 09NOV2008:10:05:10
player2 -35 15NOV2008:15:05:33
;
run;
PROC PRINT data=have; RUN;
proc sort data=have;
    by username betdate;
run;
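data want;
    set have;
    by username dateOnly betdate;
    retain calendarTime eventTime cumulativeDailyProfit profitableFlag totalDailyProfit;
    /* calendarTime: running count of distinct betting days per user */
    if first.username then calendarTime = 0;
    if first.dateOnly then calendarTime + 1;
    /* eventTime: running count of distinct bet timestamps per user */
    if first.username then eventTime = 0;
    if first.betdate then eventTime + 1;
    /* cumulativeDailyProfit: running sum of amount, reset at the start of each day */
    if first.username then cumulativeDailyProfit = 0;
    if first.dateOnly then cumulativeDailyProfit = 0;
    if first.betdate then cumulativeDailyProfit + amount;
    /* totalDailyProfit: as written, this just repeats the running sum above */
    if first.dateOnly then totalDailyProfit = 0;
    if first.betdate then totalDailyProfit + amount;
run;
PROC PRINT data=want; RUN;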
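The "cumulativeDailyProfit" column in the output is exactly what I want: a running total that adds up the values of the "amount" field within each day. However, I do not want the same behaviour for the "totalDailyProfit" field, because I would like it to show the end-of-day profit, i.e. the last value of the cumulative daily profit for each customer on each day, repeated on every row for that day.
For example, the eight rows above would ideally show the following: -100, -60, -60, -60, 90, 120, 10, -35. I would then set a Boolean "profitableFlag" on the rows for that day and that customer if this value is greater than 0.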
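Can this actually be done in a data step? I would like to be able to run the following query (with the correct flag values in the case when clauses) to get the mean stake, the mean stake on winning days and the mean stake on losing days.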
proc sql;
    select calendarTime,
           mean(amount) as meanStake,
           /* losing day: profitableFlag = 0; winning day: profitableFlag = 1 */
           mean(case when profitableFlag = 0 then amount else . end) as meanLosingDayStake,
           mean(case when profitableFlag = 1 then amount else . end) as meanWinningDayStake
    from want
    group by 1;
quit;
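To make the target concrete, here is a minimal sketch of one possible way to get the end-of-day total inside a data step, using two passes over each username/dateOnly group (often called a double DOW loop). It is an illustration under assumptions, not a confirmed solution: the output name want2 is hypothetical, the input is assumed to be the "want" data set above, still sorted by username and dateOnly, and a day that nets to exactly 0 is treated as not profitable.

```sas
/* Sketch only: want2 is a hypothetical name. The two variables being
   rebuilt are dropped on input so the stored values do not interfere. */
data want2;
    /* Pass 1: read every row of one username/dateOnly group and total it */
    do until (last.dateOnly);
        set want(drop=totalDailyProfit profitableFlag);
        by username dateOnly;
        totalDailyProfit = sum(totalDailyProfit, amount);
    end;

    /* 1 when the day ends in profit, 0 otherwise (assumption: 0 is not a win) */
    profitableFlag = (totalDailyProfit > 0);

    /* Pass 2: re-read the same rows and write them out with the day's total */
    do until (last.dateOnly);
        set want(drop=totalDailyProfit profitableFlag);
        by username dateOnly;
        output;
    end;
run;
```

Each iteration of the data step handles one customer-day, so totalDailyProfit starts as missing for every group and the same total and flag are written to every row of that day. The query above could then read from want2 instead of want.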