Assuming your input dataset is called tick_data, and that it is sorted by both date and time_during_the_day, here is what I came up with:
%LET n = 50;

/* Calculate V - the breakpoint size */
PROC SUMMARY DATA=tick_data;
    BY date;
    /* Declare the analysis variable explicitly */
    VAR volume_traded;
    OUTPUT OUT = temp_1
        SUM (volume_traded) = volume_traded_agg;
RUN;

DATA temp_2;
    SET temp_1;
    V = volume_traded_agg / &n;
RUN;

/* Merge it into original dataset so that it is available */
DATA temp_3;
    MERGE tick_data temp_2;
    BY date;
RUN;

/* Final walk through tick data to output at breakpoints */
DATA results
    /* Comment out the KEEP to see what is happening under the hood */
    (KEEP=date time_during_the_day price volume_traded)
    ;
    SET temp_3;

    /* The IF FIRST will not work without the BY below */
    BY date;

    /* Stateful counters */
    RETAIN
        volume_cumulative
        breakpoint_next
        breakpoint_counter
    ;

    /* Reset stateful counters at the beginning of each day */
    IF (FIRST.date) THEN DO;
        volume_cumulative = 0;
        breakpoint_next = V;
        breakpoint_counter = 0;
    END;

    /* Breakpoint test */
    volume_cumulative = volume_cumulative + volume_traded;
    /* breakpoint_counter starts at 0, so "<" caps the output at &n rows per day */
    IF (breakpoint_counter < &n AND volume_cumulative >= breakpoint_next) THEN DO;
        OUTPUT;
        breakpoint_next = breakpoint_next + V;
        breakpoint_counter = breakpoint_counter + 1;
    END;
RUN;
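The key SAS language features to remember for the future are BY, FIRST. and RETAIN used together. That combination is what makes a stateful walk through the data like this possible. The conditional OUTPUT also plays a part here.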
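To see the same pattern in isolation, here is a minimal sketch on a toy dataset. The dataset toy, its variables grp and amount, and the threshold 25 are all made up for illustration and are not part of the original problem.

```sas
/* Hypothetical toy data, already sorted by grp */
DATA toy;
    INPUT grp $ amount;
    DATALINES;
A 10
A 20
A 30
B 5
B 15
;
RUN;

DATA running;
    SET toy;
    BY grp;                        /* makes FIRST.grp available */
    RETAIN total;                  /* total survives from row to row */
    IF FIRST.grp THEN total = 0;   /* reset at the start of each group */
    total = total + amount;
    IF total >= 25 THEN OUTPUT;    /* only rows at or past the threshold are written */
RUN;
```

Once a DATA step contains an explicit OUTPUT statement, SAS no longer writes a row automatically at the end of each iteration, so only the rows that pass the test end up in running.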
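Note that whenever you use BY <var>, the dataset must be sorted by a key that includes <var>. That is the case for tick_data and for all of the intermediate temp tables.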
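If your own copy of tick_data is not already in that order, a sort along these lines should put it there. This is a sketch that assumes you are happy to sort the dataset in place; add an OUT= option if you would rather keep the original untouched.

```sas
/* Sort in place by day, then by time within the day */
PROC SORT DATA=tick_data;
    BY date time_during_the_day;
RUN;
```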
Addendum: alternative V
To make V equal to (average daily total volume / n), replace the matching code block above with the following one:
. . . . . .
/* Calculate V - the breakpoint size */
PROC SUMMARY DATA=tick_data;
    BY date;
    VAR volume_traded;
    OUTPUT OUT = temp_1
        SUM (volume_traded) = volume_traded_agg;
RUN;

/* Average the daily totals over the whole dataset (no BY) */
PROC SUMMARY DATA = temp_1;
    VAR volume_traded_agg;
    OUTPUT OUT = temp_1a
        MEAN (volume_traded_agg) =;
RUN;

DATA temp_2;
    SET temp_1a;
    V = volume_traded_agg / &n;
RUN;

/* Merge it into original dataset so that it is available */
DATA temp_3 . . . . . .
. . . . . .
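Basically you just insert a second PROC SUMMARY to take the mean of the sums. Notice that there is no BY statement, because we are averaging over the whole set rather than by any group or bucket. Also notice that MEAN (...) = has no name after the =. That makes the output variable take the same name as the input variable.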
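One caveat with this variant: temp_2 now holds a single row and no longer contains date, so a MERGE ... BY date against it would stop with an error about the missing BY variable. A minimal sketch of one common workaround, assuming the rest of the pipeline stays as it is, is to read the single row once and let SAS carry V forward:

```sas
/* Attach the single-row V to every tick row */
DATA temp_3;
    /* Read temp_2 only on the first iteration; variables read */
    /* by SET are retained automatically, so V stays populated */
    IF _N_ = 1 THEN SET temp_2 (KEEP=V);
    SET tick_data;
RUN;
```

The rows of temp_3 come out in the same order as tick_data, so the later BY date in the final DATA step still works.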