I'm still new to SAS and I'd like to know how to do the following:

Suppose I have a database with the following information:

Time_during_the_day    date     prices   volume_traded
930am                  sep02    42       300
10am                   sep02    41       200
...
4pm                    sep02    40       200
930am                  sep03    40       500
10am                   sep03    41       100
...
4pm                    sep03    40       350
...

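What I want is to take the average of the total daily volume and divide that number by 50 (always). So say avg. daily vol. / 50 = V. I then want to record the price/time/date at every interval of size V. For example, if V = 500, I first record the first price, time, and date in my database, and then record the same information again after a further 500 units of volume have traded. It may happen that on some day the volume is 300, half of which completes the current V = 500 interval while the other 150 goes toward filling the next interval.

How can I get this information into one database? Thanks!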

1 Answer

Suppose your input dataset is called tick_data and that it is sorted by both date and time_during_the_day. Then here is what I came up with:

%LET n = 50;

/* Calculate V - the breakpoint size */
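PROC SUMMARY DATA=tick_data;
    BY date;
    /* Name the analysis variable explicitly; without VAR,
       PROC SUMMARY only counts observations */
    VAR volume_traded;

    OUTPUT OUT = temp_1 
           SUM (volume_traded)= volume_traded_agg;
RUN;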
DATA temp_2 ;
    SET temp_1;
    V = volume_traded_agg / &n;
RUN;

/* Merge it into original dataset so that it is available */
DATA temp_3;
    MERGE tick_data temp_2;
    BY date;
RUN;

/* Final walk through tick data to output at breakpoints */
DATA results 
    /* Comment out the KEEP to see what is happening under the hood */
    (KEEP=date time_during_the_day price volume_traded)
;
    SET temp_3;

    /* The IF FIRST will not work without the BY below */
    BY date;

    /* Stateful counters */
    RETAIN 
            volume_cumulative
            breakpoint_next
            breakpoint_counter
    ;

    /* Reset stateful counters at the beginning of each day */
    IF (FIRST.date) THEN DO;
            volume_cumulative   = 0;
            breakpoint_next     = V;
            breakpoint_counter  = 0;
    END;

    /* Breakpoint test */
    volume_cumulative = volume_cumulative + volume_traded;
    IF (breakpoint_counter <= &n  AND volume_cumulative >= breakpoint_next) THEN DO;
        OUTPUT;
        breakpoint_next = breakpoint_next + V;
        breakpoint_counter = breakpoint_counter + 1;
    END;
RUN;
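
To try the steps above, a small tick_data can be created first. This is only a sketch: the rows are modelled on the question's sample table, while the year 2012, the variable name price, and the choice to store times and dates as numeric SAS values are assumptions, not details from the question.

```sas
/* Hypothetical sample rows modelled on the question's table.
   The year 2012 and the numeric time/date encoding are assumptions. */
DATA tick_data;
    /* The colon modifier reads each token with the named informat */
    INPUT time_during_the_day :TIME5. date :DATE9. price volume_traded;
    FORMAT time_during_the_day TIME5. date DATE9.;
    DATALINES;
9:30 02SEP2012 42 300
10:00 02SEP2012 41 200
16:00 02SEP2012 40 200
9:30 03SEP2012 40 500
10:00 03SEP2012 41 100
16:00 03SEP2012 40 350
;
RUN;
```

The rows are entered already ordered by date and then time, which is what the BY date steps expect.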

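The key SAS language features to remember for the future are BY, FIRST., and RETAIN used together. This combination is what enables a stateful walk through data like this. The conditional OUTPUT also plays a part here.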
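
As a stripped-down illustration of that pattern, here is a minimal sketch that keeps a per-day running total and writes one row per day. The dataset names demo_ticks and demo_running and the variable running_volume are made up for this example, and the date is stored as a plain character value for brevity.

```sas
/* Hypothetical toy data: two days, three ticks each */
DATA demo_ticks;
    INPUT date $ volume_traded;
    DATALINES;
sep02 300
sep02 200
sep02 200
sep03 500
sep03 100
sep03 350
;
RUN;

DATA demo_running;
    SET demo_ticks;
    BY date;                      /* creates FIRST.date and LAST.date */
    RETAIN running_volume;        /* value survives from row to row */

    /* Reset the counter at the start of each day */
    IF FIRST.date THEN running_volume = 0;
    running_volume = running_volume + volume_traded;

    /* Conditional OUTPUT: write only the last row of each day */
    IF LAST.date THEN OUTPUT;
RUN;
```

With these rows, demo_running should end up with two observations, one per day, with running_volume equal to 700 for sep02 and 950 for sep03.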

Note that whenever you use BY <var>, the dataset must be sorted by a key that includes <var>. For tick_data and all the intermediate temp tables here, it is.
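
If the input is not already in that order, a sort step along these lines could be run first. This is a sketch that assumes time_during_the_day holds a numeric SAS time value; character values such as 930am and 10am would sort alphabetically rather than chronologically.

```sas
/* Sort in place by day, then by time within the day.
   Assumes time_during_the_day is a numeric SAS time value. */
PROC SORT DATA = tick_data;
    BY date time_during_the_day;
RUN;
```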

Addendum: alternative V

To make V equal to (average total daily volume / n), replace the matching code block above with the following:

. . . . . .
/* Calculate V - the breakpoint size */
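PROC SUMMARY DATA=tick_data;
    BY date;
    /* Name the analysis variable explicitly; without VAR,
       PROC SUMMARY only counts observations */
    VAR volume_traded;

    OUTPUT OUT = temp_1 
           SUM (volume_traded)= volume_traded_agg;
RUN;
/* Second pass: average the daily totals over all days (no BY) */
PROC SUMMARY DATA = temp_1;
    VAR volume_traded_agg;
    OUTPUT OUT = temp_1a
           MEAN (volume_traded_agg) =;
RUN;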
DATA temp_2 ;
    SET temp_1a;
    V = volume_traded_agg / &n;
RUN;

/* Merge it into original dataset so that it is available */
DATA temp_3 . . . . . .
 . . . . . . 

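Basically, you just insert a second PROC SUMMARY to take the mean of the sums. Note that there is no BY statement, because we are averaging over the whole set rather than by any grouping or bucket. Also note that in MEAN (...) = there is no name after the =. This gives the output variable the same name as the input variable.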
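
One thing to watch with this variant: temp_2 now holds a single row and no date variable, so merging it BY date will not work as written. A minimal sketch of one common way to attach a one-row value to every observation is shown here. It assumes temp_2 contains exactly one observation and that the result should still be called temp_3.

```sas
/* Sketch: attach the single-row V from temp_2 to every tick row.
   Assumes temp_2 has exactly one observation and no date variable. */
DATA temp_3;
    /* Read temp_2 only on the first iteration; variables read by SET
       are retained automatically, so V stays set on every later row */
    IF _N_ = 1 THEN SET temp_2 (KEEP = V);
    SET tick_data;
RUN;
```

Because V comes from a SET statement, it does not need to be listed in a RETAIN statement, and the step ends when tick_data runs out of rows.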

Answered 2012-06-13T18:18:28.223