3

Hi guys,

I included a screenshot to help clarify my problem:

http://i40.tinypic.com/mcrnmv.jpg.

I'm trying to calculate a moving average and a moving standard deviation. What I want is the coefficient of variation (stdev/avg) for the current value. Normally this is done by calculating the stdev and avg over the past 5 years. However, for some observations in my database I don't have the full 5 years of history (maybe only 3 or 2 years). That's why I want code that will calculate the avg and stdev even when the full 5 years are not available.

Also, as you can see in the observations, sometimes I have more than 5 years of information. In that case I need a moving window that calculates the avg and stdev over the past 5 years. So if a company has 7 years of information, I need code that will calculate the avg and stdev for, let's say, 1997 (from 1992-1996), 1998 (from 1993-1997) and 1999 (from 1994-1998).

As I'm not very familiar with SAS commands, I imagine it should look (very, very roughly) like:

set var
if year = i then stdev = stdev(year(i-5) until year(i-1)) and average = avg(year(i-5) until year(i-1))

Or something like that. I really have no clue. I'm going to try to figure it out myself, but I thought it was worth posting in case I can't.

Thanks!

4

3 Answers

3

The right way to do this is to use PROC EXPAND.

There are a lot of options you can use, but you probably want something like this:

PROC EXPAND DATA=TESTTEST OUT=MOVINGAVERAGE;
CONVERT VAL=AVG / TRANSFORMOUT=(MOVAVE 5);
RUN;

The same goes for MOVSTD. It automatically ignores missing values, but you can adjust that behavior as well.
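As a rough sketch of how the two operators could be combined for the question's case: the step below computes the average and standard deviation over the five preceding observations within each company, then derives the coefficient of variation. It assumes TESTTEST is sorted by a company identifier called SYMBOL (a name borrowed from the other answers) and has one row per year; MOVINGSTATS, AVG5, STD5 and CV5 are illustrative names. LAG 1 shifts the window back so that the current year is excluded, and METHOD=NONE is meant to stop PROC EXPAND from interpolating missing values first.

```sas
/* Assumptions: TESTTEST is sorted by SYMBOL with one row per year.
   SYMBOL, MOVINGSTATS, AVG5, STD5 and CV5 are illustrative names. */
PROC EXPAND DATA=TESTTEST OUT=MOVINGSTATS METHOD=NONE;
  BY SYMBOL;
  /* moving statistic over 5 rows, then lagged by 1 row so the
     window covers the 5 rows before the current one */
  CONVERT VAL=AVG5 / TRANSFORMOUT=(MOVAVE 5 LAG 1);
  CONVERT VAL=STD5 / TRANSFORMOUT=(MOVSTD 5 LAG 1);
RUN;

/* coefficient of variation, left missing when the average is missing or zero */
DATA MOVINGSTATS;
  SET MOVINGSTATS;
  IF NOT MISSING(AVG5) AND AVG5 NE 0 THEN CV5 = STD5 / AVG5;
RUN;
```

Note that the window here counts rows rather than calendar years, so a company with gaps in its yearly data would need those years filled in as missing rows first. PROC EXPAND is part of SAS/ETS.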

Answered 2010-03-18T02:50:53.900
1

Here is one way. Hope this helps.

/* test data */
data one;
  input symbol $ value date :date9.;
  format date date9.;
cards;
ABP1 -0.025  18feb1997
ABP1  0.05   25feb1998
ABP1 -0.025  05mar1999
ABP1  0.06   20mar2000
ABP1  0.25   05mar2001
ABP1  0.455  07mar2002
ABP1  0.73   25feb2003
ABP1  1.01   19feb2004
ABP1  1.25   16feb2005
ABP1  1.65   09feb2006
ABP1  1.87   08feb2007
ABT   0.555  14jan1991
ABT   0.6375 14jan1992
ABT   0.73   16jan1993
;
run;

/* 5 year moving avg, stdev, cv assuming:
   one obs per year from 1990 to 2010.
   observations are already in the sorted order by symbol. */
%let START = 1990;
%let FINISH = 2010;

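data two;
   /* val[year] holds that year's value; the three extra slots (val1-val3)
      cover the years just before START so val[year-5] stays in bounds */
   array val[%eval(&START-3):&FINISH] val1-val3 val&START-val&FINISH;
   call missing(of val&START-val&FINISH);
   /* read all observations of one symbol in a single data step iteration */
   do until (last.symbol);
     set one;
     by symbol;
     year = year(date);
     if &START<=year<=&FINISH then val[year] = value;
   end;
   /* statistics over the five preceding years; mean() and std() skip
      missing values, so shorter histories still produce a result */
   do year = %eval(&START+2) to &FINISH;
      avg5 = mean(val[year-5],val[year-4],val[year-3],val[year-2],val[year-1]);
      std5 =  std(val[year-5],val[year-4],val[year-3],val[year-2],val[year-1]);
      cv5  = divide(std5,avg5);
      /* cv5 is missing with fewer than two values or a zero average */
      if not missing(cv5) then output;
   end;
   keep symbol year avg5 std5 cv5;
run;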

/* check */
proc print data=two;
run;
/* on lst
Obs    symbol    year      avg5       std5       cv5

  1     ABP1     1999    0.01250    0.05303    4.24264
  2     ABP1     2001    0.01500    0.04637    3.09121
  3     ABP1     2002    0.06200    0.11251    1.81461
  4     ABP1     2003    0.15800    0.19457    1.23146
  5     ABP1     2004    0.29400    0.30597    1.04071
  6     ABP1     2005    0.50100    0.37786    0.75422
  7     ABP1     2006    0.73900    0.40448    0.54734
  8     ABP1     2007    1.01900    0.46185    0.45324
  9     ABP1     2008    1.30200    0.46338    0.35590
 10     ABP1     2009    1.44500    0.38726    0.26800
 11     ABP1     2010    1.59000    0.31432    0.19769
 12     ABT      1993    0.59625    0.05834    0.09784
 13     ABT      1994    0.64083    0.08755    0.13662
 14     ABT      1995    0.64083    0.08755    0.13662
 15     ABT      1996    0.64083    0.08755    0.13662
 16     ABT      1997    0.68375    0.06541    0.09566
*/
Answered 2010-03-17T14:44:45.510
0

For readability, I would advocate proc sql here. Using Chang Chung's data as an example, you could try the following:

/* test data */
data one;
  input symbol $ value date :date9.;
  format date date9.;
cards;
ABP1 -0.025  18feb1997
ABP1  0.05   25feb1998
ABP1 -0.025  05mar1999
ABP1  0.06   20mar2000
ABP1  0.25   05mar2001
ABP1  0.455  07mar2002
ABP1  0.73   25feb2003
ABP1  1.01   19feb2004
ABP1  1.25   16feb2005
ABP1  1.65   09feb2006
ABP1  1.87   08feb2007
ABT   0.555  14jan1991
ABT   0.6375 14jan1992
ABT   0.73   16jan1993
;
run;

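/* self-join: pair each observation (a) with the same symbol's
   observations (b) from the 1 to 5 preceding calendar years */
proc sql;
    create table two as
    select distinct
        a.symbol,
        b.value,
        year(a.date) as year,
        b.date as date5
    from
        one a,
        one b
    where
            a.symbol=b.symbol
        and intck('year',b.date,a.date) between 1 and 5
    order by
        a.symbol,
        year,
        date5;
quit;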

/* summarise each symbol/year window: number of values, mean and stdev */
proc sql;
    create table three as
    select distinct
        symbol,
        year,
        count(symbol) as n5,
        avg(value) as avg5,
        std(value) as std5
    from
        two
    group by
        symbol,
        year;
quit;
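
The question asks for the coefficient of variation, which the queries above stop short of. A minimal sketch of one more step that derives it from table three follows; the table name four and the column name cv5 are illustrative choices, not part of the original answer. The CASE expression leaves cv5 missing when the average is zero, and cv5 also comes out missing when a window holds only one value, because std5 is missing there.

```sas
/* Sketch: add the coefficient of variation to the summary table.
   "four" and "cv5" are illustrative names. */
proc sql;
    create table four as
    select
        symbol,
        year,
        n5,
        avg5,
        std5,
        case
            when avg5 ne 0 then std5 / avg5
            else .
        end as cv5
    from
        three
    order by
        symbol,
        year;
quit;
```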
Answered 2010-03-23T08:09:42.157