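I am working with a longitudinal data set in which each row is a subject and each column is an event. There is no limit on the number of events a subject can have, but the events are coded in several ways. For the sake of this example, assume one of the codings is binary (good, bad).

I am trying to find 1) all strings of 3 or more events (with no upper limit on the count) that fall within 24 hours from start to finish (within the same subject). This criterion may also be met more than once within the same subject.

2) For each success (a string of 3 or more events within 24 hours), I need to count the number of good events.

I have included code that generates data similar to mine. For now I have simplified it to 26 observations, but I have up to 42 for a single subject.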

  data examp;
informat subject 4. epdt1   epdt2   epdt3   epdt4   epdt5   epdt6   epdt7   epdt8   epdt9   epdt10  epdt11  epdt12  epdt13  epdt14  epdt15  epdt16  epdt17  epdt18  epdt19  epdt20  epdt21  epdt22  epdt23  epdt24  epdt25  epdt26 datetime20.
    good1   good2   good3   good4   good5   good6   good7   good8   good9   good10  good11  good12  good13  good14  good15  good16  good17  good18  good19  good20  good21  good22  good23  good24  good25  good26 1.;
input subject   epdt1   epdt2   epdt3   epdt4   epdt5   epdt6   epdt7   epdt8   epdt9   epdt10  epdt11  epdt12  epdt13  epdt14  epdt15  epdt16  epdt17  epdt18  epdt19  epdt20  epdt21  epdt22  epdt23  epdt24  epdt25  epdt26
            good1   good2   good3   good4   good5   good6   good7   good8   good9   good10  good11  good12  good13  good14  good15  good16  good17  good18  good19  good20  good21  good22  good23  good24  good25  good26;
format subject: 4. epdt: datetime20. good: 1.;
datalines;
3098    .   .   25JUL1998:01:46:27  25JUL1998:02:16:05  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
3021    13JAN1999:17:31:37  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
1982    01FEB1998:02:29:01  12APR1999:19:49:00  03JUN2018:21:00:00  13AUG1999:13:39:00  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   1   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
1093    11APR2015:16:10:57  30AUG2015:00:52:28  14SEP2015:08:24:25  09MAY1999:00:28:37  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
4089    29JUN1998:05:18:34  23JUL1998:18:31:11  07FEB1999:05:25:45  07FEB1999:05:29:26  07FEB1999:05:32:04  07FEB1999:05:34:05  14FEB1999:18:00:13  14FEB1999:18:01:02  14FEB1999:18:03:24  14FEB1999:18:05:55  14FEB1999:18:16:45  14FEB1999:18:19:04  14FEB1999:18:31:57  14FEB1999:18:35:22  28JUL1998:18:32:02  31DEC1998:00:22:33  .   .   .   .   .   .   .   .   1   1   1   1   1   1   1   1   1   1   1   .   1   .   1   .   .   .   .   .   .   .   .   .   .
3055    18FEB1998:11:34:00  14JUL1998:01:20:34  13OCT1998:10:49:08  30OCT1998:18:14:58  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
1239    07MAR1998:06:02:18  01JUN1998:08:18:20  23JUN1998:07:52:11  04JUL1998:08:47:04  29JUL1998:23:16:41  29JUL1998:23:30:03  29JUL1998:23:42:56  30JUL1998:00:08:03  30JUL1998:00:12:30  30JUL1998:00:14:58  30JUL1998:00:36:00  30JUL1998:00:38:33  30JUL1998:00:57:56  30JUL1998:01:01:03  30JUL1998:01:06:10  30JUL1998:01:16:50  30JUL1998:01:24:19  30JUL1998:01:32:30  30JUL1998:01:42:55  30JUL1998:01:50:24  30JUL1998:02:08:46  30JUL1998:02:20:18  30JUL1998:02:22:08  30JUL1998:02:28:52  30JUL1998:02:31:29  30JUL1998:02:51:29  .   .   1   .   1   1   1   1   1   1   1   .   1   1   1   1   1   1   1   1   1   1   1   1   .   1
9834    10JUL1999:20:22:24  14JUL1999:00:52:02  14JUL1999:17:02:38  14JUL1999:17:30:06  21FEB2000:12:41:34  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
run;
proc sort data=examp; by subject; run;


data epwide_dt1;
format  apppair $7000.;
set examp;

by subject;
%macro loops;
array eptm (*)epdt1-epdt26;    array apptm (*)  good1-good26;
*********using the starting value for identifying pairs;
*******trimmed then for the sake of making the macro work;
%do start=1 %to 26;
    %do stop=3 %to 26;  
%if &start.<&stop. %then %do ;
/***********to figure out if the difference between the pairs of times are 24 hours;*/
tbtw=eptm[&stop.]-eptm[&start.];
/*  *********number of points between them;*/
diff=(&stop.)- (&start.);
*******calculate the summaries between all episodes from start to stop;
array appr&start.&stop. (*) ap&start.-ap&stop.;
array stmct&start.&stop.(*) st&start.-st&stop.;
    %do i=&start. %to &stop.;
******calculate the number of appropriate episodes;
    if apptm[&i] ne . then appr&start.&stop.[&i]=apptm[&i];
    else appr&start.&stop.[&i]=0;
totapp=sum(of appr&start.&stop.(*));
if totapp=. then totapp=0;

****after you calculate the total value dump the array before the next iteration;
/*call missing(of appr&start.&stop.{*});*/

if (eptm[&start.] ne . and eptm[&stop.] ne . and diff>=2 and .<tbtw<86400 and totapp>1 ) then do;
appPair=catx(" ",apppair,"(",strip(put(&start., 3.)),"-",strip(put(&stop.,3.)),":", strip(put(totapp,3.)),"Good)");
end;


%end;
%end;
%end;
%end;
%mend;
%loops ;
run;

The following error message is the result:

ERROR: Array subscript out of range at line 1 column 2.
apppair=  subject=1093 epdt1=11APR2015:16:10:57 epdt2=30AUG2015:00:52:28 epdt3=14SEP2015:08:24:25
epdt4=09MAY1999:00:28:37 epdt5=. epdt6=. epdt7=. epdt8=. epdt9=. epdt10=. epdt11=. epdt12=. epdt13=. epdt14=. epdt15=.
epdt16=. epdt17=. epdt18=. epdt19=. epdt20=. epdt21=. epdt22=. epdt23=. epdt24=. epdt25=. epdt26=. good1=. good2=.
good3=. good4=. good5=. good6=. good7=. good8=. good9=. good10=. good11=. good12=. good13=. good14=. good15=. good16=.
good17=. good18=. good19=. good20=. good21=. good22=. good23=. good24=. good25=. good26=. FIRST.subject=1
LAST.subject=1 tbtw=1323117 diff=1 ap1=0 ap2=0 ap3=0 st1=. st2=. st3=. totapp=0 ap4=0 st4=. ap5=0 st5=. ap6=0 st6=.
ap7=0 st7=. ap8=0 st8=. ap9=0 st9=. ap10=0 st10=. ap11=0 st11=. ap12=0 st12=. ap13=0 st13=. ap14=0 st14=. ap15=0
st15=. ap16=0 st16=. ap17=0 st17=. ap18=0 st18=. ap19=0 st19=. ap20=0 st20=. ap21=0 st21=. ap22=0 st22=. ap23=0 st23=.
ap24=0 st24=. ap25=0 st25=. ap26=0 st26=. _ERROR_=1 _N_=1
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 35:20     1 at 57:20     1 at 83:20     1 at 113:20    1 at 147:20    1 at 185:20    1 at 227:20
      1 at 273:20    1 at 323:20    1 at 377:20    1 at 435:20    1 at 497:20    1 at 563:20    1 at 633:20
      1 at 707:20    1 at 785:20    1 at 867:20    1 at 953:20    1 at 1043:20   1 at 1137:20   1 at 1235:20
      1 at 1337:20
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set WORK.EXAMP.
WARNING: The data set WORK.EPWIDE_DT1 may be incomplete.  When this step was stopped there were 0 observations and
         109 variables.
WARNING: Data set WORK.EPWIDE_DT1 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           2.35 seconds
      cpu time            2.13 seconds

Thanks in advance for any suggestions!

2 Answers

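I'm not sure I fully understand your whole problem. But note that if you want to sum a subset of the values in an array, from index START to index STOP, you only need a DO loop.

For example, to sum X10 through X20 you could use code like this: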

array x (100) ;
start=10;
stop=20;
do i=start to stop;
   total=sum(total,0,x(i));
end;

So you should be able to solve this without macro code, which should make debugging easier.
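
A minimal sketch of the same loop applied to the `good1-good26` array from the question. The values assigned to `good2` and `good3`, and the choice of `start=2` and `stop=3`, are made up for illustration.

```sas
data _null_;
  /* Same array definition as in the question. */
  array apptm (*) good1-good26;

  /* Hypothetical values, for illustration only. */
  good2 = 1;
  good3 = 1;

  start = 2;
  stop  = 3;
  total = 0;

  /* Index the full array directly with the original positions;
     SUM() ignores missing values, so unset flags count as 0. */
  do i = start to stop;
    total = sum(total, 0, apptm[i]);
  end;

  put start= stop= total=;
run;
```

Indexing the full array also avoids what appears to be the cause of the subscript error. In the posted macro, `appr&start.&stop.` is declared over `ap&start.-ap&stop.`, so for start=2 and stop=3 it has only two elements, indexed 1 and 2, yet it is subscripted with `&i`=3. The `diff=1` in the log is consistent with that pair.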

answered 2018-10-17T18:45:26.347

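I finally got it working! I used @Tom's suggestion to eliminate the need to create a subarray for each pair, since that was causing a lot of problems. I also simplified the output and had it write out each "good" pair so that I can evaluate them more easily. Previously it was building appPair, a summary of my evaluation for every iteration of the start/stop loops, which produced a lot of extraneous output.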

data epwide_dt1;
set examp;

by subject;
if first.subject then totapp=0;
%macro loops;
array eptm (*)epdt1-epdt26;
array apptm (*)  good1-good26;
*********using the starting value for identifying pairs;
%do start=1 %to 24;
    %do stop=3 %to 26;  
%if &start.<&stop. %then %do ;
totapp=0;
/***********to figure out if the difference between the pairs of times are 24 hours;*/
tbtw=eptm[&stop.]-eptm[&start.];
/*  *********number of points between them;*/
diff=(&stop.)- (&start.);

    %do i=&start. %to &stop.;
******calculate the number of good events;
totapp=sum(totapp, 0,apptm[&i]);

***output the summary on the pair that can be evaluated in the next step;
if &i=&stop. and (eptm[&start.] ne . and eptm[&stop.] ne . and diff>=2 and 0<tbtw<86400 and totapp>1 ) then do;
appPair=catx(" ","(",strip(put(&start., 3.)),"-",strip(put(&stop.,3.)),":", strip(put(totapp,3.)),"Good)");
   output;
    end;


%end;
%end;
%end;
%end;
%mend;
%loops ;
run;
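
For comparison, the same checks can probably be written with ordinary DO loops and no macro code at all. This is only a sketch under a few assumptions: the output data set name `windows` is hypothetical, the `totapp>1` threshold is carried over from the code above, and the events are assumed to sit in the columns in chronological order, as the code above also assumes.

```sas
data windows (keep=subject start stop tbtw totapp);
  set examp;
  array eptm  (*) epdt1-epdt26;
  array apptm (*) good1-good26;

  /* stop >= start + 2 guarantees at least 3 event positions. */
  do start = 1 to dim(eptm) - 2;
    do stop = start + 2 to dim(eptm);

      /* Skip pairs where either endpoint is missing. */
      if missing(eptm[start]) or missing(eptm[stop]) then continue;

      /* Elapsed seconds between the first and last event. */
      tbtw = eptm[stop] - eptm[start];

      if 0 < tbtw < 86400 then do;
        /* Count the good events from start through stop. */
        totapp = 0;
        do i = start to stop;
          totapp = sum(totapp, apptm[i]);
        end;

        /* One output row per qualifying window. */
        if totapp > 1 then output;
      end;
    end;
  end;
run;
```

Because the loop bounds come from `dim(eptm)`, moving to 42 events per subject should only require changing the two ARRAY statements.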
answered 2018-10-19T02:53:16.477