
I'm trying to efficiently implement a block bootstrap technique to get the distribution of regression coefficients from PROC MIXED. The main outline is as follows:

I have a panel data set indexed by, say, firm and year. For each iteration of the bootstrap, I want to sample n subjects with replacement. From this sample, I need to construct a new data set that is a "stack" (rows concatenated one on top of another) of all the observations for each sampled subject. With this new data set, I can run the regression and pull out the coefficients of interest. This is repeated for many iterations, say 2000.

Each firm can potentially be selected multiple times, so I need to include its data multiple times in each iteration's data set. A loop-and-subset approach seems computationally burdensome. My real data set is quite large (a 2 GB .sas7bdat file).

Example pseudo/explanatory code (please pardon all noob errors!):

DATA subjectlist;
  SET mydata;
  BY firm;
  IF first.firm;
RUN;

%macro blockboot(input=, subjects=, iterations=);

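  /* Count the firms in &subjects (one row per firm) */
  PROC SQL NOPRINT;
    SELECT COUNT(*) INTO :numberfirms FROM &subjects;
  QUIT;
  %let numberfirms = &numberfirms;  /* strip leading blanks */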

  %do i = 1 %to &iterations ;
    DATA mytempdat;
      DO i=1 TO &numberfirms;
        rec = ceil(&numberfirms * ranuni(0));

        *** This is where I want to include all observations for the randomly selected subjects;
        *** However, this code doesn't include the same subject multiple times, which...;
        *** ...is what I want;
        SET &INPUT subjects IN &subjects;

      OUTPUT;
      END;
      STOP;
    RUN;

    /* SolutionF holds the fixed-effect estimates; it requires MODEL / SOLUTION */
    ODS OUTPUT SolutionF=outx;
    PROC MIXED DATA=mytempdat;
      CLASS firm year;
      MODEL yval = cov1 cov2 / SOLUTION;
      RANDOM intercept / SUB=firm TYPE=un;
    RUN;

    %if &i = 1 %then %do;
      DATA outall;
        SET outx;
      RUN;
    %end;
    %else %do;
      PROC APPEND BASE=outall DATA=outx;
      RUN;
    %end;
  %end;  /* i = 1 to &iterations loop */

  /* Percentile interval for the coefficient estimate on cov1 */
  PROC UNIVARIATE DATA=outall NOPRINT;
    WHERE Effect = "cov1";
    VAR Estimate;
    OUTPUT OUT=final PCTLPTS=2.5 97.5 PCTLPRE=ci;
  RUN;

%mend;

%blockboot(input=mydata, subjects=subjectlist, iterations=2000)

This question is identical to a question I asked previously, found here:

block bootstrap from subject list

Any help is appreciated!


1 Answer


For details on the best way to do this in SAS, see the following paper:

http://www2.sas.com/proceedings/forum2007/183-2007.pdf

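In short: use PROC SURVEYSELECT with a method that allows sampling with replacement to create your bootstrap samples, then use BY-group processing with PROC MIXED so that the procedure runs only once instead of 2000 times.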
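As a rough illustration of that outline, the steps might look something like the sketch below. It is not taken from the paper. It assumes the data set and variable names from the question (mydata, subjectlist, firm, year, yval, cov1, cov2); the names bootfirms, bootdata, and draw, and the seed value, are made up for the example. It also assumes each draw of a firm should count as a separate subject, which may or may not suit your model.

```sas
/* Sketch only: bootfirms, bootdata, draw, and the seed are hypothetical. */

/* 1. Draw 2000 replicates of firms with replacement (METHOD=URS).
      SAMPRATE=1 makes each replicate as large as the firm list;
      OUTHITS writes one row per selection, so a firm drawn twice
      appears twice. */
proc surveyselect data=subjectlist out=bootfirms noprint
                  method=urs samprate=1 reps=2000 seed=12345 outhits;
  id firm;
run;

/* 2. Number the draws within each replicate so that repeated copies
      of the same firm can be told apart. */
data bootfirms;
  set bootfirms;
  by Replicate;
  if first.Replicate then draw = 0;
  draw + 1;
run;

/* 3. Attach every observation of each sampled firm. The join repeats
      a firm's rows once for each time the firm was drawn. */
proc sql;
  create table bootdata as
  select b.Replicate, b.draw, d.*
  from bootfirms as b
       inner join mydata as d
       on b.firm = d.firm
  order by b.Replicate, b.draw, d.year;
quit;

/* 4. Fit the model once, using BY-group processing over replicates.
      Printed output is suppressed; estimates go to outall. */
ods exclude all;
ods output SolutionF=outall;
proc mixed data=bootdata;
  by Replicate;
  class draw year;
  model yval = cov1 cov2 / solution;
  random intercept / subject=draw type=un;
run;
ods exclude none;

/* 5. Percentile interval for the bootstrap estimates of cov1. */
proc univariate data=outall noprint;
  where Effect = "cov1";
  var Estimate;
  output out=final pctlpts=2.5 97.5 pctlpre=ci;
run;
```

One caveat: bootdata stacks all 2000 replicates, so with a 2 GB input it will be very large. Keeping only the model variables in the join, or running steps 1 to 4 in smaller batches of replicates and appending the SolutionF tables, may be more practical.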

Answered 2012-11-21T21:08:07.820