I'm trying to efficiently implement a block bootstrap technique to get the distribution of regression coefficients from PROC MIXED. The main outline is as follows:
I have a panel data set indexed by firm and year. For each iteration of the bootstrap, I wish to sample n subjects (firms) with replacement. From this sample, I need to construct a new data set that is a "stack" (concatenated row on top of row) of all the observations for each sampled subject. With this new data set, I can run the regression and pull out the coefficients of interest. I then repeat this for a large number of iterations, say 2000.
Each firm can be selected multiple times, so its data must appear once per selection in that iteration's data set. A loop-and-subset approach seems computationally burdensome, because my real data set is quite large (a 2 GB .sas7bdat file).
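A minimal sketch of the resampling and stacking step without a per-iteration loop, assuming SAS/STAT is available (PROC MIXED is part of it): PROC SURVEYSELECT with METHOD=URS draws firms with replacement, SAMPRATE=1 draws as many firms as there are, REPS= produces all iterations at once in a REPLICATE variable, and OUTHITS writes one row per draw so that a firm drawn twice appears twice. The toy MYDATA panel, the seeds, and the new subject ID BOOTID are hypothetical, for illustration only.

```sas
/* Hypothetical toy panel standing in for MYDATA: 50 firms x 10 years */
data mydata;
  call streaminit(12345);
  do firm = 1 to 50;
    firm_effect = rand('normal');
    do year = 2001 to 2010;
      cov1 = rand('normal');
      cov2 = rand('normal');
      yval = 1 + 0.5*cov1 - 0.3*cov2 + firm_effect + rand('normal');
      output;
    end;
  end;
  drop firm_effect;
run;

/* One row per firm; MYDATA must be sorted by FIRM for BY to work */
data subjectlist;
  set mydata(keep=firm);
  by firm;
  if first.firm;
run;

/* Draw every bootstrap sample in one call:
   METHOD=URS  unrestricted random sampling, i.e. with replacement
   SAMPRATE=1  each replicate draws as many firms as SUBJECTLIST has
   OUTHITS     a firm drawn k times is written on k rows
   REPS=       number of bootstrap iterations (variable REPLICATE) */
proc surveyselect data=subjectlist out=bootfirms noprint
                  seed=20240 method=urs samprate=1 outhits reps=2000;
run;

/* Hypothetical BOOTID: one ID per draw, so a firm drawn twice
   becomes two distinct subjects in the stacked data */
data bootfirms;
  set bootfirms;
  bootid = _n_;
run;

/* Stack all years of each drawn firm; the many-to-many match on FIRM
   is what repeats a firm's rows once per draw */
proc sql;
  create table bootstack as
  select b.replicate, b.bootid, d.*
  from bootfirms as b
       inner join mydata as d
       on b.firm = d.firm
  order by b.replicate, b.bootid, d.year;
quit;
```

The stacked data set can then be modeled with SUBJECT=bootid and a BY replicate statement. For a 2 GB input, stacking all 2000 replicates at once would need on the order of 2000 times the original storage, so in practice it is probably necessary to keep only the model variables (KEEP=) and to run the sampling, stacking, and fitting in batches of replicates.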
Example pseudo/explanatory code (please pardon all noob errors!):
DATA subjectlist;
SET mydata;
BY firm;
IF first.firm;
RUN;
%macro blockboot(input=, subjects=, iterations=);
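*** Count the subjects (one row per firm in &subjects);
PROC SQL NOPRINT;
SELECT COUNT(*) INTO :numberfirms TRIMMED FROM &subjects;
QUIT;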
%do i = 1 %to &iterations ;
DATA mytempdat;
DO i=1 TO &numberfirms;
rec = ceil(&numberfirms * ranuni(0));
*** This is where I want to include all observations for the randomly selected subjects;
*** However, this code doesn't include the same subject multiple times, which...;
*** ...is what I want;
SET &INPUT subjects IN &subjects;
OUTPUT;
END;
STOP;
RUN;
*** SOLUTION requests the fixed-effect estimates and ODS OUTPUT saves them to OUTX;
PROC MIXED DATA=mytempdat;
CLASS firm year;
MODEL yval= cov1 cov2 / SOLUTION;
RANDOM intercept /sub=firm type=un;
ODS OUTPUT SolutionF=outx;
RUN;
%IF &i = 1 %THEN %DO;
DATA outall;
SET outx;
RUN;
%END;
%ELSE %DO;
PROC APPEND base=outall data=outx;
RUN;
%END;
%END; /* i=1 to &iterations loop */
*** Percentile interval for the cov1 coefficient;
PROC UNIVARIATE data=outall(WHERE=(UPCASE(effect)='COV1'));
VAR estimate;
OUTPUT out=final pctlpts=2.5 97.5 pctlpre=ci;
RUN;
%mend;
%blockboot(input=mydata, subjects=subjectlist, iterations=2000)
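Instead of refitting PROC MIXED once per iteration inside a %DO loop and appending, a pattern often used for SAS simulations is to fit every replicate in one PROC MIXED call with a BY statement and capture the fixed-effect table through ODS OUTPUT (the SolutionF table requires the SOLUTION option). In this sketch, BOOTSTACK is simulated stand-in data with the layout a stacked bootstrap data set would have (REPLICATE, BOOTID, YEAR, YVAL, COV1, COV2); those names, the 200 replicates, and the seed are assumptions for illustration, not real results.

```sas
/* Hypothetical stand-in for a stacked bootstrap data set: same layout
   (REPLICATE, BOOTID, YEAR, YVAL, COV1, COV2), but simulated directly */
data bootstack;
  call streaminit(2024);
  do replicate = 1 to 200;
    do bootid = 1 to 50;
      u = rand('normal');            /* subject-level random intercept */
      do year = 2001 to 2010;
        cov1 = rand('normal');
        cov2 = rand('normal');
        yval = 1 + 0.5*cov1 - 0.3*cov2 + u + rand('normal');
        output;
      end;
    end;
  end;
  drop u;
run;

/* One PROC MIXED call for all replicates: BY-group processing replaces
   the %DO loop, and ODS OUTPUT collects every replicate's estimates */
ods graphics off;
ods exclude all;                     /* do not print thousands of fits */
proc mixed data=bootstack;
  by replicate;
  class bootid;
  model yval = cov1 cov2 / solution;
  random intercept / subject=bootid;
  ods output SolutionF=fixed;
run;
ods exclude none;

/* Percentile interval for the COV1 coefficient across replicates */
proc univariate data=fixed(where=(upcase(effect)='COV1')) noprint;
  var estimate;
  output out=final pctlpts=2.5 97.5 pctlpre=ci;
run;

proc print data=final;
run;
```

With PCTLPRE=ci, the interval endpoints land in variables named CI2_5 and CI97_5. For real bootstrap data, REPLICATE would come from PROC SURVEYSELECT's REPS= option rather than from a simulation loop.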
This question is identical to a question I asked previously: block bootstrap from subject list
Any help is appreciated!