So I have a dataset_a that looks like this:

Name  Month
Dick  Aug
Dick  Sep
Dick  Oct
Jane  Aug
Jane  Sep
...

And some other, much larger dataset_b like this:

Name  Day        X     Y
Dick  12-Jul-13  14.8  2.3
Jane  05-Sep-13  12.2  2.0
Dick  02-Aug-13  15.1  3.2
Dick  07-Aug-13  14.5  3.0
Jane  05-Aug-13  12.8  2.5
Dick  08-Aug-13  14.5  3.0
Dick  10-Aug-13  13.5  2.3
Jane  31-Jul-13  13.0  2.2
...

I want to iterate over dataset_a and, for each row, run a data step that pulls the matching records from dataset_b into a temporary dataset (call it temp). Then I need to run a proc reg on temp and put the results (row-vector style) back into dataset_a, like so:

Name  Month Parameter-est.-for-Y p-value  R-squared
Dick  Aug   Some #               Some #   Some #
Dick  Sep   Some #               Some #   Some #
Dick  Oct   Some #               Some #   Some #
Jane  Aug   Some #               Some #   Some #
Jane  Sep   Some #               Some #   Some #
...

Here's some code/pseudocode to illustrate my need:

for each row in dataset_a
    data temp;
    set dataset_b; where name=['i'th name] and month(day)=['i'th month]; 
    run;
    proc reg /*noprint*/ alpha=0.1 outest=[?] tableout; model X = Y; run;
    /*somehow put these regression results back into 'i'th row of dataset_a*/
next

Please post a comment if something doesn't make sense. Thanks very much in advance!

1 Answer

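The efficient way to do this is somewhat different from what you describe. In the specific case you show, the most efficient approach is to use a format to group the Day values into months and run your regression by name day, assuming the regression respects formats (if it does not, create a new variable month and assign it using that format).

For example: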

data for_reg/view=for_reg;
set dataset_b;
month=put(day,MONNAME3.);
run;

or

proc datasets lib=work;
modify dataset_b;
format day MONNAME3.;
quit;

then

proc reg data=for_reg;
by name month; *or if using the other one, by name day;
**other proc reg statements**;
run;

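Then merge that output dataset with dataset_a as needed. This runs proc reg as if you had run it once for each name/month combination, but in a single call with a single pass through the data. Note that the by statement expects its input to be sorted (or indexed) by the by variables, so sort first if dataset_b is not already in that order.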
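To make the by-group route concrete, here is a rough sketch of the whole round trip, using the model X = Y from the question. It assumes Day is a numeric SAS date and that Month in dataset_a holds three-letter values such as Aug. The names for_reg, reg_est, reg_wide, est_Y, pvalue_Y and a_sorted are made up for illustration, and I have not run this against your data, so treat it as a starting point rather than a finished solution. As far as I know, tableout adds rows such as PVALUE to the outest= dataset and rsquare adds the _RSQ_ column.

```sas
/* Assumption: day is a numeric SAS date; month becomes 'Aug', 'Sep', ... */
data for_reg;
set dataset_b;
length month $3;
month=put(day,MONNAME3.);
run;

/* BY processing needs the input sorted by the BY variables */
proc sort data=for_reg;
by name month;
run;

/* One regression per name/month group; estimates are written to reg_est */
proc reg data=for_reg noprint outest=reg_est tableout rsquare;
by name month;
model X = Y;
run;
quit;

/* Flatten to one row per group: coefficient of Y, its p-value, and R-squared */
data reg_wide;
merge reg_est(where=(_type_='PARMS') keep=name month _type_ Y _rsq_ rename=(Y=est_Y))
      reg_est(where=(_type_='PVALUE') keep=name month _type_ Y rename=(Y=pvalue_Y));
by name month;
drop _type_;
run;

/* Attach the results to the name/month rows listed in dataset_a */
proc sort data=dataset_a out=a_sorted;
by name month;
run;

data fin;
merge a_sorted(in=a) reg_wide;
by name month;
if a;
run;
```

If Month in dataset_a is stored with a different length or casing than the values produced by MONNAME3., the final merge will not line up, so check that first.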


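If PROC REG does not honor by groups (I think it does, but who knows), the best solution is still something along these lines: write a macro that runs proc reg with name and month as parameters, and call it once for each row of dataset_a. Then generate common output files (or proc append them to a single master output dataset inside the macro) and merge the results with dataset_a at the end as needed.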

Something like:

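%macro run_procreg(name=,month=);
data for_run/view=for_run;
set dataset_b;
*the macro values are plain text, so they must be quoted to compare as strings;
where name="&name." and put(day,MONNAME3.)="&month.";
month=put(day,MONNAME3.); *carry month along so the final merge has its key;
run;

proc reg data=for_run; 
*other stuff*;
output out=tempdataset; *or however you create your output;
run;
quit;

proc append base=master_output data=tempdataset force;
run;
%mend run_procreg;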

proc sql;
select cats('%run_procreg(name=',name,',month=',month,')') into :macrocalllist
  separated by ' ' from dataset_a;
quit;

&macrocalllist;

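proc sort data=dataset_a; by name month; run;
proc sort data=master_output; by name month; run;

data fin;
merge dataset_a (in=a) master_output(in=b);
by name month;
if a; *keep only the name/month rows listed in dataset_a;
run;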

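If dataset_a only has those two variables, you may not need the merge at the end. This will be a lot slower than the single call using by, but if it is necessary, this is how to do it.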

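You can also use call execute in a data step to drive the macro calls, much like the list above. That is probably the closest match to your pseudocode, but it does not return information to the data step (the generated code executes after the data step completes), and it is slightly more of a pain than the method above. Also, in 9.3+, dosubl and the FCMP language allow you to do something closer to what you want, but I do not know them well enough to explain them or to know whether they really meet your needs.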
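For reference, a minimal sketch of the call execute variant, assuming the run_procreg macro from above has already been compiled. Wrapping the call in %nrstr is the usual precaution so that the macro is queued and runs after the data step finishes, rather than being expanded while the data step is still executing.

```sas
/* One queued macro call per row of dataset_a; nothing runs until this step ends */
data _null_;
set dataset_a;
call execute(cats('%nrstr(%run_procreg(name=', name, ',month=', month, '))'));
run;
```

This replaces the proc sql step and the &macrocalllist line; the final merge stays the same.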

Answered 2013-10-08T21:38:20.983