I currently have a program that processes raw data in SAS, running queries like the following:
/*This code joins the details onto the spine, selecting the detail
with the lowest value2 that is greater than or equal to value or, if there
is none, the detail with the highest value2*/
/*our raw data*/
data spine;
input id value;
datalines;
1 5
2 10
3 6
;
run;
data details;
input id value2 detail $;
datalines;
1 6 foo
1 8 bar
1 4 foobar
2 8 foofoo
2 4 barbar
3 6 barfoo
3 2 foobarfoo
;
run;
/*match-merge details onto spine by id (one-to-many), split into highs and lows*/
proc sort data = spine;
by id;
run;
proc sort data= details;
by id;
run;
data lows highs;
merge spine details;
by id;
if value2 ge value then output highs;
else output lows;
run;
/*grab just the first/last of each set*/
proc sort data =lows;
by id value2;
run;
proc sort data = highs;
by id value2;
run;
data lows_lasts;
set lows;
by id;
if last.id;
run;
data highs_firsts;
set highs;
by id;
if first.id;
run;
/*join the high value where you can*/
data join_highs;
merge spine(in = a)
highs_firsts ;
by id;
if a;
run;
/*split into missing and not missing*/
data not_missing still_missing;
set join_highs;
if missing(value2) then output still_missing;
else output not_missing;
run;
/*if it doesn't already have a detail, then attach a low*/
data join_lows;
merge still_missing(in = a)
lows_lasts ;
by id;
if a;
run;
/*append the record that had a high joined, and the record that had a low joined, together*/
data results;
set not_missing join_lows;
run;
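You get the picture. There are many data-processing reports like this, and they run every week on new records.
Data transformations are also performed (e.g. cleaning/parsing addresses).
Now, this kind of processing can be done in SQL.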
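For illustration, here is a rough sketch of how the SAS steps above might collapse into a single SQL query. It is not a tested production query. It assumes an in-memory SQLite database driven from Python's standard `sqlite3` module, with tables named `spine` and `details` that mirror the SAS datasets. The `>=` comparison mirrors the `ge` in the data step.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real RDBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spine   (id INTEGER, value INTEGER);
    CREATE TABLE details (id INTEGER, value2 INTEGER, detail TEXT);

    INSERT INTO spine VALUES (1, 5), (2, 10), (3, 6);
    INSERT INTO details VALUES
        (1, 6, 'foo'), (1, 8, 'bar'), (1, 4, 'foobar'),
        (2, 8, 'foofoo'), (2, 4, 'barbar'),
        (3, 6, 'barfoo'), (3, 2, 'foobarfoo');
""")

# For each spine row, pick the detail whose value2 is the lowest one that is
# >= value; if no such detail exists, fall back to the highest value2 for
# that id (which is then necessarily below value).
QUERY = """
    SELECT s.id, s.value, d.value2, d.detail
    FROM spine AS s
    LEFT JOIN details AS d
      ON d.id = s.id
     AND d.value2 = COALESCE(
           (SELECT MIN(h.value2) FROM details AS h
             WHERE h.id = s.id AND h.value2 >= s.value),
           (SELECT MAX(l.value2) FROM details AS l
             WHERE l.id = s.id)
         )
    ORDER BY s.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Unlike the `first.id` / `last.id` logic, this sketch returns more than one row for an id when several details tie on the chosen value2, so a real version would need a tie-breaking rule.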
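The question is: is this an appropriate use of a relational database? Or should a database only be used for data storage and retrieval?
Bear in mind that we are talking about tables with up to 10 million rows.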