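SAS SQL does not support updating through a join, but you can perform a correlated update, i.e. an update that takes its values from a correlated subquery: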
data P_DEF;
    infile cards;
    length RUN_ID_ORIG 8;
    input RUN_ID ITEM_ID ITEM_TITLE $20.;
    RUN_ID_ORIG = RUN_ID; /* keep the original value so RUN_ID can be reset later */
    cards;
1 1 some title
1 1 should be negative
1 2 another title
1 3 should be negative
4 44 another title
5 44 should be negative
;
run;
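data TMP;
    infile cards;
    /* @30 moves the pointer to column 30 before NEW_ID is read,
       so the values below are padded to start in that column */
    input RUN_ID ITEM_ID ITEM_TITLE $20. @30 NEW_ID;
    cards;
1 1 should be negative       100
1 3 should be negative       123
5 44 should be negative      188
;
run;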
proc sql;
    /* this unintentionally updates every row:
       rows without a match in TMP get a missing RUN_ID */
    update P_DEF
        set RUN_ID = (select NEW_ID from TMP
                      where P_DEF.RUN_ID = TMP.RUN_ID
                        and P_DEF.ITEM_ID = TMP.ITEM_ID
                        and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE)
    ;
    select * from P_DEF
    ;
quit;
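To make the side effect visible, a quick check like the following lists the rows whose RUN_ID was wiped out. This is only a sketch and assumes P_DEF as created and updated above; RUN_ID_ORIG still holds the value each row had before the update.

```sas
proc sql;
    /* rows that had no match in TMP now have a missing RUN_ID */
    select RUN_ID_ORIG, ITEM_ID, ITEM_TITLE
        from P_DEF
        where RUN_ID is missing;
quit;
```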
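When some rows have no match, the correlated update alone is not enough, so you need to add a filter that restricts the update to matching rows. When joining on multiple columns, I usually rely on catx to build a unique key (depending on your data, you may need a different numeric format in the put function):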
proc sql;
    update P_DEF set RUN_ID = RUN_ID_ORIG; /* reset RUN_ID */
quit;
/* correct "inner join" update */
proc sql;
    update P_DEF
        set RUN_ID = (select NEW_ID from TMP
                      where P_DEF.RUN_ID = TMP.RUN_ID
                        and P_DEF.ITEM_ID = TMP.ITEM_ID
                        and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE)
        where
            /* only touch rows whose combined key exists in TMP */
            catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
            in (select catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
                from TMP)
    ;
    select * from P_DEF;
quit;
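If you would rather not build a concatenated key, the same filter can be written with a correlated EXISTS subquery. This is an alternative sketch, not what I used above; it assumes the P_DEF and TMP tables from this answer and that each key combination appears at most once in TMP.

```sas
proc sql;
    update P_DEF set RUN_ID = RUN_ID_ORIG; /* reset RUN_ID */
quit;

proc sql;
    update P_DEF
        set RUN_ID = (select NEW_ID from TMP
                      where P_DEF.RUN_ID = TMP.RUN_ID
                        and P_DEF.ITEM_ID = TMP.ITEM_ID
                        and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE)
        /* same join condition again, used only to decide which rows to update */
        where exists (select * from TMP
                      where P_DEF.RUN_ID = TMP.RUN_ID
                        and P_DEF.ITEM_ID = TMP.ITEM_ID
                        and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE)
    ;
    select * from P_DEF;
quit;
```

This avoids having to pick a numeric format for put, at the cost of repeating the join condition.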
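The update above, which takes the NEW_ID column from TMP, differs slightly from your exact example in order to show how to get a value from the subquery.
A simplified version for your case, where the lookup table is only used to identify the rows to update, looks like this: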
proc sql;
    update P_DEF set RUN_ID = RUN_ID_ORIG; /* reset RUN_ID */
quit;
proc sql;
    /* simplified for your case:
       you don't actually use any value from TMP that does not exist in P_DEF */
    update P_DEF
        set RUN_ID = -1 * RUN_ID
        where
            RUN_ID > 0 /* so we can rerun this if needed */
            and catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
                in (select catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
                    from TMP)
    ;
    select * from P_DEF;
quit;
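As you can see, a correlated update may need two subqueries to update a single column, so don't expect it to perform well on larger tables. You may be better off with a data step approach: the MERGE, MODIFY, or UPDATE statement.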
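As a rough illustration of the data step route, here is a minimal MERGE sketch. It assumes the P_DEF and TMP tables from this answer, that the three key columns identify at most one row in each table, and that RUN_ID has first been reset to its original value. The names P_DEF_SORTED, TMP_SORTED, P_DEF_MERGED, in_def and in_tmp are made up for this example.

```sas
proc sql;
    update P_DEF set RUN_ID = RUN_ID_ORIG; /* reset RUN_ID */
quit;

/* MERGE with BY needs both inputs sorted by the key columns */
proc sort data=P_DEF out=P_DEF_SORTED;
    by RUN_ID ITEM_ID ITEM_TITLE;
run;
proc sort data=TMP out=TMP_SORTED;
    by RUN_ID ITEM_ID ITEM_TITLE;
run;

data P_DEF_MERGED;
    merge P_DEF_SORTED (in=in_def) TMP_SORTED (in=in_tmp);
    by RUN_ID ITEM_ID ITEM_TITLE;
    if in_def;                      /* keep every P_DEF row, drop TMP-only rows */
    if in_tmp then RUN_ID = NEW_ID; /* overwrite RUN_ID only for matched rows */
    drop NEW_ID;
run;
```

Unlike the SQL update, this writes a new table instead of changing P_DEF in place; MODIFY would be the statement to look at for an in-place update.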
As for the SAS Data Integration Studio transformation you asked about, I believe you can achieve this with the SCD Type 1 Loader, which will generate some of the code I mentioned.