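Suppose we have a table P_DEF in which we want to update the RUN_ID column for a subset of rows, and that subset is stored in another table, TMP. Here is how I would do it in SQL: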

update P_DEF
set RUN_ID = (-1) * TMP.RUN_ID /* change the sign of the value */
from P_DEF
inner join TMP
on P_DEF.RUN_ID = TMP.RUN_ID
and P_DEF.ITEM_ID = TMP.ITEM_ID
and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE

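Now the big question: as far as I know, PROC SQL does not support this kind of filtered update (an update through a join). So how can I do this in SAS DI (Studio) with the fewest transformations?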

1 Answer

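SAS SQL does not support updating through a join, but you can do a correlated update, that is, set the column from the value returned by a correlated subquery: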

data P_DEF;
infile cards;
length RUN_ID_ORIG 8;
input RUN_ID ITEM_ID ITEM_TITLE $20.;
RUN_ID_ORIG = RUN_ID;
cards;
1 1 some title
1 1 should be negative
1 2 another title
1 3 should be negative
4 44 another title
5 44 should be negative
;
run;

data TMP;
infile cards;
input RUN_ID ITEM_ID ITEM_TITLE $20. @30 NEW_ID;
cards;
1 1 should be negative       100
1 3 should be negative       123
5 44 should be negative      188
;
run;

proc sql;
/* caution: this updates every row; rows without a match in TMP are set to missing */
update P_DEF
set RUN_ID = (select NEW_ID from TMP
            where P_DEF.RUN_ID = TMP.RUN_ID
            and P_DEF.ITEM_ID = TMP.ITEM_ID
            and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE )
;
select * from P_DEF
;
quit;

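A correlated update alone is not enough when some rows have no match, so you need to add a filter that restricts the update to matching rows. When matching on several columns I usually rely on catx to build a single unique key (depending on your data, you may need a different numeric format in the put function):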

proc sql;
update P_DEF set RUN_ID = RUN_ID_ORIG; /* reset RUN_ID */
quit;


/* correct "inner join" update */
proc sql;
update P_DEF
set RUN_ID = (select NEW_ID from TMP
            where P_DEF.RUN_ID = TMP.RUN_ID
            and P_DEF.ITEM_ID = TMP.ITEM_ID
            and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE )
where
          catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
in ( select catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
from TMP ) /* the subquery after IN must be enclosed in parentheses */
;
select * from P_DEF;
quit;

The version above differs slightly from your exact example in order to show how to take a value from the subquery (the NEW_ID column).
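
If you would rather not build a concatenated key, the same filter can probably be written with a correlated EXISTS subquery instead of catx. The following is only a sketch under the assumption that P_DEF and TMP are the tables created earlier in this answer; I have not compared its performance with the catx version.

```sas
proc sql;
  /* reset RUN_ID so the example can be rerun */
  update P_DEF set RUN_ID = RUN_ID_ORIG;

  /* update only rows that have a match in TMP;
     the WHERE EXISTS subquery repeats the join condition */
  update P_DEF
    set RUN_ID = (select NEW_ID from TMP
                  where P_DEF.RUN_ID = TMP.RUN_ID
                    and P_DEF.ITEM_ID = TMP.ITEM_ID
                    and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE)
    where exists (select 1 from TMP
                  where P_DEF.RUN_ID = TMP.RUN_ID
                    and P_DEF.ITEM_ID = TMP.ITEM_ID
                    and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE);

  select * from P_DEF;
quit;
```

This avoids the put formats and the '#' separator, at the cost of writing the join condition twice.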

A simplified version, in which the lookup table is used only to identify the rows to update, looks like this:

proc sql;
update P_DEF set RUN_ID = RUN_ID_ORIG; /* reset RUN_ID */
quit;

proc sql;
/* simplified for your case:
you don't actually need any value from TMP that is not already in P_DEF */
update P_DEF
set RUN_ID = -1 * RUN_ID
where
   RUN_ID > 0 /* so we can rerun this if needed */
   and      catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
in ( select catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
from TMP )
;
select * from P_DEF;
quit;

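As you can see, a correlated update may need two subqueries to update a single column, so do not expect it to perform well on larger tables. You may be better off with a DATA step approach: the MERGE, MODIFY or UPDATE statement.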
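
As an illustration of the DATA step route, here is a minimal MERGE sketch. It assumes the P_DEF and TMP tables from this answer; the names P_DEF_SORTED, TMP_KEYS, P_DEF_NEW, in_def and in_tmp are made up for the example. Note that it writes a new table rather than updating P_DEF in place, and that the output comes back in sorted key order.

```sas
/* reset RUN_ID so the example starts from the original values */
proc sql;
  update P_DEF set RUN_ID = RUN_ID_ORIG;
quit;

/* MERGE with BY needs both inputs sorted by the key columns */
proc sort data=P_DEF out=P_DEF_SORTED;
  by RUN_ID ITEM_ID ITEM_TITLE;
run;

/* keep one row per key so the merge cannot duplicate P_DEF rows */
proc sort data=TMP (keep=RUN_ID ITEM_ID ITEM_TITLE) out=TMP_KEYS nodupkey;
  by RUN_ID ITEM_ID ITEM_TITLE;
run;

data P_DEF_NEW;
  merge P_DEF_SORTED (in=in_def)
        TMP_KEYS     (in=in_tmp);
  by RUN_ID ITEM_ID ITEM_TITLE;
  if in_def;                            /* keep every P_DEF row, nothing else */
  if in_tmp then RUN_ID = -1 * RUN_ID;  /* flip the sign only for matched rows */
run;
```

With the sample data, the three "should be negative" rows should come out with a negative RUN_ID and the other three rows unchanged.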

As for the SAS Data Integration Studio transformation you asked about, I believe you can achieve this with the SCD Type 1 Loader, which will generate some of the code I mentioned.

Answered 2021-10-23T10:54:01.340