sas - Converting a Proc Sql EXISTS query to a data step

Question

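I have this proc sql query in my current code. Unfortunately, I'm working with more than 10 million records, so it takes hours to run. I've been trying to convert it to a data step, thinking it would run faster, but I can't seem to get the same results. I'd appreciate it if someone could help me with the data step, or if you have any suggestions for making the proc sql run more efficiently.

Here is my proc sql query: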

proc sql;
  create table test as
  select *
  from table1 a
  where exists (select 1
                from table2 b
                where b.acct_id = a.acct_id);
quit;

Here is the data step I tried to convert it to:

proc sort data=table1; by acct_id; run;
proc sort data=table2; by acct_id; run;

data test;
  merge table1   (in=a)
        table2   (in=b);
  by acct_id;
  if a and b;
run;

score 0 · Accepted Answer

Try an inner join in SQL. You have to list every variable that needs to match.

proc sql;
  create table test as
  select *
      from
          table1 as a
        inner join
          table2 as b
        on a.acct_id = b.acct_id
        and a.var1 = b.var2
          ....
        ;
quit;

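This should avoid the inner select, which I suspect is where your time is going.

If this is still too slow, consider putting an index on acct_id in both tables. That should speed up the join.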
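As a rough sketch of the indexing suggestion, the indexes could be created with PROC SQL as shown here. The table and column names are the ones from the question. Whether the optimizer actually uses the indexes, and whether that helps at this data volume, is something to check on your own data rather than assume.

```sas
proc sql;
  /* In SAS, a simple (single-column) index must have the same name as its column. */
  create index acct_id on table1(acct_id);
  create index acct_id on table2(acct_id);
quit;
```

Building an index on a 10-million-row table has its own cost, so this is most likely to pay off when the tables are queried repeatedly.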

score 0 · Accepted Answer

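As for why your current data step doesn't work, it's most likely because you have duplicate keys in table2, which changes the number of observations when the merge is 1-to-N or N-to-N. Merging the full table2 also brings its other variables into the output. If you modify the sort to keep only the key and drop duplicates, the merge should give the expected result.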

proc sort data=table1; by acct_id; run;
proc sort data=table2 (keep=acct_id) out=wanted_accounts nodupkey; by acct_id; run;

data test;
merge table1 (in=a)
      wanted_accounts (in=b);
by acct_id;
if a and b;
run;
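If sorting table1 is itself the expensive part, a hash lookup is a commonly used alternative to the merge. This is only a sketch under some assumptions: the distinct acct_id values from table2 fit in memory, acct_id has the same type and length in both tables, and the SAS release is recent enough that a hash object can be declared with keys only. The hash name `wanted` is made up for illustration.

```sas
data test;
  if _n_ = 1 then do;
    /* Load only the key column of table2; by default, repeated keys are stored once. */
    declare hash wanted(dataset: "table2(keep=acct_id)");
    wanted.defineKey("acct_id");
    wanted.defineDone();
  end;

  set table1;

  /* check() returns 0 when the current acct_id exists in the hash. */
  if wanted.check() = 0;
run;
```

Because table1 is read sequentially and never sorted, the output keeps table1's original row order and contains only table1's variables, which is the same shape the EXISTS query produces.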
