I need to sum variables in two data sets and join the results. I would like to do this in one SQL statement, but the join is one-to-many. Can a summary variable be created, for lack of a better description, inside a SELECT statement?

The code below calculates the summary variable HOURS incorrectly: INTERVAL has only one record per name/date while DETAIL has several, so each HOURS value is added once for every matching DETAIL row.

I certainly could write multiple steps to accomplish this, but wanted to see if it can be accomplished in one SQL step. Thanks

Sample Code:

data Detail;
 Length Name CallType $25;
 input date mmddyy10. name $ calltype $ count;
 Format date mmddyy10.;
 datalines;
05/01/2014 John Order 5
05/01/2014 John Complaint 6
05/01/2014 Mary Order 7
05/01/2014 Mary Complaint 8
05/01/2014 Joe Order 4
05/01/2014 Joe Complaint 2
05/01/2014 Joe Internal 2
05/02/2014 John Order 6
05/02/2014 John Complaint 4
05/02/2014 Mary Order 9
05/02/2014 Mary Complaint 7
05/02/2014 Joe Order 3
05/02/2014 Joe Complaint 1
05/02/2014 Joe Internal 3
;

data Interval;
 Length Name $25;
 input date mmddyy10. name $ hours;
 Format date mmddyy10.;
 datalines;
05/01/2014 John 8
05/01/2014 Mary 6
05/01/2014 Joe 4
05/02/2014 John 8
05/02/2014 Mary 6
05/02/2014 Joe 4
;

PROC SQL noprint feedback;
 CREATE TABLE SUMMARY AS
 SELECT
  D.Name
  , Sum(D.Count) as Count
  , Sum(I.Hours) as Hours
 FROM Detail D, Interval I
 WHERE D.Name=I.Name and D.Date=I.Date
 GROUP BY D.Name
 ORDER BY D.Name;
QUIT;
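
To make the overcount concrete, here is a minimal sketch that runs the same join in SQLite through Python's `sqlite3` module rather than in SAS. The table layout and the column names `call_date` and `cnt` (standing in for `date` and `count`) are assumptions made for the illustration; the rows are the sample data above.

```python
import sqlite3

# Sample rows copied from the question; `cnt` stands in for the Count column.
DETAIL = [
    ("05/01/2014", "John", "Order", 5), ("05/01/2014", "John", "Complaint", 6),
    ("05/01/2014", "Mary", "Order", 7), ("05/01/2014", "Mary", "Complaint", 8),
    ("05/01/2014", "Joe", "Order", 4), ("05/01/2014", "Joe", "Complaint", 2),
    ("05/01/2014", "Joe", "Internal", 2),
    ("05/02/2014", "John", "Order", 6), ("05/02/2014", "John", "Complaint", 4),
    ("05/02/2014", "Mary", "Order", 9), ("05/02/2014", "Mary", "Complaint", 7),
    ("05/02/2014", "Joe", "Order", 3), ("05/02/2014", "Joe", "Complaint", 1),
    ("05/02/2014", "Joe", "Internal", 3),
]
INTERVAL = [
    ("05/01/2014", "John", 8), ("05/01/2014", "Mary", 6), ("05/01/2014", "Joe", 4),
    ("05/02/2014", "John", 8), ("05/02/2014", "Mary", 6), ("05/02/2014", "Joe", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (call_date TEXT, name TEXT, calltype TEXT, cnt INTEGER)")
conn.execute("CREATE TABLE interval (call_date TEXT, name TEXT, hours INTEGER)")
conn.executemany("INSERT INTO detail VALUES (?, ?, ?, ?)", DETAIL)
conn.executemany("INSERT INTO interval VALUES (?, ?, ?)", INTERVAL)

# Same shape as the PROC SQL step: join first, then aggregate.
naive = conn.execute("""
    SELECT d.name, SUM(d.cnt), SUM(i.hours)
    FROM detail d
    JOIN interval i ON d.name = i.name AND d.call_date = i.call_date
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()

# Hours summed directly from INTERVAL, with no join involved.
true_hours = conn.execute(
    "SELECT name, SUM(hours) FROM interval GROUP BY name ORDER BY name"
).fetchall()

print(naive)       # [('Joe', 15, 24), ('John', 21, 32), ('Mary', 31, 24)]
print(true_hours)  # [('Joe', 8), ('John', 16), ('Mary', 12)]
```

The counts come out right because each DETAIL row matches exactly one INTERVAL row. The hours are inflated by the number of DETAIL rows per name/date: Joe has three call types per day, so his 8 hours become 24.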

2 Answers

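Robert's solution works well, but I got better performance by moving the subqueries into the FROM clause instead of using them in the SELECT. In the FROM clause, each subquery is executed only once and the two results are then joined, whereas a subquery in the SELECT is executed once for every row.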

    proc sql;
     create table summary as
     select
      d.name,
      count,
      hours
     from
      (select name, sum(count) as count from detail group by name) d inner join 
      (select name, sum(hours) as hours from interval group by name) i
      on d.name = i.name
     order by d.name
    ;
    quit;
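
As a quick check on the aggregate-then-join idea, the sketch below runs the same query shape in SQLite via Python's `sqlite3` (not SAS) and compares it with a correlated-subquery form like the one in the other answer. The schema and the column names `call_date` and `cnt` are assumptions for the illustration; the rows are the question's sample data. It only checks that the two forms agree on the totals and says nothing about how SAS plans or times either query.

```python
import sqlite3

# Sample rows from the question; `cnt` stands in for the Count column.
DETAIL = [
    ("05/01/2014", "John", "Order", 5), ("05/01/2014", "John", "Complaint", 6),
    ("05/01/2014", "Mary", "Order", 7), ("05/01/2014", "Mary", "Complaint", 8),
    ("05/01/2014", "Joe", "Order", 4), ("05/01/2014", "Joe", "Complaint", 2),
    ("05/01/2014", "Joe", "Internal", 2),
    ("05/02/2014", "John", "Order", 6), ("05/02/2014", "John", "Complaint", 4),
    ("05/02/2014", "Mary", "Order", 9), ("05/02/2014", "Mary", "Complaint", 7),
    ("05/02/2014", "Joe", "Order", 3), ("05/02/2014", "Joe", "Complaint", 1),
    ("05/02/2014", "Joe", "Internal", 3),
]
INTERVAL = [
    ("05/01/2014", "John", 8), ("05/01/2014", "Mary", 6), ("05/01/2014", "Joe", 4),
    ("05/02/2014", "John", 8), ("05/02/2014", "Mary", 6), ("05/02/2014", "Joe", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (call_date TEXT, name TEXT, calltype TEXT, cnt INTEGER)")
conn.execute("CREATE TABLE interval (call_date TEXT, name TEXT, hours INTEGER)")
conn.executemany("INSERT INTO detail VALUES (?, ?, ?, ?)", DETAIL)
conn.executemany("INSERT INTO interval VALUES (?, ?, ?)", INTERVAL)

# Aggregate each table to one row per name, then join one-to-one.
pre_aggregated = conn.execute("""
    SELECT d.name, d.cnt, i.hours
    FROM (SELECT name, SUM(cnt) AS cnt FROM detail GROUP BY name) d
    JOIN (SELECT name, SUM(hours) AS hours FROM interval GROUP BY name) i
      ON d.name = i.name
    ORDER BY d.name
""").fetchall()

# Correlated scalar subquery in the SELECT list, for comparison.
correlated = conn.execute("""
    SELECT d.name,
           SUM(d.cnt) AS cnt,
           (SELECT SUM(i.hours) FROM interval i WHERE i.name = d.name) AS hours
    FROM detail d
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()

print(pre_aggregated)  # [('Joe', 15, 8), ('John', 21, 16), ('Mary', 31, 12)]
print(pre_aggregated == correlated)  # True
```

Because both sides are reduced to one row per name before the join, the join is one-to-one and nothing is counted twice.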
answered 2014-06-12T09:23:29.080

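The following works and shouldn't be too inefficient, although I personally think the best approach is to summarize the two tables independently before merging them: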

PROC SQL noprint feedback;
 CREATE TABLE SUMMARY AS
 SELECT
  D.Name
  , Sum(D.Count) as Count
  , (SELECT sum(I.Hours) as Hours from Interval I WHERE D.Name=I.Name GROUP BY i.name) as Hours
 FROM Detail D
 GROUP BY D.Name
 ORDER BY D.Name
 ;
QUIT;
answered 2014-06-10T15:55:05.347