sql - SAS Proc SQL: get the record closest to a specific date

Question

I have 2 tables as follows:

Table 1, the user list table:

Year  Month Id Type 
2010  3     1  A
2010  5     2  B
2010  10    1  A
2010  12    1  A

Table 2 describes the user promotion history:

Promote Date Id
2/20/2010    1
5/20/2010    1     (in 4/2010 the user was demoted, and 1 month later he was promoted again)

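From these 2 tables, I need to produce a result table similar to table 1, but with an extra column that classifies type A users by whether they were promoted less than 3 months or more than 3 months before the given date. For example, the result would be: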

Year  Month Id | Duration
2010  3     1  | A < 3 months
2010  10    1  | A > 3 months
2010  12    1  | A > 3 months

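The general idea is:

Convert the month and year columns of table 1 into a date format such as 3/2010
Subtract from that converted value the promote date closest to it (2/2010) to get the number of days the user has been promoted
Compare the result with 90 days to classify the duration of his promotion

I am currently stuck on 2 issues.

I don't know the best way to convert the month and year columns into a month/year date format.

Assuming I have already converted the month/year columns of table1, I use the MAX function to get the nearest date from table2. As far as I know, MAX is bad for performance, so is there a solution other than MAX? In MySQL this is easily solved with LIMIT 1, but SAS proc-sql does not support LIMIT. Is there an equivalent of LIMIT in proc-sql? Below is the code I am currently considering.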

PROC SQL;
Create table Result as SELECT table1.Year, table1.Month, table1.Code, 
(Case When table1.Type = "B" then "B"
When table1.Type = "A" AND (table1.Date - (Select MAX(table2.Date) From table2 Where table2.Date <= table1.Date AND table2.Id = table1.Id ) < 90) THEN "A < 3 months"
When table1.Type = "A" AND (table1.Date - (Select MAX(table2.Date) From table2 Where table2.Date <= table1.Date AND table2.Id = table1.Id ) >= 90) THEN "A > 3 months"
When table1.Type = "C" then "C"
end) as NewType
From table1
LEFT JOIN
// .... 
;
QUIT;

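As you can see, I need to left join table1 with other tables, so I use a subquery, which also performs badly, but I don't know whether there is another way. Help and advice are appreciated.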

score 4 · Accepted Answer

You can create a date value from the year and month columns using the mdy() function, as follows:

data have;
input Year  Month Id Type $;
datalines;
2010  3     1  A
2010  5     2  B
2010  10    1  A
2010  12    1  A
;
run;

data have;
set have;
format date date9.;
date = mdy(Month, 1, Year);
run;

You don't have a day value, so I just used 1 (every date created is the first day of its month).
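The join in the next step reads a second dataset, prom, with a numeric date column promote, but the answer does not show how that dataset was built. A minimal sketch, assuming the rows of table 2 from the question and the names prom and promote used by the join (the informat choice is my assumption):

```sas
/* Hypothetical promotion-history dataset built from table 2.          */
/* The names prom and promote match the join that follows; the layout  */
/* of the raw data lines is an assumption.                             */
data prom;
    input promote :mmddyy10. Id;   /* read m/d/yyyy text as a SAS date */
    format promote date9.;         /* display as 20FEB2010             */
datalines;
2/20/2010 1
5/20/2010 1
;
run;
```

Storing promote as a real SAS date (a day count) is what makes the subtraction date - promote in the join meaningful.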

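Now you can join the two tables by ID and compute the difference between the date from the first table and the promote date from the second table: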

proc sql;
    create table want as
    select *
          ,abs(date - promote) as diff
    from have as a
           left join
         prom as b
           on a.id = b.id;
quit;

After that, sort the resulting table by id, date, and diff:

proc sort data=want;
by id date diff; 
run;

The sorted dataset looks like this:

Year  Month  Id  Type   date       Promote    diff
---------------------------------------------------
2010  3      1   A      01MAR2010  20FEB2010  9
2010  3      1   A      01MAR2010  20MAY2010  80
2010  10     1   A      01OCT2010  20MAY2010  134
2010  10     1   A      01OCT2010  20FEB2010  223
2010  12     1   A      01DEC2010  20MAY2010  195
2010  12     1   A      01DEC2010  20FEB2010  284
2010  5      2   B      01MAY2010  .          .

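As the last step, loop through the dataset and check whether the first diff value for each ID and date is less than or greater than 3 months (I just checked against 90 days; you could also use the intck function). Because the dataset is sorted by id, date, and diff, the first row of each group is the one closest to the date, so you output only that first row.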

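data want2(keep = year month id type duration);
    set want;
    by id date;   /* must follow the sort order (id, date); "by date" alone fails on data sorted by id first */

    if first.date and Type = 'A' then do;
        /* first row per id/date has the smallest diff, i.e. the closest promote date */
        if diff lt 90 then duration = 'A < 3 months';
        else duration = 'A > 3 months';   /* 90 days or more, so diff = 90 is not dropped */
        output want2;
    end;
    else if first.date then do;
        /* other types keep their type as the label */
        duration = type;
        output want2;
    end;
run;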

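The output statement is used because we only want to keep some rows (the first of each group). The last output is there so that rows whose type is not A are also kept in the final result.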
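As an aside on the 90-day threshold: the intck function mentioned earlier can count months instead of days. The sketch below only illustrates the difference between its two counting methods, using two dates from the sorted table; the variable names are made up for the example.

```sas
/* Illustration only: compare a plain day difference with intck month counts. */
data _null_;
    promote = '20MAY2010'd;
    date    = '01OCT2010'd;

    days        = date - promote;                               /* plain day difference          */
    months_disc = intck('month', promote, date);                /* month boundaries crossed      */
    months_cont = intck('month', promote, date, 'continuous');  /* whole months elapsed since promote */

    put days= months_disc= months_cont=;   /* write the three values to the log */
run;
```

If "3 months" should mean calendar months rather than 90 days, a check such as months_cont lt 3 could replace diff lt 90. Note that intck is sensitive to argument order, so the earlier date has to come first.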

Here is the final result of the data step:

Year    Month    Id    Type    duration
--------------------------------------------
2010    3        1     A       A < 3 months
2010    10       1     A       A > 3 months
2010    12       1     A       A > 3 months
2010    5        2     B       B
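For readers who prefer to stay in PROC SQL, as the question asks, the sort and data step can probably be folded into one query. This is a sketch of an alternative, not the method of the answer above: it assumes the have and prom datasets from this answer, considers only promote dates on or before the row date (as in the question's own subquery) rather than abs(), and uses names of my own choosing (want_sql, days_since).

```sas
proc sql;
    create table want_sql as
    select Year, Month, Id, Type,
           case
               /* non-A rows, and A rows with no earlier promotion, keep their type */
               when Type ne 'A' or missing(days_since) then Type
               when days_since < 90 then 'A < 3 months'
               else 'A > 3 months'
           end as duration length=12
    from (
          /* one row per user/month: days since the most recent earlier promotion */
          select a.Year, a.Month, a.Id, a.Type,
                 min(a.date - b.promote) as days_since
          from have as a
               left join prom as b
                 on a.Id = b.Id and b.promote <= a.date
          group by a.Year, a.Month, a.Id, a.Type
         );
quit;
```

The smallest non-negative difference corresponds to the latest promote date not after the row date, so min() here plays the role of MAX(table2.Date) in the question, but as a grouped join instead of a correlated subquery evaluated per row. Whether it is actually faster depends on the data and should be measured.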
