Dataset:

email   brand   startdate    response_no   expected result (select or filter out)
abc     wi      4/1/2019     1             select
abc     wi      9/4/2019     2             compare with 1st: less than 6 months, filter out
abc     wi      11/22/2019   3             compare with 1st: more than 6 months, select
xyz     wi      3/2/2019     1             select
xyz     wi      10/23/2019   2             compare with 1st: more than 6 months, select
xyz     wi      11/27/2019   3             compare with 2nd: less than 6 months, filter out
xyz     msw     2/21/2019    1             select
xyz     msw     2/20/2020    2             compare with 1st: more than 6 months, select

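Based on the data above, I need to write logic that, for each email and brand, filters out any response whose start date falls within 6 months of the previously selected response. For example, for email abc and brand wi, the 1st response arrived on 4/1/2019. The 2nd response arrived on 9/24/2019 (5 months after the 1st), so I need to filter it out. The 3rd response arrived on 11/22/2019 (more than 6 months after the 1st), so it should not be filtered out.

If the 2nd response had arrived more than 6 months after the 1st, it would have to be kept, and the 3rd response would then have to be compared with the 2nd rather than the 1st. In short, for each email and brand, the check must be made between the current response date and the most recent response date that was not filtered out.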

1 Answer

I believe you are expecting the output shown below, is that right?

[image: expected output]

From a SQL point of view, this is what I did (it can be solved in other ways too):

Create Table DB_NM.SCHEMA_NM.TEST (
  email varchar(255),
  brand varchar(4),  
  startdate date, 
  response_no numeric(10,0)
); 

Insert Into DB_NM.SCHEMA_NM.TEST VALUES('abc', 'wi', '2019-04-01', 1);         
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('abc', 'wi', '2019-09-04', 2);         
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('abc', 'wi', '2019-11-22', 3);         
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('xyz', 'wi', '2019-03-02', 1);         
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('xyz', 'wi', '2019-10-23', 2);         
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('xyz', 'wi', '2019-11-27', 3);         
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('xyz', 'msw', '2019-02-21', 1);          
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('xyz', 'msw', '2020-02-20', 2);

SQL:

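Select SRC.EMAIL, 
       SRC.BRAND,
       SRC.STARTDATE
From 
(
-- For each row, look up the previous response date for the same email and brand
Select EMAIL, 
       BRAND,
       STARTDATE,
       lag(STARTDATE) over (partition by EMAIL, BRAND order by STARTDATE) as PREV_DATE,
       -- Months elapsed since the previous response (-1 for the first response)
       Case When PREV_DATE Is Null Then -1 Else datediff(month, PREV_DATE, STARTDATE) End as DATE_DIFF_MTH, 
       -- Under 6 months: point at the next response instead; otherwise keep this one
       Case When PREV_DATE Is Null Then '1949-01-01'
            Else Case When datediff(month, PREV_DATE, STARTDATE) < 6 
                      Then lead(STARTDATE) over (partition by EMAIL, BRAND order by STARTDATE)
                      Else STARTDATE End End as DATE_TO_CONSIDER
From DB_NM.SCHEMA_NM.TEST
) P
-- Join on email and brand as well, so equal dates in other groups do not match
Inner Join DB_NM.SCHEMA_NM.TEST SRC 
   ON SRC.EMAIL = P.EMAIL
  AND SRC.BRAND = P.BRAND
  AND SRC.STARTDATE = P.DATE_TO_CONSIDER
-- Rows with no previous response (the first response per group) are skipped here
Where P.PREV_DATE IS NOT NULL
Order BY 1,2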
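
The rule in the question compares each response with the last response that was kept, not simply with the preceding row, so it is sequential by nature. To make that rule concrete, here is a minimal sketch in Python rather than SQL. The function names `months_between` and `select_responses` are made up for illustration, and two things are assumptions: months are counted as calendar-month boundaries crossed (similar in spirit to `datediff(month, ...)`), and a response is kept when that count is 6 or more.

```python
from datetime import date
from itertools import groupby


def months_between(earlier, later):
    # Assumption: count calendar-month boundaries crossed, ignoring the day of month.
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def select_responses(rows, min_months=6):
    """Keep the first response per (email, brand), then keep a later response
    only if it is at least `min_months` after the last response that was kept."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    kept = []
    for _, group in groupby(ordered, key=lambda r: (r[0], r[1])):
        last_kept = None  # start date of the most recent kept response in this group
        for row in group:
            if last_kept is None or months_between(last_kept, row[2]) >= min_months:
                kept.append(row)
                last_kept = row[2]
    return kept


# Same rows as the Insert statements above: (email, brand, startdate, response_no)
rows = [
    ("abc", "wi", date(2019, 4, 1), 1),
    ("abc", "wi", date(2019, 9, 4), 2),
    ("abc", "wi", date(2019, 11, 22), 3),
    ("xyz", "wi", date(2019, 3, 2), 1),
    ("xyz", "wi", date(2019, 10, 23), 2),
    ("xyz", "wi", date(2019, 11, 27), 3),
    ("xyz", "msw", date(2019, 2, 21), 1),
    ("xyz", "msw", date(2020, 2, 20), 2),
]

selected = select_responses(rows)
for email, brand, startdate, response_no in selected:
    print(email, brand, startdate.isoformat(), response_no)
```

On the sample data this keeps responses 1 and 3 for abc/wi, 1 and 2 for xyz/wi, and 1 and 2 for xyz/msw, which matches the select/filter notes in the question. If "6 months" should be measured in exact days instead, only `months_between` would need to change.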

As I said, there are other ways to solve this. Hope this helps!

Thanks

Answered 2020-06-17T17:44:54.397