
Possible Duplicate:
Trying to consolidate employer records who are continuously work for same department

I am trying to consolidate the records of employees who have been continuously enrolled in a specific department, where a gap of less than 45 days between records still counts as continuous.

Note: if the gap between one row's emp_eff_to_date and the next row's emp_eff_from_date is less than 45 days, the two rows are considered continuous.
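
To make the rule concrete, here is a minimal sketch (assuming SQL Server 2008 or later, since it relies on the DATE type and a VALUES table constructor) that computes three of the gaps in the sample data below. The alias g and the column names PREV_TO_DATE, NEXT_FROM_DATE, GapDays and IsContinuous are made up for illustration only.

```sql
-- Hypothetical check of the 45-day rule on three gaps taken from the sample data.
-- DATEDIFF(DAY, a, b) counts the day boundaries crossed going from a to b.
SELECT  g.DEPT_ID,
        g.PREV_TO_DATE,
        g.NEXT_FROM_DATE,
        GapDays      = DATEDIFF(DAY, g.PREV_TO_DATE, g.NEXT_FROM_DATE),
        IsContinuous = CASE WHEN DATEDIFF(DAY, g.PREV_TO_DATE, g.NEXT_FROM_DATE) < 45
                            THEN 'yes' ELSE 'no' END
FROM    (VALUES
            (10001, CAST('20110501' AS DATE), CAST('20110801' AS DATE)),
            (10001, CAST('20111030' AS DATE), CAST('20111201' AS DATE)),
            (10007, CAST('20070620' AS DATE), CAST('20070825' AS DATE))
        ) AS g (DEPT_ID, PREV_TO_DATE, NEXT_FROM_DATE);
```

The gaps come out as 92, 32 and 66 days, so only the second pair counts as continuous, which is why department 10001 ends up as two rows and department 10007 is not merged.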

INPUT:

EMP_ID   DEPT_ID   EMP_EFF_FROM_DATE   EMP_EFF_TO_DATE
------------------------------------------------------
10       10001     8/1/2008            10/31/2009
10       10001     11/1/2009           2/25/2010
10       10001     2/26/2010           5/1/2011
10       10001     8/1/2011            10/30/2011
10       10001     12/1/2011           10/31/2012
10       10003     7/1/2007            10/31/2007
10       10004     9/27/2004           6/8/2006
10       10004     6/30/2006           6/29/2007
10       10007     6/25/2006           6/20/2007
10       10007     8/25/2007           5/25/2008

Output desired:

EMP_ID         DEPT_ID      EMP_EFF_FROM_DATE     EMP_EFF_TO_DATE
-------------------------------------------------------------------------
10          10001        2008-08-01             2011-05-01
10          10001        2011-08-01             2012-10-31
10          10003        2007-07-01             2007-10-31
10          10004        2004-09-27             2007-06-29
10          10007        2006-06-25             2007-06-20
10          10007        2007-08-25             2007-06-29

2 Answers


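I recently had to do something very similar. My first thought was a recursive common table expression, which works but may not be the best solution depending on how much data is in your table.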

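It is not clear whether you want to actually remove rows from the database, or just view the results in the required form based on the current records.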

Solution 1 (SQL Fiddle)

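This uses a CTE to select the results. It finds the next row whose from date is within 45 days of the current row's to date, and keeps recursing until there are no more matches. Once that is done, it takes the latest result for each from date (the MaxRecursion field), then excludes every row whose date range falls inside the range of another row.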

WITH CTE AS
(   SELECT  *, [Recursion] = 0
    FROM    T
    UNION ALL
    SELECT  T.EMP_ID,
            T.DEPT_ID,
            T.EMP_EFF_FROM_DATE,
            T2.EMP_EFF_TO_DATE,
            T.[Recursion] + 1
    FROM    CTE T
            INNER JOIN T T2
                ON T2.EMP_ID = T.EMP_ID
                AND T.DEPT_ID = T2.DEPT_ID
                AND T2.EMP_EFF_FROM_DATE > T.EMP_EFF_FROM_DATE
                AND T2.EMP_EFF_TO_DATE > T.EMP_EFF_TO_DATE
                AND T2.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T.EMP_EFF_TO_DATE)
), CTE2 AS
(   SELECT  *, 
            [MaxRecursion] = MAX(Recursion) OVER(PARTITION BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE)
    FROM    CTE
)
SELECT  T.EMP_ID, 
        T.DEPT_ID, 
        T.EMP_EFF_FROM_DATE, 
        T.EMP_EFF_TO_DATE
FROM    CTE2 T
WHERE   Recursion = MaxRecursion
AND     NOT EXISTS
        (   SELECT  1
            FROM    CTE2 T2
            WHERE   T.EMP_ID = T2.EMP_ID
            AND     T.DEPT_ID = T2.DEPT_ID
            AND     T2.EMP_EFF_FROM_DATE < T.EMP_EFF_FROM_DATE
            AND     T2.EMP_EFF_TO_DATE >= T.EMP_EFF_TO_DATE
        )
ORDER BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE, EMP_EFF_TO_DATE;

Solution 2 (SQL Fiddle)

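This actually updates the existing rows and deletes the redundant ones, meaning you can simply select from the table to get the results you want. Of course, if you don't want to actually delete from the database, you can insert the data into a temporary table and apply the same principle (example here). In my case this solution ran much faster than the recursive CTE, because at each stage of the loop the query deals with less data, rather than more as with the recursive CTE.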

WHILE EXISTS
    (   SELECT  1
        FROM    T
                INNER JOIN T T2
                    ON T2.EMP_ID = T.EMP_ID
                    AND T2.DEPT_ID = T.DEPT_ID
                    AND T2.EMP_EFF_FROM_DATE > T.EMP_EFF_TO_DATE 
                    AND T2.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T.EMP_EFF_TO_DATE)
    )
    BEGIN
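        -- Extend only rows that start a chain (no earlier row within 45 days).
        -- Extending every row in the same pass lets the DELETE below remove a
        -- middle row before its new end date has reached the first row.
        UPDATE  T
        SET     EMP_EFF_TO_DATE = T2.EMP_EFF_TO_DATE
        FROM    T
                INNER JOIN T T2
                    ON T2.EMP_ID = T.EMP_ID
                    AND T2.DEPT_ID = T.DEPT_ID
                    AND T2.EMP_EFF_FROM_DATE > T.EMP_EFF_TO_DATE 
                    AND T2.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T.EMP_EFF_TO_DATE)
        WHERE   NOT EXISTS
                (   SELECT  1
                    FROM    T T3
                    WHERE   T3.EMP_ID = T.EMP_ID
                    AND     T3.DEPT_ID = T.DEPT_ID
                    AND     T.EMP_EFF_FROM_DATE > T3.EMP_EFF_TO_DATE
                    AND     T.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T3.EMP_EFF_TO_DATE)
                );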

        DELETE  T
        FROM    T
        WHERE   EXISTS
                (   SELECT  1
                    FROM    T T2
                    WHERE   T2.EMP_ID = T.EMP_ID
                    AND     T2.DEPT_ID = T.DEPT_ID
                    AND     T2.EMP_EFF_FROM_DATE < T.EMP_EFF_FROM_DATE
                    AND     T2.EMP_EFF_TO_DATE BETWEEN T.EMP_EFF_FROM_DATE AND T.EMP_EFF_TO_DATE
                )
    END;

SELECT  *
FROM    T
ORDER BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE;
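
The temporary-table variant mentioned above could be sketched roughly as follows. This is only an illustration under a few assumptions: the temp-table name #T and the aliases cur, nxt and prv are made up, and T is assumed to hold exactly the four columns shown in the question. The UPDATE in this sketch only extends rows that have no earlier row within 45 days, so a chain of three or more rows is merged one row per pass.

```sql
-- Work on a copy so the base table T is left untouched.
-- #T is a hypothetical temp-table name.
IF OBJECT_ID('tempdb..#T') IS NOT NULL
    DROP TABLE #T;

SELECT  EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE, EMP_EFF_TO_DATE
INTO    #T
FROM    T;

-- Keep looping while some row is followed by another one within 45 days.
WHILE EXISTS
    (   SELECT  1
        FROM    #T cur
                INNER JOIN #T nxt
                    ON nxt.EMP_ID = cur.EMP_ID
                    AND nxt.DEPT_ID = cur.DEPT_ID
                    AND nxt.EMP_EFF_FROM_DATE > cur.EMP_EFF_TO_DATE
                    AND nxt.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, cur.EMP_EFF_TO_DATE)
    )
    BEGIN
        -- Extend each chain-starting row to the end date of the row that follows it.
        UPDATE  cur
        SET     EMP_EFF_TO_DATE = nxt.EMP_EFF_TO_DATE
        FROM    #T cur
                INNER JOIN #T nxt
                    ON nxt.EMP_ID = cur.EMP_ID
                    AND nxt.DEPT_ID = cur.DEPT_ID
                    AND nxt.EMP_EFF_FROM_DATE > cur.EMP_EFF_TO_DATE
                    AND nxt.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, cur.EMP_EFF_TO_DATE)
        WHERE   NOT EXISTS
                (   SELECT  1
                    FROM    #T prv
                    WHERE   prv.EMP_ID = cur.EMP_ID
                    AND     prv.DEPT_ID = cur.DEPT_ID
                    AND     cur.EMP_EFF_FROM_DATE > prv.EMP_EFF_TO_DATE
                    AND     cur.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, prv.EMP_EFF_TO_DATE)
                );

        -- Remove the rows that have just been absorbed by an earlier row.
        DELETE  cur
        FROM    #T cur
        WHERE   EXISTS
                (   SELECT  1
                    FROM    #T prv
                    WHERE   prv.EMP_ID = cur.EMP_ID
                    AND     prv.DEPT_ID = cur.DEPT_ID
                    AND     prv.EMP_EFF_FROM_DATE < cur.EMP_EFF_FROM_DATE
                    AND     prv.EMP_EFF_TO_DATE BETWEEN cur.EMP_EFF_FROM_DATE AND cur.EMP_EFF_TO_DATE
                );
    END;

-- The consolidated ranges now live in #T; T itself is unchanged.
SELECT  EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE, EMP_EFF_TO_DATE
FROM    #T
ORDER BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE;
```

Because the work happens on the copy, the loop can be rerun at any time from a fresh snapshot of T, at the cost of copying the table first.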

All of these solutions differ from your sample data in the last row, which appears to be a mistake:

I think this line:

10          10007        2007-08-25             2007-06-29

should be:

10          10007        2007-08-25             2008-05-25
answered 2012-10-11T21:42:51.063

Assuming the next row is determined by ordering on the emp_eff_from_date field, here is one way to approach it:

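WITH DATA 
     AS (SELECT *, 
                -- Number each employee's rows within a department by start date
                Row_number() 
                  OVER ( 
                    PARTITION BY EMP_ID, DEPT_ID 
                    ORDER BY EMP_EFF_FROM_DATE) AS rn 
         FROM   TEST) 
-- Returns each row whose next row (same employee and department) starts within 45 days
SELECT t1.* 
FROM   DATA t1 
       INNER JOIN DATA t2 
               ON t1.EMP_ID = t2.EMP_ID 
              AND t1.DEPT_ID = t2.DEPT_ID 
              AND t1.RN = t2.RN - 1 
WHERE  Datediff(DAY, t1.EMP_EFF_TO_DATE, t2.EMP_EFF_FROM_DATE) <= 45 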

The full solution is here.
Let me know if this is not what you are after.

answered 2012-10-11T16:41:51.067