sql - Find the next row in a SQL query and delete it only if the previous row matches

Question

I have a table like this.

|-DT--------- |-ID------|
|5/30 12:00pm |10       |
|5/30 01:00pm |30       |
|5/30 02:30pm |30       |
|5/30 03:00pm |50       |
|5/30 04:30pm |10       |
|5/30 05:00pm |10       |
|5/30 06:30pm |10       |
|5/30 07:30pm |10       |
|5/30 08:00pm |50       |
|5/30 09:30pm |10       |

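I want to delete a duplicate row only when an adjacent row (the previous or the next one) has the same ID. Within each run of duplicates, I want to keep the row whose datetime is furthest in the future. For example, the table above would end up looking like this.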

|-DT--------- |-ID------|
|5/30 12:00pm |10       |
|5/30 02:30pm |30       |
|5/30 03:00pm |50       |
|5/30 07:30pm |10       |
|5/30 08:00pm |50       |
|5/30 09:30pm |10       |

Can I get any tips on how to do this?

score 3 · Accepted Answer

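-- Number the rows in DT order so adjacent rows can be compared.
with C as
(
  select ID,
         row_number() over(order by DT) as rn
  from YourTable
)
-- C1 is the earlier row, C2 the row right after it; delete C1 when both share an ID.
delete C1
from C as C1
  inner join C as C2
    on C1.rn = C2.rn-1 and
       C1.ID = C2.ID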

SE-Data
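On SQL Server 2012 or later, the same comparison with the following row can probably be written without the self-join by using LEAD. The sketch below is not part of the original answer; the temp table #YourTable and the column name NextID are made up for illustration, and the sample rows are the ones from the question.

```sql
-- Hypothetical temp table holding the question's sample rows.
create table #YourTable (DT datetime not null, ID int not null);

insert into #YourTable (DT, ID) values
  ('2012-05-30T12:00:00', 10),
  ('2012-05-30T13:00:00', 30),
  ('2012-05-30T14:30:00', 30),
  ('2012-05-30T15:00:00', 50),
  ('2012-05-30T16:30:00', 10),
  ('2012-05-30T17:00:00', 10),
  ('2012-05-30T18:30:00', 10),
  ('2012-05-30T19:30:00', 10),
  ('2012-05-30T20:00:00', 50),
  ('2012-05-30T21:30:00', 10);

-- Pair each row with the ID of the row that follows it in DT order.
with C as
(
  select ID,
         lead(ID) over(order by DT) as NextID
  from #YourTable
)
-- Delete a row when the next row has the same ID,
-- so only the latest row of each run survives.
delete from C
where ID = NextID;

-- Inspect what is left.
select DT, ID
from #YourTable
order by DT;

drop table #YourTable;
```

With this sample data, the rows at 13:00, 16:30, 17:00 and 18:30 are the ones whose next row has the same ID, so those are the rows removed.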

score 2 · Accepted Answer

Do it in these 3 steps: http://www.sqlfiddle.com/#!3/b58b9/19

First, number the rows in order:

with a as
(
  select dt, id, row_number() over(order by dt) as rn
  from tbl
)
select * from a;

Output:

|                         DT | ID | RN |
----------------------------------------
| May, 30 2012 12:00:00-0700 | 10 |  1 |
| May, 30 2012 13:00:00-0700 | 30 |  2 |
| May, 30 2012 14:30:00-0700 | 30 |  3 |
| May, 30 2012 15:00:00-0700 | 50 |  4 |
| May, 30 2012 16:30:00-0700 | 10 |  5 |
| May, 30 2012 17:00:00-0700 | 10 |  6 |
| May, 30 2012 18:30:00-0700 | 10 |  7 |
| May, 30 2012 19:30:00-0700 | 10 |  8 |
| May, 30 2012 20:00:00-0700 | 50 |  9 |
| May, 30 2012 21:30:00-0700 | 10 | 10 |

Second, using the row numbers, we can find which rows are at the bottom (and which are not):

with a as
(
  select dt, id, row_number() over(order by dt) as rn
  from tbl
)
select below.*, 
    case when above.id <> below.id or above.id is null then 
        1 
    else 
        0 
    end as is_at_bottom
from a below
left join a above on above.rn + 1 = below.rn;

Output:

|                         DT | ID | RN | IS_AT_BOTTOM |
-------------------------------------------------------
| May, 30 2012 12:00:00-0700 | 10 |  1 |            1 |
| May, 30 2012 13:00:00-0700 | 30 |  2 |            1 |
| May, 30 2012 14:30:00-0700 | 30 |  3 |            0 |
| May, 30 2012 15:00:00-0700 | 50 |  4 |            1 |
| May, 30 2012 16:30:00-0700 | 10 |  5 |            1 |
| May, 30 2012 17:00:00-0700 | 10 |  6 |            0 |
| May, 30 2012 18:30:00-0700 | 10 |  7 |            0 |
| May, 30 2012 19:30:00-0700 | 10 |  8 |            0 |
| May, 30 2012 20:00:00-0700 | 50 |  9 |            1 |
| May, 30 2012 21:30:00-0700 | 10 | 10 |            1 |

Third, delete all the rows that are not at the bottom:

with a as
(
  select dt, id, row_number() over(order by dt) as rn
  from tbl
)
,b as 
(
  select below.*, 
       case when above.id <> below.id or above.id is null then 
           1 
       else 
           0 
       end as is_at_bottom
  from a below
  left join a above on above.rn + 1 = below.rn
)
delete a
from a
inner join b on b.rn = a.rn
where b.is_at_bottom = 0;

Verify:

select * from tbl order by dt;

Output:

|                         DT | ID |
-----------------------------------
| May, 30 2012 12:00:00-0700 | 10 |
| May, 30 2012 13:00:00-0700 | 30 |
| May, 30 2012 15:00:00-0700 | 50 |
| May, 30 2012 16:30:00-0700 | 10 |
| May, 30 2012 20:00:00-0700 | 50 |
| May, 30 2012 21:30:00-0700 | 10 |

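You can also simplify the delete to: http://www.sqlfiddle.com/#!3/b58b9/20

Note that this version deletes above, the earlier row of each matching pair, so it keeps the latest row of each run, as the question asks, whereas the three-step version keeps the earliest.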

with a as
(
  select dt, id, row_number() over(order by dt, id) as rn
  from tbl
)
delete above
from a below
left join a above on above.rn + 1 = below.rn
where case when above.id <> below.id or above.id is null then 1 else 0 end = 0;

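That said, Mikael Eriksson's answer is the best one. If I simplified my simplified query once more, it would look just like his answer ツ For that, I +1'd his answer. I would make his query more readable, though, by swapping the join order and using good aliases.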

with a as
(
  select *, row_number() over(order by dt, id) as rn
  from tbl
)
delete above
from a below
join a above on above.rn + 1 = below.rn and above.id = below.id;

Live test: http://www.sqlfiddle.com/#!3/b58b9/24
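Before running a delete like this on real data, it may help to preview which rows the join would remove. The sketch below keeps the same join but swaps the delete for a select. The inline tbl CTE is only a stand-in for the real table (an assumption for this example), filled with the first six rows of the question's data.

```sql
-- Inline sample rows standing in for tbl (an assumption for this sketch).
with tbl (dt, id) as
(
  select cast(v.dt as datetime), v.id
  from (values
    ('2012-05-30T12:00:00', 10),
    ('2012-05-30T13:00:00', 30),
    ('2012-05-30T14:30:00', 30),
    ('2012-05-30T15:00:00', 50),
    ('2012-05-30T16:30:00', 10),
    ('2012-05-30T17:00:00', 10)
  ) as v (dt, id)
),
a as
(
  select dt, id, row_number() over(order by dt, id) as rn
  from tbl
)
-- Same join as the delete, but only listing the rows it would remove.
select above.dt, above.id
from a below
join a above on above.rn + 1 = below.rn and above.id = below.id
order by above.dt;
```

For these six rows the preview lists the 13:00 row (ID 30) and the 16:30 row (ID 10), the earlier member of each adjacent pair with the same ID.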

score 0 · Accepted Answer

Here you go; just replace [Table] with the name of your table.

SELECT * 
FROM [dbo].[Table]
WHERE [Ident] NOT IN 
(
    SELECT Extent.[Ident]
    FROM 
    (
        SELECT  TOP 100 PERCENT T1.[DT], 
                T1.[ID],
                T1.[Ident],
                (
                    SELECT TOP 1 Previous.ID
                    FROM [dbo].[Table] AS Previous
                    WHERE Previous.[Ident] > T1.Ident -- this is where the identity seed is important
                    ORDER BY [Ident] ASC
                ) AS 'PreviousId'
        FROM [dbo].[Table] AS T1
        ORDER BY T1.[Ident] DESC
    ) AS Extent
    WHERE [Id] = [PreviousId]
)

Note: you need to add an Ident (identity) column to the table. If you can't change the table's structure, use a CTE.
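The CTE route mentioned in the note could look roughly like the sketch below, where row_number() over DT supplies a generated Ident in place of a real identity column. The CTE names src and numbered are hypothetical, src holds the question's sample rows inline instead of reading [dbo].[Table], and NOT EXISTS replaces the NOT IN subquery, so this is a variation on the answer and not the answer's own query.

```sql
-- Inline copy of the question's sample rows (stand-in for [dbo].[Table]).
with src (DT, ID) as
(
  select cast(v.DT as datetime), v.ID
  from (values
    ('2012-05-30T12:00:00', 10),
    ('2012-05-30T13:00:00', 30),
    ('2012-05-30T14:30:00', 30),
    ('2012-05-30T15:00:00', 50),
    ('2012-05-30T16:30:00', 10),
    ('2012-05-30T17:00:00', 10),
    ('2012-05-30T18:30:00', 10),
    ('2012-05-30T19:30:00', 10),
    ('2012-05-30T20:00:00', 50),
    ('2012-05-30T21:30:00', 10)
  ) as v (DT, ID)
),
-- Generate a sequential Ident from the DT order instead of an identity column.
numbered as
(
  select DT, ID, row_number() over(order by DT) as Ident
  from src
)
-- Keep a row only when the row right after it does not have the same ID.
select cur.DT, cur.ID
from numbered as cur
where not exists
(
  select 1
  from numbered as nxt
  where nxt.Ident = cur.Ident + 1
    and nxt.ID = cur.ID
)
order by cur.DT;
```

On the sample data this keeps the rows at 12:00, 14:30, 15:00, 19:30, 20:00 and 21:30, which matches the result the question asks for.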

score 0 · Accepted Answer

You can try the following query...

select *
from (
    select *, RANK() OVER (ORDER BY dt, id) AS Rank
    from test
) as a
where 0 = (
    select count(id)
    from (
        select id, RANK() OVER (ORDER BY dt, id) AS Rank
        from test
    ) as b
    where b.id = a.id and b.Rank = a.Rank + 1
)
order by dt

Thanks, Mahesh
