I have a customer table named Customer_SCD.
In SQL it has 3 columns: Customer_Name, Customer_ID and Customer_TimeStamp.
The table contains duplicate entries that differ only in their timestamp.
For example:
ABC, 1, 2012-12-05 11:58:20.370
ABC, 1, 2012-12-03 12:11:09.840
I want to remove these duplicates from the database and keep only the row with the earliest time/date.
Thanks.
This works, try it:
-- Delete every row whose timestamp is not the earliest for its (Customer_ID, Customer_Name)
DELETE b
OUTPUT deleted.*            -- return the deleted rows
FROM Customer_SCD b
JOIN (
    -- earliest timestamp per customer
    SELECT MIN(a.Customer_TimeStamp) Customer_TimeStamp,
           a.Customer_ID,
           a.Customer_Name
    FROM Customer_SCD a
    GROUP BY a.Customer_ID, a.Customer_Name
) c ON
    c.Customer_ID = b.Customer_ID
    AND c.Customer_Name = b.Customer_Name
    AND c.Customer_TimeStamp <> b.Customer_TimeStamp
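The subquery finds the first (earliest) record for each Customer_Name and Customer_ID combination; the outer DELETE then removes every other record for that combination. I also added an OUTPUT clause, which returns the rows affected by the statement.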
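To try the "keep the earliest row" logic without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the T-SQL above: SQLite has no DELETE ... FROM ... JOIN and no OUTPUT clause, so the MIN() subquery becomes a correlated subquery. It assumes timestamps are stored as ISO-formatted text, so that string order matches chronological order, and the XYZ row is a hypothetical extra customer that is not in the question.

```python
import sqlite3

# Sample rows from the question, plus a hypothetical customer with no duplicate.
ROWS = [
    ("ABC", 1, "2012-12-05 11:58:20.370"),
    ("ABC", 1, "2012-12-03 12:11:09.840"),
    ("XYZ", 2, "2012-12-04 09:00:00.000"),  # hypothetical
]


def dedupe_keep_earliest(conn: sqlite3.Connection) -> None:
    """Delete every row whose timestamp is not the earliest for its (ID, Name)."""
    # Correlated subquery instead of a JOIN, because SQLite's DELETE cannot join.
    # ISO text timestamps compare correctly as strings (an assumption of this sketch).
    conn.execute("""
        DELETE FROM Customer_SCD
        WHERE Customer_TimeStamp <> (
            SELECT MIN(a.Customer_TimeStamp)
            FROM Customer_SCD a
            WHERE a.Customer_ID = Customer_SCD.Customer_ID
              AND a.Customer_Name = Customer_SCD.Customer_Name
        )
    """)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_SCD "
    "(Customer_Name TEXT, Customer_ID INTEGER, Customer_TimeStamp TEXT)"
)
conn.executemany("INSERT INTO Customer_SCD VALUES (?, ?, ?)", ROWS)
dedupe_keep_earliest(conn)
remaining = conn.execute(
    "SELECT * FROM Customer_SCD ORDER BY Customer_ID"
).fetchall()
print(remaining)
```

Only the 2012-12-03 row for ABC survives, and the XYZ row is untouched because it is already the earliest (and only) row for its customer.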
You can also do this with the ranking function ROW_NUMBER:
-- Number the rows per (Customer_ID, Customer_Name), earliest first, and delete all but the first
DELETE b
OUTPUT deleted.*            -- return the deleted rows
FROM Customer_SCD b
JOIN (
    SELECT Customer_ID,
           Customer_Name,
           Customer_TimeStamp,
           ROW_NUMBER() OVER (PARTITION BY Customer_ID, Customer_Name
                              ORDER BY Customer_TimeStamp) num
    FROM Customer_SCD
) c ON
    c.Customer_ID = b.Customer_ID
    AND c.Customer_Name = b.Customer_Name
    AND c.Customer_TimeStamp = b.Customer_TimeStamp
    AND c.num <> 1
Check which one has the lower query cost and use that. When I checked, the first approach was more efficient (it had a better execution plan).
Here is a SQL Fiddle.
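The ROW_NUMBER variant can be sketched the same way in SQLite, which supports window functions from version 3.25. It is again an adaptation rather than the T-SQL statement: it deletes by SQLite's implicit rowid instead of joining back on Customer_TimeStamp. Deleting by rowid means that an exact duplicate (same name, ID and timestamp, a hypothetical case not shown in the question) is reduced to a single row. A match on Customer_TimeStamp alone cannot distinguish the two copies.

```python
import sqlite3

ROWS = [
    ("ABC", 1, "2012-12-05 11:58:20.370"),
    ("ABC", 1, "2012-12-03 12:11:09.840"),
    ("ABC", 1, "2012-12-03 12:11:09.840"),  # hypothetical exact duplicate
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_SCD "
    "(Customer_Name TEXT, Customer_ID INTEGER, Customer_TimeStamp TEXT)"
)
conn.executemany("INSERT INTO Customer_SCD VALUES (?, ?, ?)", ROWS)

# Number the rows within each (ID, Name) group, earliest first, then delete
# every row whose number is greater than 1. Requires SQLite >= 3.25.
conn.execute("""
    DELETE FROM Customer_SCD
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY Customer_ID, Customer_Name
                       ORDER BY Customer_TimeStamp
                   ) AS num
            FROM Customer_SCD
        )
        WHERE num > 1
    )
""")
remaining = conn.execute("SELECT * FROM Customer_SCD").fetchall()
print(remaining)
```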
The following query gives you the rows you want to keep:
Select Customer_Name, Customer_ID, MIN(Customer_TimeStamp) as Customer_TimeStamp
from Customer_SCD
group by Customer_Name, Customer_ID
Store the result in a table variable, for example @correctTbl, then join against it and delete the duplicates:
DELETE a    -- T-SQL needs the alias here; "DELETE FROM Customer_SCD a JOIN ..." is invalid
FROM Customer_SCD a
INNER JOIN @correctTbl b
    ON a.Customer_Name = b.Customer_Name
   AND a.Customer_ID = b.Customer_ID
   AND a.Customer_TimeStamp <> b.Customer_TimeStamp
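This two-step approach can be sketched in SQLite through Python's sqlite3 module. Under the following assumptions, it is not a literal translation. SQLite has no table variables, so a TEMP table named correctTbl stands in for @correctTbl. The join condition also moves into an EXISTS subquery, because SQLite's DELETE cannot join. Timestamps are assumed to be ISO-formatted text, and the XYZ row is hypothetical.

```python
import sqlite3

ROWS = [
    ("ABC", 1, "2012-12-05 11:58:20.370"),
    ("ABC", 1, "2012-12-03 12:11:09.840"),
    ("XYZ", 2, "2012-12-04 09:00:00.000"),  # hypothetical
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_SCD "
    "(Customer_Name TEXT, Customer_ID INTEGER, Customer_TimeStamp TEXT)"
)
conn.executemany("INSERT INTO Customer_SCD VALUES (?, ?, ?)", ROWS)

# Step 1: store the rows to keep (TEMP table standing in for @correctTbl).
conn.execute("""
    CREATE TEMP TABLE correctTbl AS
    SELECT Customer_Name, Customer_ID, MIN(Customer_TimeStamp) AS Customer_TimeStamp
    FROM Customer_SCD
    GROUP BY Customer_Name, Customer_ID
""")

# Step 2: delete rows that match a kept (Name, ID) but have a different timestamp.
conn.execute("""
    DELETE FROM Customer_SCD
    WHERE EXISTS (
        SELECT 1 FROM correctTbl b
        WHERE b.Customer_Name = Customer_SCD.Customer_Name
          AND b.Customer_ID = Customer_SCD.Customer_ID
          AND b.Customer_TimeStamp <> Customer_SCD.Customer_TimeStamp
    )
""")
remaining = conn.execute(
    "SELECT * FROM Customer_SCD ORDER BY Customer_ID"
).fetchall()
print(remaining)
```

The intermediate table makes it easy to inspect what will be kept before running the DELETE, at the cost of an extra statement.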