
I have a customer table in SQL named Customer_SCD. It has 3 columns: Customer_Name, Customer_ID, Customer_TimeStamp.

The table contains duplicate entries that differ only in their timestamp.

For example:

ABC, 1, 2012-12-05 11:58:20.370

ABC, 1, 2012-12-03 12:11:09.840

I want to remove the duplicates from the database and keep only the row with the earliest date/time.

Thanks.


2 Answers


This works, give it a try:

DELETE  Customer_SCD
OUTPUT  deleted.*
FROM    Customer_SCD b
JOIN    (
    SELECT  MIN(a.Customer_TimeStamp) Customer_TimeStamp,
            Customer_ID,
            Customer_Name
    FROM    Customer_SCD a
    GROUP   BY a.Customer_ID, a.Customer_Name
) c ON 
    c.Customer_ID = b.Customer_ID
AND c.Customer_Name = b.Customer_Name
AND c.Customer_TimeStamp <> b.Customer_TimeStamp

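The subquery finds the earliest timestamp for each Customer_Name, Customer_ID pair, and the join then deletes every other row in that group. I also added the OUTPUT clause, which returns the rows affected by the statement.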
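If you want to try this before touching real data, here is a rough sketch on a throwaway temp table. The table name #Customer_SCD_demo, the column types, and the extra XYZ row are my assumptions, not something from the question. The delete runs inside a transaction that is rolled back, so nothing sticks until you decide to commit.

```sql
-- Hypothetical demo table; the column types are assumptions.
CREATE TABLE #Customer_SCD_demo (
    Customer_Name      varchar(50) NOT NULL,
    Customer_ID        int         NOT NULL,
    Customer_TimeStamp datetime    NOT NULL
);

-- The two rows from the question plus one made-up customer without duplicates.
INSERT INTO #Customer_SCD_demo (Customer_Name, Customer_ID, Customer_TimeStamp)
VALUES ('ABC', 1, '2012-12-05T11:58:20.370'),
       ('ABC', 1, '2012-12-03T12:11:09.840'),
       ('XYZ', 2, '2012-12-04T09:00:00.000');

BEGIN TRANSACTION;

-- Delete every row whose timestamp is not the earliest one in its group.
DELETE  b
OUTPUT  deleted.*
FROM    #Customer_SCD_demo b
JOIN    (
    SELECT  MIN(a.Customer_TimeStamp) AS Customer_TimeStamp,
            a.Customer_ID,
            a.Customer_Name
    FROM    #Customer_SCD_demo a
    GROUP   BY a.Customer_ID, a.Customer_Name
) c ON  c.Customer_ID = b.Customer_ID
    AND c.Customer_Name = b.Customer_Name
    AND c.Customer_TimeStamp <> b.Customer_TimeStamp;

-- Inspect what is left before deciding.
SELECT * FROM #Customer_SCD_demo;

ROLLBACK TRANSACTION;  -- switch to COMMIT once the result looks right

DROP TABLE #Customer_SCD_demo;
```

With this sample data the OUTPUT clause should return only the ABC row from 2012-12-05, and the SELECT should show the 2012-12-03 ABC row and the XYZ row.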

You can also do this with the ROW_NUMBER ranking function:

DELETE  Customer_SCD
OUTPUT  deleted.*
FROM    Customer_SCD b
JOIN    (
    SELECT  Customer_ID,
            Customer_Name,
            Customer_TimeStamp,
            ROW_NUMBER() OVER (PARTITION BY Customer_ID, Customer_Name ORDER BY Customer_TimeStamp) num
    FROM    Customer_SCD
) c ON 
    c.Customer_ID = b.Customer_ID
AND c.Customer_Name = b.Customer_Name
AND c.Customer_TimeStamp = b.Customer_TimeStamp
AND c.num <> 1
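One thing to be aware of: the statement above joins back on Customer_TimeStamp, so two rows in the same group with exactly the same timestamp cannot be told apart by the join. A variation that avoids the self-join is to delete through a CTE. This is only a sketch of that alternative; the CTE name ranked is made up for illustration.

```sql
-- Number the rows in each (Customer_ID, Customer_Name) group, earliest first,
-- then delete everything except row 1 directly through the CTE.
WITH ranked AS (
    SELECT  Customer_ID,
            Customer_Name,
            Customer_TimeStamp,
            ROW_NUMBER() OVER (PARTITION BY Customer_ID, Customer_Name
                               ORDER BY Customer_TimeStamp) AS num
    FROM    Customer_SCD
)
DELETE  FROM ranked
OUTPUT  deleted.Customer_Name, deleted.Customer_ID, deleted.Customer_TimeStamp
WHERE   num > 1;
```

Because the delete targets individual numbered rows, exactly one row per group survives even when timestamps tie. Which of the tied rows survives is arbitrary unless you add a tie-breaker to the ORDER BY.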

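Compare the query cost of the two and use whichever is cheaper. When I checked, the first approach was more efficient (it had a better execution plan).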
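One possible way to compare the two on your own data is to run each statement inside a transaction that gets rolled back, with the IO and timing statistics switched on. This is just a sketch; the numbers depend on your table size and indexes, so my result may not carry over to your setup.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: MIN + GROUP BY
BEGIN TRANSACTION;
DELETE  b
FROM    Customer_SCD b
JOIN    (
    SELECT  MIN(a.Customer_TimeStamp) AS Customer_TimeStamp,
            a.Customer_ID,
            a.Customer_Name
    FROM    Customer_SCD a
    GROUP   BY a.Customer_ID, a.Customer_Name
) c ON  c.Customer_ID = b.Customer_ID
    AND c.Customer_Name = b.Customer_Name
    AND c.Customer_TimeStamp <> b.Customer_TimeStamp;
ROLLBACK TRANSACTION;  -- undo, so variant 2 sees the same data

-- Variant 2: ROW_NUMBER
BEGIN TRANSACTION;
DELETE  b
FROM    Customer_SCD b
JOIN    (
    SELECT  Customer_ID,
            Customer_Name,
            Customer_TimeStamp,
            ROW_NUMBER() OVER (PARTITION BY Customer_ID, Customer_Name
                               ORDER BY Customer_TimeStamp) AS num
    FROM    Customer_SCD
) c ON  c.Customer_ID = b.Customer_ID
    AND c.Customer_Name = b.Customer_Name
    AND c.Customer_TimeStamp = b.Customer_TimeStamp
    AND c.num <> 1;
ROLLBACK TRANSACTION;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The logical reads and elapsed times show up in the Messages tab in Management Studio. Turning on the actual execution plan for the same batch lets you compare the relative cost of the two statements as well.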

Here is a SQL Fiddle.

answered 2012-12-18T07:09:05.557

The following query gives you the rows you want to keep.

Select Customer_Name, Customer_ID, MIN(Customer_TimeStamp) as Customer_TimeStamp
from Customer_SCD 
group by Customer_Name, Customer_ID 

Store the result in a table variable, say @correctTbl.
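Filling the table variable could look roughly like this. The column types are my guess, since the question does not give them, so adjust them to match your table.

```sql
-- Column types are assumptions; match them to Customer_SCD.
DECLARE @correctTbl TABLE (
    Customer_Name      varchar(50),
    Customer_ID        int,
    Customer_TimeStamp datetime
);

-- One row per customer, carrying the earliest timestamp.
INSERT INTO @correctTbl (Customer_Name, Customer_ID, Customer_TimeStamp)
SELECT  Customer_Name,
        Customer_ID,
        MIN(Customer_TimeStamp)
FROM    Customer_SCD
GROUP   BY Customer_Name, Customer_ID;

-- Quick check of what will be kept.
SELECT * FROM @correctTbl;
```

A table variable only lives for the batch it is declared in, so the delete has to run in the same batch as this declaration.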

Then join against this table and delete the duplicates:

delete a
from Customer_SCD a
inner join @correctTbl b on a.Customer_Name = b.Customer_Name and a.Customer_ID = b.Customer_ID and a.Customer_TimeStamp != b.Customer_TimeStamp
answered 2012-12-18T07:09:24.580