
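The method to my madness isn't working... I'm missing something. For the first time, I have been tasked with cleaning up duplicates within a single table. I have googled a lot of things, such as deleting with common table expressions, but found nothing I could really use.

My address table looks like this: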

Address
--------
id
add1
add2
city 
state
zip
parentidofthisdup    

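I want to get the duplicates along with a row number. I treat the id of row 1 as the parent. For every subsequent duplicate address row that I pull back, I want to stamp parentidofthisdup with the parent's id. Eventually I will keep the parents and process the rows that have a parent id in parentidofthisdup.

I am trying to do this update by building a common table expression and then using the CTE in a correlated update, but yikes, it isn't working. All I get is that every record is updated, but parentidofthisdup ends up NULL in all of them.

Maybe I'm not coding it the right way. I'm fairly new to mass updates.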

-- My common table expression of the set that I want stamped
with tbFlagTheseWithPk as
(
Select * from 

(
 select  
    myaddress.id,
    myaddress.parentidofthisdup,              
    myaddress.add1, 
    myaddress.add2,
    myaddress.state,
    myaddress.zip,
    row_number() over (partition by add1, state, zip order by add1, state, zip, add2) as [rn]   
    from myaddress
  where     add1 !=''
) as a
where a.rn > 1)

-- Now use our Common Table Expression using a correlated subquery to make them children of rn 1

Update tbFlagTheseWithPk
set 
set parentidofthisdup = 
(   Select id from
     (Select * from
    (   select
            myaddress.pkey,
            myaddress.parentidofthisdup,
            myaddress.add1,
            myaddress.add2,
            myaddress.state,
            myaddress.zip,
            row_number() over (partition by add1, state, zip order by a1, state, zip, add2) as [rn]
        from myaddress where add1 !=''
    ) as a
    where a.rn > 1) as  b

    where   b.a1 = tbFlagTheseWithPk.add1                                                                                
    and 
 b.state = tbFlagTheseWithPk.state
 and
 b.zip = tbFlagTheseWithPk.zip

    and 
 tbFlagTheseWithPk.rn = 1

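Isn't there a better way? How do I get past this mass-update learning curve? I feel like I should be able to do this elegantly, but if I can't solve it soon I will resort to looping with a cursor and turn a blind eye to the beauty of SQL... and that would be a tragedy.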


1 Answer


Never use a cursor.

You are on the right track. These links may help: SQL Server - inner join when updating, Row_Number http://msdn.microsoft.com/en-us/library/ms186734.aspx, CTE http://msdn.microsoft.com/en-us/library/ms190766(v=sql.105).aspx

DECLARE  @myAddress table
(id int, parentidofthisdup int, add1 nvarchar(10),add2 nvarchar(10) , [state] nvarchar(10),zip nvarchar(10) ) ;


Insert into @myAddress Values(1,null,'a','b','c','d');
Insert into @myAddress Values(2,null,'a','b','c','d');
Insert into @myAddress Values(3,null,'a','b','c','d');
Insert into @myAddress Values(5,null,'a','b','c','d');
Insert into @myAddress Values(6,null,'a','f','c','d');
Insert into @myAddress Values(7,null,'a','b','g','d');
Insert into @myAddress Values(8,null,'a','f','c','d');
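-- Number the rows inside each group of identical addresses; rn = 1 is the parent.
with cte AS
(
    select
        myaddress.id,
        myaddress.parentidofthisdup,
        myaddress.add1,
        myaddress.add2,
        myaddress.[state],
        myaddress.zip,
        row_number() over (partition by add1, add2, [state], zip order by id, add1, [state], zip, add2) as [rn]
    from @myAddress myaddress
)
-- Join each parent (rn = 1) to the other rows of its group and stamp them with the parent's id.
update r
set parentidofthisdup = cte.id
from cte
inner join @myAddress r
    on  cte.add1 = r.add1
    and cte.add2 = r.add2
    and cte.zip = r.zip
    and cte.[state] = r.[state]
    and cte.id <> r.id
where cte.rn = 1;

-- Parents keep NULL in parentidofthisdup; duplicates now carry the parent's id.
select * from @myAddress;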
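
A note on why the attempt in the question probably produced only NULLs: its CTE keeps only rows with rn > 1, while the correlated subquery also requires tbFlagTheseWithPk.rn = 1, so the subquery can never return a row for any row being updated. As a possible alternative to the join, here is a minimal sketch that updates through the CTE in a single pass. It assumes that the lowest id in each group should be the parent and that the duplicate key is (add1, state, zip) as in the question; the table variable @addr, its sample rows, and the names grouped and parent_id are hypothetical and only there for illustration.

```sql
-- Hypothetical stand-in for the real address table.
DECLARE @addr table
(id int, parentidofthisdup int, add1 nvarchar(10), add2 nvarchar(10), [state] nvarchar(10), zip nvarchar(10));

INSERT INTO @addr VALUES (1, NULL, 'a', 'b', 'c', 'd');
INSERT INTO @addr VALUES (2, NULL, 'a', 'b', 'c', 'd');
INSERT INTO @addr VALUES (3, NULL, 'a', 'x', 'c', 'd');
INSERT INTO @addr VALUES (4, NULL, 'q', 'b', 'c', 'd');

-- Put the lowest id of each duplicate group next to every row of that group.
WITH grouped AS
(
    SELECT id,
           parentidofthisdup,
           MIN(id) OVER (PARTITION BY add1, [state], zip) AS parent_id
    FROM @addr
    WHERE add1 <> ''
)
-- Updating the CTE writes to the underlying table; the parent row itself is skipped.
UPDATE grouped
SET parentidofthisdup = parent_id
WHERE id <> parent_id;

SELECT id, parentidofthisdup FROM @addr ORDER BY id;
```

With these sample rows, ids 2 and 3 should end up with parentidofthisdup = 1, while ids 1 and 4 stay NULL. If add2 should also count toward the duplicate key, as in the query above, add it to the PARTITION BY list.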
answered 2013-02-15T12:45:18.530