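I'm trying to write a query that merges duplicate client records into a single record. Right now I'm just building a mapping table so I know which record should map to which.

Here are two functions I wrote to help me: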

CREATE FUNCTION FindDuplicateClients ()
RETURNS TABLE AS RETURN 
(
    select distinct CLIENT_GUID 
    from CLIENTS c
    inner join
    (
        select FIRST_NAME, LAST_NAME, HOME_PHONE
        from CLIENTS
        group by FIRST_NAME, LAST_NAME, HOME_PHONE
        having COUNT(*) > 1
    ) t on c.FIRST_NAME = t.FIRST_NAME and c.LAST_NAME = t.LAST_NAME and c.HOME_PHONE = t.HOME_PHONE)
go

--Find other clients that map to this client
CREATE FUNCTION FindDuplicateClientsByClient (@Client uniqueidentifier)
RETURNS TABLE AS RETURN 
(
    select distinct CLIENT_GUID 
    from CLIENTS c
    inner join
    (
        select x.FIRST_NAME, x.LAST_NAME, x.HOME_PHONE
        from CLIENTS x
        inner join 
        (
            select FIRST_NAME, LAST_NAME, HOME_PHONE
            from CLIENTS
            where CLIENT_GUID = @Client
        ) y on x.FIRST_NAME = y.FIRST_NAME and x.LAST_NAME = y.LAST_NAME and x.HOME_PHONE = y.HOME_PHONE
        group by x.FIRST_NAME, x.LAST_NAME, x.HOME_PHONE
        having COUNT(*) > 1
    ) t on c.FIRST_NAME = t.FIRST_NAME and c.LAST_NAME = t.LAST_NAME and c.HOME_PHONE = t.HOME_PHONE
    where CLIENT_GUID <> @Client)
go

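The first function successfully returns every CLIENT_GUID that has more than one record. The second takes a GUID and returns all the other GUIDs that share the "common information" (in this case first name, last name, and home phone).

The problem is filling in my mapping table. I need to follow some rules that give certain duplicates priority. For example, anyone who has a transaction must not have their CLIENT_GUID changed, but other GUIDs can be merged into theirs (provided those other GUIDs have no transactions).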

--Create Mapping table
select CLIENT_GUID, CAST(null as uniqueidentifier) as NEW_CLIENT_GUID
into #mapping
from FindDuplicateClients()

--Do not map people who have transactions
update #mapping
set NEW_CLIENT_GUID = CLIENT_GUID
where CLIENT_GUID in (select CLIENT_GUID from trnHistory)

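Now here is where I'm running into trouble. I can't figure out how to do the following without a cursor: take the list of people whose NEW_CLIENT_GUID was set by the previous query, run FindDuplicateClientsByClient against each of those GUIDs, and set NEW_CLIENT_GUID on every row it returns to the GUID that was passed into the function.

Here is what I came up with using a cursor: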

declare cur cursor LOCAL FAST_FORWARD for select NEW_CLIENT_GUID from #mapping where NEW_CLIENT_GUID is not null
declare @NEW_CLIENT_GUID uniqueidentifier

open cur
fetch next from cur into @NEW_CLIENT_GUID
while @@fetch_status = 0
begin
    update #mapping
    set NEW_CLIENT_GUID = @NEW_CLIENT_GUID
    where CLIENT_GUID in (select CLIENT_GUID from FindDuplicateClientsByClient(@NEW_CLIENT_GUID)) --Find duplicates to this record
        and NEW_CLIENT_GUID is null --Do not reassign values that are already set (ie: duplicates that have transactions)

    fetch next from cur into @NEW_CLIENT_GUID
end 

close cur
deallocate cur

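Iterating over every row in #mapping that already has a value set doesn't seem right to me. What is the correct way to do this? I'm using SQL Server 2008 R2, but I would like it to be compatible with SQL Server 2005 as well.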

1 Answer

It has been two days with no answers, so I just stuck with the cursor solution, since it performs well enough for my needs.
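For reference, the cursor pass from the question can probably be collapsed into a single UPDATE using CROSS APPLY, which is available from SQL Server 2005 onward (compatibility level 90 or higher). This is only a sketch that reuses the question's own objects (#mapping and FindDuplicateClientsByClient); the aliases k, d and m are illustrative, and it has not been run against the real tables.

```sql
-- Sketch: set-based version of the cursor pass.
-- k = rows that already have a NEW_CLIENT_GUID (the "keepers")
-- d = duplicates of each keeper, as returned by the question's function
-- m = the mapping rows for those duplicates that are still unset
update m
set NEW_CLIENT_GUID = k.NEW_CLIENT_GUID
from #mapping k
cross apply FindDuplicateClientsByClient(k.NEW_CLIENT_GUID) d
inner join #mapping m on m.CLIENT_GUID = d.CLIENT_GUID
where k.NEW_CLIENT_GUID is not null
  and m.NEW_CLIENT_GUID is null --Do not reassign values that are already set
```

If a duplicate matches more than one keeper (two clients in the same group both have transactions), which keeper wins is not defined here, much as it depends on fetch order in the cursor.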

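I did use a different approach when I had to loop a second time to find people who were not mapped in the previous pass, but it is just a while loop that behaves exactly like the earlier cursor.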

declare @tmpGuid uniqueidentifier
select @tmpGuid = CLIENT_GUID from #mapping where NEW_CLIENT_GUID is null 
while @@ROWCOUNT > 0
begin   
  --Set the first unset guid to itself
  update #mapping
  set NEW_CLIENT_GUID = @tmpGuid
  where CLIENT_GUID = @tmpGuid

  --set all other duplicates to the guid we just used.
  update #mapping
  set NEW_CLIENT_GUID = @tmpGuid
  where CLIENT_GUID in (select CLIENT_GUID from FindDuplicateClientsByClient(@tmpGuid))
       and NEW_CLIENT_GUID is null

  --get next guid
  select @tmpGuid = CLIENT_GUID from #mapping where NEW_CLIENT_GUID is null 
end
set nocount off
go
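
A possible set-based alternative to both loops, shown here as an untested sketch: rank the clients inside each FIRST_NAME / LAST_NAME / HOME_PHONE group with ROW_NUMBER() (available since SQL Server 2005) so that clients with transactions sort first, then point every still-unmapped row at the top-ranked client of its group. It is meant to run after the "Do not map people who have transactions" update from the question. The CTE names flagged and ranked, the HAS_TRN and RN columns, and the choice of the lowest CLIENT_GUID as tie-breaker are assumptions made for illustration, not part of the original solution.

```sql
-- Sketch: choose one survivor per name/phone group and map the rest to it.
;with flagged as
(
    --Mark which duplicate clients have at least one transaction
    select c.CLIENT_GUID, c.FIRST_NAME, c.LAST_NAME, c.HOME_PHONE,
           case when exists (select 1 from trnHistory h
                             where h.CLIENT_GUID = c.CLIENT_GUID)
                then 1 else 0 end as HAS_TRN
    from CLIENTS c
    inner join #mapping m on m.CLIENT_GUID = c.CLIENT_GUID
),
ranked as
(
    --RN = 1 is the survivor: transaction holders first,
    --then lowest CLIENT_GUID (assumed tie-breaker)
    select CLIENT_GUID, FIRST_NAME, LAST_NAME, HOME_PHONE,
           row_number() over (
               partition by FIRST_NAME, LAST_NAME, HOME_PHONE
               order by HAS_TRN desc, CLIENT_GUID
           ) as RN
    from flagged
)
update m
set NEW_CLIENT_GUID = s.CLIENT_GUID
from #mapping m
inner join ranked r on r.CLIENT_GUID = m.CLIENT_GUID
inner join ranked s on s.FIRST_NAME = r.FIRST_NAME
                   and s.LAST_NAME = r.LAST_NAME
                   and s.HOME_PHONE = r.HOME_PHONE
                   and s.RN = 1
where m.NEW_CLIENT_GUID is null --Keep mappings that are already set
```

In a group with no transactions, the survivor is mapped to itself and the others to it, which is what the while loop above does one group at a time. Unlike the loops, the survivor is chosen deterministically.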
answered 2012-11-05T15:01:15.690