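I came up with a script that pairs "master" and "slave" accounts wherever the company name and postal code match exactly. It treats the most recently updated account as the master.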

select
    m.ev870_acct_code, m.ev870_company_name, m.ev870_postal_code, m.ev870_iacvb_code,
    s.ev870_acct_code, s.ev870_company_name, s.ev870_postal_code, s.ev870_iacvb_code
from
    ev870_acct_master m
inner join
    ev870_acct_master s
on
    m.ev870_company_name = s.ev870_company_name
and m.ev870_postal_code = s.ev870_postal_code
and m.ev870_upd_stamp > s.ev870_upd_stamp
where
    m.ev870_class = 'o'
and s.ev870_class = 'o'
and m.ev870_status != '0'
and s.ev870_status != '0'
and (m.ev870_iacvb_code = s.ev870_iacvb_code or isnull(m.ev870_iacvb_code,'') = '' or isnull(s.ev870_iacvb_code,'') = '')
and s.ev870_company_name like '%council%'
order by
    m.ev870_upd_stamp desc

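The problem with the script is that it may determine:

  • Account 1 is the master, and account 2 is a duplicate slave.
  • Account 1 is the master, and account 3 is a duplicate slave.
  • Account 2 is the master, and account 3 is a duplicate slave.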
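
The three pairs listed above can be reproduced with a minimal sketch. The table variable `@acct`, its shortened column names, and the three sample rows are hypothetical stand-ins for `ev870_acct_master`; only the self-join shape is taken from the script.

```sql
-- Hypothetical sample: three accounts with the same company name and postal code,
-- each with a different update stamp (account 1 is the newest).
declare @acct table (
    acct_code int
    ,company_name varchar(50)
    ,postal_code varchar(50)
    ,upd_stamp datetime
);

insert into @acct(acct_code, company_name, postal_code, upd_stamp)
values
    (1,'city council','pc1','20120301')
    ,(2,'city council','pc1','20120201')
    ,(3,'city council','pc1','20120101');

-- Same self-join shape as the script: every newer row is paired with every older row,
-- so account 2 shows up both as a slave (of 1) and as a master (of 3).
select
    m.acct_code as master_code, s.acct_code as slave_code
from
    @acct m
inner join
    @acct s
on
    m.company_name = s.company_name
and m.postal_code = s.postal_code
and m.upd_stamp > s.upd_stamp
order by
    m.acct_code, s.acct_code;
```

With these rows the join returns the pairs (1, 2), (1, 3) and (2, 3).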

As you can see, the result of each step affects the next one. Can you recommend a smarter query?

Edit, solution:

select
    m.ev870_acct_code, m.ev870_company_name, m.ev870_postal_code, m.ev870_iacvb_code,
    s.ev870_acct_code, s.ev870_company_name, s.ev870_postal_code, s.ev870_iacvb_code
from
    ev870_acct_master s
inner join 
    (
    select 
        ev870_acct_code, ev870_company_name, ev870_postal_code, ev870_iacvb_code, ev870_upd_stamp
        ,row_number() over (partition by ev870_company_name, ev870_postal_code, ev870_iacvb_code order by ev870_upd_stamp desc) as howRecent
    from 
        ev870_acct_master
    where
        ev870_class = 'o'
    and ev870_status != '0'
    and ev870_postal_code != ''
    and ev870_company_name like 'A%'
    ) m 
on  
    m.ev870_company_name = s.ev870_company_name
and m.ev870_postal_code = s.ev870_postal_code
and m.ev870_upd_stamp > s.ev870_upd_stamp
where
    m.howRecent = 1
and m.ev870_iacvb_code = s.ev870_iacvb_code
and s.ev870_class = 'o'
and s.ev870_status != '0'

1 Answer

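Following up on your comment:

@kristof I am identifying duplicate accounts in my application and then merging them into one account.

You can use code similar to this: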

declare @dupExample table (
    id int identity(1,1)
    ,name varchar(50)
    ,postal varchar(50)
    ,lastUpdated datetime
)

insert into @dupExample(name, postal, lastUpdated)
values 
    ('a','pc1','20120101')
    ,('a','pc1','20120501')
    ,('a','pc1','20120601')
    ,('a','pc1','20120701')
    ,('a','pc1','20120201')
    ,('b','pc2','20120102')
    ,('b','pc2','20120202')
    ,('b','pc2','20120302')
    ,('b','pc2','20120302')
    ,('c','pc2','20120302')
    ,('d','pc2','20120302')
    ,('d','pc2','20120302')


select * from @dupExample 


/*
    to see duplicates along with how recent they are
*/
select 
    *
    ,row_number() over (partition by name, postal order by lastUpdated desc) as howRecent -- 1 = most recent, matching the delete below
from 
    @dupExample     

/*
    delete duplicates leaving only most recent record based on date updated
    WARNING only one record will be left for each dup even if there are multiple records 
    updated on the same date (see b, d examples)
*/
delete de
from    @dupExample de
    inner join 
    ( select 
        id
        ,row_number() over (partition by name, postal order by lastUpdated desc) as howRecent
        from 
            @dupExample     
    ) der on de.id = der.id
where
    der.howRecent > 1   

/*
    after delete
*/      
select * from @dupExample   
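
Since the stated goal is merging rather than deleting, the same ranking can also be used to list each duplicate next to the record that would survive. This is only a sketch under assumptions: the CTE name `ranked`, the output column names, and the use of `id` as a tie-breaker are illustrative and not part of the code above.

```sql
declare @dupExample table (
    id int identity(1,1)
    ,name varchar(50)
    ,postal varchar(50)
    ,lastUpdated datetime
);

insert into @dupExample(name, postal, lastUpdated)
values
    ('a','pc1','20120101')
    ,('a','pc1','20120501')
    ,('a','pc1','20120601');

-- Rank every record within its (name, postal) group, newest first.
-- id is an assumed tie-breaker so the ranking is deterministic.
with ranked as (
    select
        id, name, postal
        ,row_number() over (partition by name, postal order by lastUpdated desc, id desc) as howRecent
    from
        @dupExample
)
-- Pair the single most recent record of each group with all older ones,
-- so every duplicate has exactly one master.
select
    m.id as masterId, s.id as duplicateId
from
    ranked m
inner join
    ranked s
on
    s.name = m.name
and s.postal = m.postal
where
    m.howRecent = 1
and s.howRecent > 1
order by
    m.id, s.id;
```

Because only the row with howRecent = 1 can appear on the master side, no record is reported as both a master and a duplicate.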

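If you have duplicates with the same lastUpdated value, you can add extra criteria to the order by clause to specify which record should be deleted in that case. Hopefully this gives you a good starting point.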
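
The tie-breaking idea can be sketched as follows. Using `id desc` as the second sort key is an assumption made for illustration (it keeps the row inserted last); any column that distinguishes the tied rows would work the same way.

```sql
declare @dupExample table (
    id int identity(1,1)
    ,name varchar(50)
    ,postal varchar(50)
    ,lastUpdated datetime
);

-- Rows 2 and 3 share the same lastUpdated value.
insert into @dupExample(name, postal, lastUpdated)
values
    ('b','pc2','20120202')
    ,('b','pc2','20120302')
    ,('b','pc2','20120302');

-- Same delete as before, but ties on lastUpdated are broken by id,
-- so the survivor is chosen deterministically instead of arbitrarily.
delete de
from @dupExample de
    inner join
    ( select
        id
        ,row_number() over (partition by name, postal order by lastUpdated desc, id desc) as howRecent
      from
        @dupExample
    ) der on de.id = der.id
where
    der.howRecent > 1;

select * from @dupExample;
```

With this sample data the only remaining row is the one with id 3.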

answered 2012-07-24T16:31:29.110