
Hello, I am running the following query to identify duplicate records.

SELECT *
  FROM unique2 P
 WHERE EXISTS (SELECT 1
                 FROM unique2 C
                WHERE C.surname = P.surname
                  AND C.postcode = P.postcode
                  AND (((C.forename IS NULL OR P.forename IS NULL)
                        AND C.initials = P.initials)
                       OR C.forename = P.forename)
                  AND (C.sex = P.sex
                       OR C.title = P.title)
                  AND (C.address1 = P.address1
                       OR C.address1 = P.address2
                       OR C.address2 = P.address1
                       OR instr(C.address1_notrim, P.address1_notrim) > 0
                       OR instr(P.address1_notrim, C.address1_notrim) > 0)
                  AND C.rowid < P.rowid);

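With this query, however, I cannot identify the ID of the unique record that each duplicate matches. Is there a way to identify the duplicates along with the ID of the unique record they match (my table has a unique key)?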


2 Answers


You can also do this with analytic functions:

select id, num_of_ids, first_id, surname, postcode, dob
from (
    select id,
        count(*) over (partition by surname, postcode, dob) as num_of_ids,
        first_value(id)
            over (partition by surname, postcode, dob order by id) as first_id,
        surname,
        postcode,
        dob
    from promolog
)
where num_of_ids > 1;

Based on your update, I think you can do a self-join, which you can make as complicated as you need:

select dup.*, master.id as duplicate_of
from promolog dup
join promolog master
on master.surname = dup.surname
and master.postcode = dup.postcode
and master.dob = dup.dob
... and <address checks etc. > ...
and master.rowid < dup.rowid;

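But maybe I'm still missing something. As the name suggests, exists only tests whether a matching record exists; if you want to retrieve any data from the matching record, you need to join to it at some point.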
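
To see what the self-join returns in practice, here is a minimal sketch that runs it from Python against an in-memory SQLite database rather than Oracle. The sample rows are made up for illustration, only the surname, postcode and dob checks are included, and master.id < dup.id stands in for the rowid comparison (in SQLite an integer primary key is an alias for rowid). Treat it as a demonstration under those assumptions, not as the exact query for your table.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the query above.
rows = [
    (1, "Smith", "AB1 2CD", "1980-01-01"),
    (2, "Smith", "AB1 2CD", "1980-01-01"),  # duplicate of id 1
    (3, "Jones", "EF3 4GH", "1975-05-05"),  # unique, should not appear
    (4, "Smith", "AB1 2CD", "1980-01-01"),  # duplicate of ids 1 and 2
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table promolog ("
    "id integer primary key, surname text, postcode text, dob text)"
)
conn.executemany("insert into promolog values (?, ?, ?, ?)", rows)

# Self-join: one output row per (duplicate, earlier matching record) pair.
pairs = conn.execute(
    """
    select dup.id, master.id as duplicate_of
    from promolog dup
    join promolog master
      on master.surname = dup.surname
     and master.postcode = dup.postcode
     and master.dob = dup.dob
     and master.id < dup.id
    order by dup.id, master.id
    """
).fetchall()
print(pairs)  # [(2, 1), (4, 1), (4, 2)]

# Collapse to a single "original" per duplicate by taking the lowest id.
firsts = conn.execute(
    """
    select dup.id, min(master.id) as duplicate_of
    from promolog dup
    join promolog master
      on master.surname = dup.surname
     and master.postcode = dup.postcode
     and master.dob = dup.dob
     and master.id < dup.id
    group by dup.id
    order by dup.id
    """
).fetchall()
print(firsts)  # [(2, 1), (4, 1)]
```

Note that a record with several earlier matches comes back once per match (id 4 above), so if you want exactly one row per duplicate you need to aggregate, as in the second query.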

answered 2013-02-21T14:25:23.507
select id
from promolog
where (surname, postcode, dob) in (
  -- key combinations that occur more than once
  select surname, postcode, dob
  from promolog
  group by surname, postcode, dob
  having count(*) > 1
)
answered 2013-02-21T14:18:19.353