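My table contains duplicate email addresses. Each email address has a unique create date and a unique ID. For each email address, I want to identify the most recent create date and its associated ID, and also show the duplicate IDs along with their create dates. I would like the query to return the following columns: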

  • Column 1: EmailAddress
  • Column 2: IDKeep
  • Column 3: CreateDateofIDKeep
  • Column 4: DuplicateID
  • Column 5: CreateDateofDuplicateID

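Note: in some cases an email address has more than 2 duplicates. I would like the query to show each additional duplicate on a new row, repeating EmailAddress and IDKeep in those cases.

I have tried piecing together different queries I found here, to no avail. I'm at a loss at the moment; any help/guidance would be greatly appreciated.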

2 Answers

Complicated queries are best solved by breaking them up into pieces and working step by step.

First let's create a query to find the key of the row we want to keep, by finding the most recent create date for each email then joining to get the Id:

select x.Email, x.CreateDate, x.Id
from myTable x
join (
    select Email, max(CreateDate) as CreateDate
    from myTable
    group by Email
) y on x.Email = y.Email and x.CreateDate = y.CreateDate

Ok, now let's make a query to get duplicate email addresses:

select Email
from myTable
group by Email
having count(*) > 1

And join this query back to the table to get the keys for every row that has duplicates:

select x.Email, x.Id, x.CreateDate
from myTable x
join (
    select Email
    from myTable
    group by Email
    having count(*) > 1
) y on x.Email = y.Email

Great. Now all that is left is to join the first query with this one to get our result:

select keep.Email, keep.Id as IdKeep, keep.CreateDate as CreateDateOfIdKeep,
    dup.Id as DuplicateId, dup.CreateDate as CreateDateOfDuplicateId
from (
    select x.Email, x.CreateDate, x.Id
    from myTable x
    join (
        select Email, max(CreateDate) as CreateDate
        from myTable
        group by Email
    ) y on x.Email = y.Email and x.CreateDate = y.CreateDate
) keep
join (
    select x.Email, x.Id, x.CreateDate
    from myTable x
    join (
        select Email
        from myTable
        group by Email
        having count(*) > 1
    ) y on x.Email = y.Email
) dup on keep.Email = dup.Email and keep.Id <> dup.Id

Note the final keep.Id <> dup.Id predicate on the join ensures we don't get the same row for both keep and dup.
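
To see how the final query handles an email with more than two rows, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows and the find_duplicates helper are hypothetical, and SQLite is only a stand-in for whatever database you actually use; the SQL itself is the query above with an order by added so the output is deterministic.

```python
import sqlite3

# Hypothetical sample data: a@example.com appears three times,
# b@example.com twice, and c@example.com once (so it is not a duplicate).
rows = [
    (1, "a@example.com", "2015-01-01"),
    (2, "a@example.com", "2015-02-01"),
    (3, "a@example.com", "2015-03-01"),
    (4, "b@example.com", "2015-01-15"),
    (5, "b@example.com", "2015-01-10"),
    (6, "c@example.com", "2015-01-20"),
]

# The final query from this answer, plus an order by for stable output.
QUERY = """
select keep.Email, keep.Id as IdKeep, keep.CreateDate as CreateDateOfIdKeep,
    dup.Id as DuplicateId, dup.CreateDate as CreateDateOfDuplicateId
from (
    select x.Email, x.CreateDate, x.Id
    from myTable x
    join (
        select Email, max(CreateDate) as CreateDate
        from myTable
        group by Email
    ) y on x.Email = y.Email and x.CreateDate = y.CreateDate
) keep
join (
    select x.Email, x.Id, x.CreateDate
    from myTable x
    join (
        select Email
        from myTable
        group by Email
        having count(*) > 1
    ) y on x.Email = y.Email
) dup on keep.Email = dup.Email and keep.Id <> dup.Id
order by keep.Email, dup.Id
"""


def find_duplicates(sample_rows):
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table myTable (Id integer primary key, Email text, CreateDate text)"
    )
    conn.executemany(
        "insert into myTable (Id, Email, CreateDate) values (?, ?, ?)", sample_rows
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in find_duplicates(rows):
        print(row)
```

With this data, a@example.com produces two rows (one per older Id, each repeating the Email and IdKeep), b@example.com produces one, and c@example.com does not appear because it has no duplicate.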

answered 2015-04-02T01:27:01.163

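The following subquery uses a MySQL trick to get the most recent id and create date for each email: group_concat() lists the ids ordered by CreateDate descending, and substring_index() keeps only the first one.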

select Email, max(CreateDate) as CreateDate,
       substring_index(group_concat(id order by CreateDate desc), ',', 1) as id
from myTable
group by Email
having count(*) > 1;

The having clause also ensures that this returns only emails that have duplicates.

Then just join this query to the rest of the data to get the format you want:

select t.Email, tkeep.id as keep_id, tkeep.CreateDate as keep_date,
       t.id as dup_id, t.CreateDate as dup_CreateDate
from myTable t join
     (select Email, max(CreateDate) as CreateDate,
             substring_index(group_concat(id order by CreateDate desc), ',', 1) as id
      from myTable
      group by Email
      having count(*) > 1
     ) tkeep
     on t.Email = tkeep.Email and t.CreateDate <> tkeep.CreateDate;

answered 2015-04-02T01:58:11.877