5

For example, I have a table (TestFI) with the following data:

FIID   Email
---------
null a@a.com
1    a@a.com   
null b@b.com    
2    b@b.com    
3    c@c.com    
4    c@c.com    
5    c@c.com    
null d@d.com    
null d@d.com

I need the records whose Email appears exactly twice, where one row has a null FIID and the other does not. For the data above, only a@a.com and b@b.com qualify.

I was able to build a multi-level query like this:

    Select
        FIID,
        Email
    from
        TestFI
    where
        Email in
        (
            Select
                Email
            from
            (
                Select
                    Email
                from
                    TestFI
                where
                    Email in
                    (
                        select
                            Email
                        from
                            TestFI
                        where
                            FIID is null or FIID is not null
                        group by Email
                        having
                            count(Email) = 2
                    )
                    and
                    FIID is null
            ) as Temp1
            group by Email
            having count(Email) = 1
        )

However, this took almost 10 minutes on 10 million records. Is there a better way? I know I must be doing something stupid here.

Thanks

4 Answers

7

I would try this query:

SELECT   Email, MAX(FIID)
FROM     TestFI
GROUP BY Email
HAVING   COUNT(*)=2 AND COUNT(FIID)=1

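It returns the Email column and the non-null FIID value. COUNT(*) counts every row, while COUNT(FIID) counts only the rows where FIID is not null, so the other FIID in each pair must be null.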
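To check the query against the sample data from the question, here is a minimal sketch. It assumes SQLite through Python's `sqlite3` module, which is not the asker's database, and the helper name `find_pairs` is made up for illustration.

```python
import sqlite3

# Sample rows from the question; None stands for SQL NULL.
ROWS = [
    (None, "a@a.com"), (1, "a@a.com"),
    (None, "b@b.com"), (2, "b@b.com"),
    (3, "c@c.com"), (4, "c@c.com"), (5, "c@c.com"),
    (None, "d@d.com"), (None, "d@d.com"),
]


def find_pairs(rows):
    """Return (Email, FIID) for emails with exactly two rows, one of them NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TestFI (FIID INTEGER, Email TEXT)")
    conn.executemany("INSERT INTO TestFI (FIID, Email) VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT   Email, MAX(FIID)
        FROM     TestFI
        GROUP BY Email
        HAVING   COUNT(*) = 2      -- exactly two rows for this email
           AND   COUNT(FIID) = 1   -- COUNT(col) skips NULLs: one non-null FIID
        ORDER BY Email
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(find_pairs(ROWS))
```

On the sample data this prints `[('a@a.com', 1), ('b@b.com', 2)]`: c@c.com is dropped because it has three rows, and d@d.com because it has no non-null FIID.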

answered 2013-05-20T22:12:57.007
1

With an index on (email, fiid), I would be tempted to try:

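select  tnull.*, tnotnull.*
from testfi tnull join
     testfi tnotnull
     on tnull.email = tnotnull.email left outer join
     testfi tnothing
     -- anti-join: any other non-null row for the same email
     -- (assumes non-null FIIDs are unique and at most one NULL row per email,
     --  since the table has no row id to tell duplicate rows apart)
     on tnull.email = tnothing.email and
        tnothing.fiid <> tnotnull.fiid
where tnothing.email is null and
      tnull.fiid is null and
      tnotnull.fiid is not null;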

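Performance definitely depends on the database. This keeps all access within the index. In some databases, the aggregation might be faster. Performance also depends on the selectivity of the query. For example, if there is only one NULL record and you have an index on (fiid, email), this should be much faster than the aggregation.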
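Since the outcome depends on the engine, one way to compare is to create the index and look at the plan your own database produces. The sketch below assumes SQLite through Python's `sqlite3` module; the index name `idx_testfi_email_fiid` and the helper `run_with_index` are hypothetical, and the plan text differs between engines and versions, so it is only printed, not interpreted.

```python
import sqlite3

# Sample rows from the question; None stands for SQL NULL.
ROWS = [
    (None, "a@a.com"), (1, "a@a.com"),
    (None, "b@b.com"), (2, "b@b.com"),
    (3, "c@c.com"), (4, "c@c.com"), (5, "c@c.com"),
    (None, "d@d.com"), (None, "d@d.com"),
]

# The aggregation from the thread, used here as the query to inspect.
QUERY = """
SELECT   Email, MAX(FIID)
FROM     TestFI
GROUP BY Email
HAVING   COUNT(*) = 2 AND COUNT(FIID) = 1
ORDER BY Email
"""


def run_with_index(rows):
    """Return (result rows, plan detail strings) with an (Email, FIID) index in place."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TestFI (FIID INTEGER, Email TEXT)")
    conn.executemany("INSERT INTO TestFI (FIID, Email) VALUES (?, ?)", rows)
    # Hypothetical index name; (Email, FIID) covers both columns the query reads.
    conn.execute("CREATE INDEX idx_testfi_email_fiid ON TestFI (Email, FIID)")
    result = conn.execute(QUERY).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
    conn.close()
    return result, plan


if __name__ == "__main__":
    rows, plan = run_with_index(ROWS)
    print(rows)
    for step in plan:
        print(step)
```

The same approach works for the join variant: swap the text of `QUERY` and compare the two plans and timings on a realistic data volume.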

answered 2013-05-20T23:23:11.337
0

I need records that appear exactly twice AND have 1 row with FIID is null and one is not

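1. In the innermost select, group by email and keep the emails with count = 2:

        select email from T
        group by email having count(email) = 2

2. Then, for those emails, replace null with a sentinel (-1 here, assuming it is never a real FIID) and keep the ones that have both a null and a non-null FIID:

        select email, max(AdjustedFIID) as FIID
        from
        (
          select email, coalesce(fiid,-1) as AdjustedFIID from T
          where email in
          (
            select email from T
            group by email having count(email) = 2
          )
        )  as X
        group by email
        having min(AdjustedFIID) = -1 and max(AdjustedFIID) > -1

answered 2013-05-20T22:40:58.440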
0

Maybe something like...

select
  a.FIID,
  a.Email

from
  TestFI a
  inner join TestFI b on (a.Email=b.Email)

where
  a.FIID is not null
  and b.FIID is null
;

And make sure Email and FIID are indexed.
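A quick way to see what this self-join returns is to run it on the sample data. The sketch below assumes SQLite through Python's `sqlite3` module; the helper `self_join`, the index name, and the extra e@e.com rows are hypothetical additions that do not appear in the question. The extra rows are there because the join does not count rows per email, so it is worth checking how an email with more than two rows behaves.

```python
import sqlite3

# Sample rows from the question; None stands for SQL NULL.
SAMPLE = [
    (None, "a@a.com"), (1, "a@a.com"),
    (None, "b@b.com"), (2, "b@b.com"),
    (3, "c@c.com"), (4, "c@c.com"), (5, "c@c.com"),
    (None, "d@d.com"), (None, "d@d.com"),
]

# Hypothetical email with three rows (one NULL, two non-null); not in the question.
EXTRA = [(None, "e@e.com"), (6, "e@e.com"), (7, "e@e.com")]


def self_join(rows):
    """Run the self-join and return (FIID, Email) rows in a stable order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TestFI (FIID INTEGER, Email TEXT)")
    conn.executemany("INSERT INTO TestFI (FIID, Email) VALUES (?, ?)", rows)
    # Hypothetical index name, following the advice to index Email and FIID.
    conn.execute("CREATE INDEX idx_email_fiid ON TestFI (Email, FIID)")
    result = conn.execute(
        """
        SELECT a.FIID, a.Email
        FROM   TestFI a
               INNER JOIN TestFI b ON a.Email = b.Email
        WHERE  a.FIID IS NOT NULL
          AND  b.FIID IS NULL
        ORDER BY a.Email, a.FIID
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(self_join(SAMPLE))
    print(self_join(SAMPLE + EXTRA))
```

On the question's data the join returns only the a@a.com and b@b.com rows. With the extra rows it also returns both non-null e@e.com rows, so if such emails can occur, a row-count condition would still be needed to enforce "exactly twice".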

answered 2013-05-20T22:14:23.663