I have a table, bj10dcmegablast, with 20 million rows. I want to run the following query:

select *, max(qEnd - qStart)
from (select qFileID, qLocus, qTranscript, qLength, sFileId, sLocus, sTranscript,
             sLength, qStart, qEnd, sStart, sEnd
      from bj10dcmegablast
      where (qLocus, qTranscript) in
            (select distinct qLocus, qTranscript
             from (select qLocus, qTranscript, count(distinct sFileID) as counts
                   from bj10dcmegablast
                   group by qLocus, qTranscript
                   having counts > 6) as middle1)) as middle2
group by qLocus, sLocus;

I don't know how long it will take. I let the query run for an hour and it still had not finished.

So I ran some tests:

select qLocus, qTranscript, count(distinct sFileID) as counts
from bj10dcmegablast
group by qLocus, qTranscript
having counts > 6

This takes 40 seconds.

select distinct qLocus, qTranscript
 from
    (select qLocus, qTranscript, count(distinct sFileID) as counts
     from bj10dcmegablast
    group by qLocus, qTranscript
     having counts > 6) as middle1;

This one takes 2 minutes.

Can anyone estimate how long the full query will take?

1 Answer

The IN clause can be inefficient in MySQL. Try doing this with an explicit join instead:

select *, max(qEnd - qStart) 
from (select qFileID, qLocus, qTranscript, qLength, sFileId, sLocus, sTranscript,
             sLength, qStart, qEnd, sStart, sEnd
      from bj10dcmegablast
     ) b join
     (select qLocus, qTranscript, count(distinct sFileID) as counts
      from bj10dcmegablast
      group by qLocus, qTranscript
      having counts > 6
     ) as middle2
     on b.qLocus = middle2.qLocus and b.qTranscript = middle2.qTranscript
group by b.qLocus, b.sLocus;

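In this version you do not need the "middle1" level with its select distinct, because the subquery already groups by these two columns, so each (qLocus, qTranscript) pair appears only once.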
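
If the join is still slow, an index may help. The following is only a sketch under assumptions, not something measured on this table: it assumes no suitable index exists yet, that the three columns are short enough to be indexed together, and the index name idx_q_locus_transcript_file is made up for illustration. It also selects only the grouping columns and the aggregate, because with select * MySQL returns values from an arbitrary row of each group for the non-grouped columns (and rejects the query when ONLY_FULL_GROUP_BY is enabled). Running EXPLAIN first shows the plan without executing the query.

```sql
-- Hypothetical index; the name is illustrative and building it on
-- 20 million rows will itself take a while.
CREATE INDEX idx_q_locus_transcript_file
    ON bj10dcmegablast (qLocus, qTranscript, sFileID);

-- Inspect the execution plan before running the real query.
EXPLAIN
SELECT b.qLocus, b.sLocus, MAX(b.qEnd - b.qStart) AS maxSpan
FROM bj10dcmegablast AS b
JOIN (
    -- (qLocus, qTranscript) pairs that hit more than 6 distinct files
    SELECT qLocus, qTranscript
    FROM bj10dcmegablast
    GROUP BY qLocus, qTranscript
    HAVING COUNT(DISTINCT sFileID) > 6
) AS hits
  ON b.qLocus = hits.qLocus
 AND b.qTranscript = hits.qTranscript
GROUP BY b.qLocus, b.sLocus;
```

Remove the EXPLAIN keyword to run the query itself once the plan looks reasonable.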

Answered 2012-10-08T14:15:55.233