mysql - Slow MySQL query is filling up my disk space

Question

This is the query I am currently running (28 hours and counting!):

drop table if exists temp_codes;
create temporary table temp_codes
    select distinct CODE from Table1;
alter table temp_codes
    add primary key (CODE);

drop table if exists temp_ids;
create temporary table temp_ids
    select distinct ID from Table1;
alter table temp_ids
    add primary key (ID);

drop table if exists temp_ids_codes;
create temporary table temp_ids_codes
    select ID, CODE
    from temp_ids, temp_codes;

alter table temp_ids_codes
    add index idx_id(ID),
    add index idx_code(CODE); 

insert into Table2(ID,CODE,cnt)
select 
    a.ID, a.CODE, coalesce(count(t1.ID), 0)
from 
    temp_ids_codes as a
    left join Table1 as t1 on (a.ID = t1.ID and a.CODE=t1.CODE)
group by
    a.ID, a.CODE;

This is my table (Table1):

ID         CODE
-----------------
0001        345
0001        345
0001        120
0002        567
0002        034
0002        567
0003        567
0004        533
0004        008
......
(millions of rows)

I am running the query above to produce this (Table2):

ID  CODE    CNT
1   008      0
1   034      0
1   120      1
1   345      2
1   533      0
1   567      0
2   008      0
2   034      1
...

CNT is the count of each code for each ID. What is the best way to achieve this with better performance and without using up disk space? Thanks.

score 5 · Accepted Answer

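You are multiplying thousands of codes by millions of IDs and wondering why it is eating disk space. You are generating billions of rows. That is going to take a long time.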
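To put a rough number on that multiplication, here is a back-of-the-envelope sketch. The ID count, code count, and bytes-per-row figures are illustrative assumptions, not values from the question, and `cross_product_estimate` is a hypothetical helper name.

```python
def cross_product_estimate(distinct_ids, distinct_codes, bytes_per_row):
    """Return (row count, approximate size in GB) of a full ID x CODE grid."""
    rows = distinct_ids * distinct_codes
    return rows, rows * bytes_per_row / 1e9


# Assumed inputs: 2 million IDs, 5 thousand codes, 20 bytes per stored row.
rows, size_gb = cross_product_estimate(2_000_000, 5_000, 20)
print(f"{rows:,} rows, roughly {size_gb:,.0f} GB before indexes")
```

The two secondary indexes added to temp_ids_codes would come on top of that figure, which is consistent with the disk filling up.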

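I can make a few suggestions, in case you restart the process or have the resources to run something in parallel.

First, store the intermediate results in real tables, perhaps in another database ("myTmp"), so that you can monitor progress.

Second, do the aggregation before the join in the final query. In fact, since you are using temporary tables anyway, put the aggregate into a table first: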

select t1.ID, t1.CODE, count(*) as cnt
from Table1 as t1 
group by t1.ID, t1.CODE;

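As it stands, your query multiplies the original data by joining in all the additional codes and only then groups.

Instead, left join from the full table of ID/CODE combinations to this aggregated one.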
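The aggregate-first approach can be sketched end to end on the sample rows from the question. This is a minimal sketch, not a tuned solution: an in-memory SQLite database (via Python's `sqlite3`) stands in for MySQL so the example runs on its own, and `pair_counts` and `idx_pair` are hypothetical names.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Table1 (ID text, CODE text);
    insert into Table1 values
        ('0001','345'), ('0001','345'), ('0001','120'),
        ('0002','567'), ('0002','034'), ('0002','567'),
        ('0003','567'), ('0004','533'), ('0004','008');

    -- Step 1: aggregate the raw data once, before any join.
    -- pair_counts is a hypothetical name for the intermediate table.
    create table pair_counts as
        select ID, CODE, count(*) as cnt
        from Table1
        group by ID, CODE;
    create unique index idx_pair on pair_counts (ID, CODE);
""")

# Step 2: left join the full ID x CODE grid to the small aggregated table.
# No GROUP BY is needed here, because pair_counts has one row per pair.
rows = conn.execute("""
    select i.ID, c.CODE, coalesce(p.cnt, 0) as cnt
    from (select distinct ID from Table1) as i
    cross join (select distinct CODE from Table1) as c
    left join pair_counts as p on p.ID = i.ID and p.CODE = c.CODE
    order by i.ID, c.CODE
""").fetchall()

for row in rows[:6]:
    print(row)
```

The final result still has one row per ID/CODE combination, so its size is unchanged. What goes away is the grouping pass over the multiplied data.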

Another approach is to put an index on the original table and try the following:

insert into Table2(ID,CODE,cnt)
select a.ID, a.CODE,
       (select count(*) from Table1 t1 where a.ID = t1.ID and a.CODE=t1.CODE) as cnt
from temp_ids_codes a
group by a.ID, a.CODE;

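This may seem counterintuitive, but it will use the index on Table1 for the correlated subquery. I don't like playing games like this with SQL, but it might result in the query finishing in our lifetimes.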
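A small runnable sketch of the correlated-subquery variant, again using in-memory SQLite through Python's `sqlite3` as a stand-in for MySQL. The composite index name `idx_id_code` is an assumption, and the ID/CODE grid is built inline here rather than read from temp_ids_codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Table1 (ID text, CODE text);
    insert into Table1 values
        ('0001','345'), ('0001','345'), ('0001','120'),
        ('0002','567'), ('0002','034'), ('0002','567'),
        ('0003','567'), ('0004','533'), ('0004','008');

    -- Composite index so each subquery execution can be an index lookup.
    -- idx_id_code is a hypothetical index name.
    create index idx_id_code on Table1 (ID, CODE);
""")

query = """
    select a.ID, a.CODE,
           (select count(*)
            from Table1 t1
            where t1.ID = a.ID and t1.CODE = a.CODE) as cnt
    from (select i.ID, c.CODE
          from (select distinct ID from Table1) i
          cross join (select distinct CODE from Table1) c) a
    order by a.ID, a.CODE
"""

# Map each (ID, CODE) pair to its count; absent pairs get 0 from count(*).
counts = {(id_, code): cnt for id_, code, cnt in conn.execute(query)}
print(counts[("0001", "345")], counts[("0003", "008")])
```

On MySQL, running `EXPLAIN` on the real statement is the way to check whether the subquery actually uses the index.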

score 0 · Accepted Answer

Where is the WHERE clause in this statement? Without one, it is a full cross join of every ID with every CODE:

create temporary table temp_ids_codes
select ID, CODE
from temp_ids, temp_codes;

The table should have a primary key on the columns (ID, CODE).
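As a minimal illustration of what a composite primary key gives you, the sketch below declares one on temp_ids_codes and shows that a duplicate pair is rejected. It uses in-memory SQLite through Python's `sqlite3` as a stand-in; on MySQL the equivalent for an existing table would be `alter table temp_ids_codes add primary key (ID, CODE)`. The column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types are assumed; the question does not state them.
    create table temp_ids_codes (
        ID   text not null,
        CODE text not null,
        primary key (ID, CODE)
    );
    insert into temp_ids_codes values ('0001', '345'), ('0001', '120');
""")

# A second ('0001', '345') row violates the composite key.
try:
    conn.execute("insert into temp_ids_codes values ('0001', '345')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("select count(*) from temp_ids_codes").fetchone()[0]
print(duplicate_rejected, row_count)
```

The key also acts as an index on (ID, CODE), which is the pair the final join matches on.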

score 0 · Accepted Answer

You could try something along the following lines (untested query):

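select a.ID,
       a.CODE,
       coalesce(b.countvalue, 0) as cnt
from  temp_ids_codes as a
-- aggregate Table1 on its own first, then join the small result
left join ( select t1.ID, t1.CODE, count(*) as countvalue
            from  Table1 as t1
            group by t1.ID, t1.CODE
           ) b
       on a.ID = b.ID and a.CODE = b.CODE;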

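Now your group by runs only over the records that actually need grouping (rather than over all the zero-count records). The right indexes can also make a huge difference.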
