mysql - Optimize this MySQL query

Question

Table schema

The CREATE statements for the two tables are as follows:

Table 1: (file_path_key, dir_path_key)

create table Table1(
             file_path_key varchar(500), 
             dir_path_key varchar(500), 
             primary key(file_path_key)) 
engine = innodb;

Table 2: (file_path_key, hash_key)

create table Table2(
             file_path_key varchar(500) not null, 
             hash_key bigint(20) not null, 
             foreign key (file_path_key) references Table1(file_path_key) on update cascade on delete cascade)
engine = innodb;

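Goal:

Given a file path F and its directory path string D, I need to find every file name that has at least one hash in F's set of hashes but whose directory name is not D. If a file F1 shares multiple hashes with F, it should be repeated once per shared hash.

Note that the file_path_key column in Table1 and the hash_key column in Table2 are indexed.

In this particular case, Table1 has about 350,000 rows and Table2 has 31,167,119 rows, which makes my current query slow: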

create table temp 
        as select hash_key from Table2 
        where file_path_key = F;

select s1.file_path_key 
        from Table1 as s1 
        join Table2 as s2 
        on s1.file_path_key = s2.file_path_key 
        join temp on temp.hash_key = s2.hash_key 
        where s1.dir_path_key != D;

How can I speed up this query?

score 0 · Accepted Answer

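I don't understand what the temp table is for, but keep in mind that a table created with CREATE .. SELECT has no indexes. So at the very least, change that statement to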

CREATE TABLE temp (INDEX(hash_key)) ENGINE=InnoDB AS 
SELECT hash_key FROM Table2 WHERE file_path_key = F;

Otherwise the second SELECT has to join against temp without an index, scanning the whole table, so it can be very slow.
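Since the purpose of temp is unclear, one option is to drop it and join Table2 to itself. The following is only a sketch, not something the question or this answer has benchmarked: it assumes the indexes described in the question exist, and 'F' and 'D' stand in for the actual file path and directory path values.

```sql
-- Sketch: the same result as the temp-table version, in one statement.
-- 'F' and 'D' are placeholders for the real file path and directory path.
SELECT t1.file_path_key
FROM Table2 AS f                      -- rows holding F's hashes
JOIN Table2 AS s2
  ON s2.hash_key = f.hash_key         -- other rows sharing one of those hashes
JOIN Table1 AS t1
  ON t1.file_path_key = s2.file_path_key
WHERE f.file_path_key = 'F'
  AND t1.dir_path_key <> 'D';         -- keep only files outside directory D
```

A file that shares several hashes with F appears once per shared hash, as the question requires. Prefixing the statement with EXPLAIN shows whether MySQL actually uses the indexes on file_path_key and hash_key for each join.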

I would also suggest using a numeric primary key (INT, BIGINT) in Table1 and referencing that from Table2 instead of the text column. For example:

create table Table1(
             id int not null auto_increment primary key,
             file_path_key varchar(500), 
             dir_path_key varchar(500), 
             unique key(file_path_key)) 
engine = innodb;

create table Table2(
             file_id int not null, 
             hash_key bigint(20) not null, 
             foreign key (file_id) references Table1(id) 
            on update cascade on delete cascade) engine = innodb;

Queries that join the two tables can be much faster when the join predicate uses integer columns rather than text columns.
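Against the integer-keyed schema, the lookup might be written as below. This is a sketch under assumptions: the index name idx_hash_key is hypothetical, and the index is added explicitly because the revised Table2 definition above does not declare one on hash_key. 'F' and 'D' are again placeholders for the real values.

```sql
-- Assumption: hash_key needs its own index in the revised schema;
-- idx_hash_key is a made-up name for illustration.
ALTER TABLE Table2 ADD INDEX idx_hash_key (hash_key);

-- Resolve F to its numeric id once, then join on integers only.
SELECT t1.file_path_key
FROM Table1 AS src
JOIN Table2 AS f
  ON f.file_id = src.id               -- F's hashes
JOIN Table2 AS s2
  ON s2.hash_key = f.hash_key         -- rows sharing one of those hashes
JOIN Table1 AS t1
  ON t1.id = s2.file_id               -- map ids back to file paths
WHERE src.file_path_key = 'F'
  AND t1.dir_path_key <> 'D';
```

The only text comparison left on the lookup side is the single match on src.file_path_key, which the unique key on that column can serve.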
