mysql - MySQL GROUP BY and COUNT performance

Question

I have two tables.

export

Name     Type          Collation        Attributes  Null  Default  Extra
id       int(10)       utf8_unicode_ci  UNSIGNED    No    None     AUTO_INCREMENT
email    varchar(150)  utf8_unicode_ci              No    None
city_id  int(11)       utf8_unicode_ci              Yes   NULL

Indexes

Keyname        Type   Unique  Packed  Column   Cardinality  Collation  Null
id             BTREE  Yes     No      id       769169       A          No
email_index    BTREE  Yes     No      email    769169       A          No
city_id_index  BTREE  No      No      city_id  6356         A          Yes

export_history

Name     Type           Collation        Attributes    Null   Default   Extra
id       int(10)        utf8_unicode_ci  UNSIGNED      No     None      AUTO_INCREMENT
email    varchar(255)   utf8_unicode_ci                No     None

Indexes

Keyname      Type   Unique  Packed  Column  Cardinality  Collation  Null
id           BTREE  Yes     No      id      113887       A          No
email_index  BTREE  No      No      email   113887       A          No

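I need to get the top city IDs, meaning the cities with the most emails (users). There is also the export_history table: any email that appears in it must be excluded from the result.

The final query looks like this:

Main query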

SELECT COUNT(city_id) as city_count, city_id
    FROM export e
        WHERE NOT EXISTS (
            SELECT * FROM export_history ehistory
                WHERE e.email = ehistory.email
            ) 
        GROUP BY city_id
            ORDER BY city_count DESC
                   LIMIT 5

Execution time is about 7 seconds. The question is why it takes so long.

EXPLAIN shows:

id select_type       table     type   possible_keys  key            key_len  ref     rows    Extra
1 PRIMARY            e         index  NULL           city_id_index  5        NULL    769169  Using where; Using temporary; Using filesort
2 DEPENDENT SUBQUERY ehistory  ref    email_index    email_index    767      e.email 1     Using where; Using index

Note that the following two queries each run quickly on their own (0.1 seconds or less):

Query 1

SELECT COUNT(city_id) as city_count, city_id
    FROM export
        GROUP BY city_id
            ORDER BY city_count DESC
                   LIMIT 5

Execution time is about 0.1 seconds.

Query 2

SELECT *
    FROM export e
        WHERE NOT EXISTS (
            SELECT * FROM export_history ehistory
                WHERE e.email = ehistory.email
            )

Execution time is about 0.02 seconds.

Can you recommend anything to improve the performance of the main query?

score 1 · Accepted Answer

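You can simplify the query by using LEFT JOIN ... IS NULL instead of NOT EXISTS with a dependent subquery. Because it avoids re-running the dependent subquery for every row, it may speed things up for you (or it may not: try it).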

SELECT COUNT(e.city_id) as city_count, e.city_id
  FROM export e
  LEFT JOIN export_history ehistory ON e.email = ehistory.email
 WHERE ehistory.id IS NULL
 GROUP BY e.city_id
 ORDER BY COUNT(e.city_id) DESC
 LIMIT 5;
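
To check whether the rewrite actually changes the plan, it can help to prefix the query with EXPLAIN and compare the output with the plan shown in the question. This is only a sketch for inspection; the rows and Extra values you see depend on your data and MySQL version.

```sql
-- Show the execution plan for the LEFT JOIN ... IS NULL form
-- without running the query itself.
EXPLAIN
SELECT COUNT(e.city_id) AS city_count, e.city_id
  FROM export e
  LEFT JOIN export_history ehistory ON e.email = ehistory.email
 WHERE ehistory.id IS NULL
 GROUP BY e.city_id
 ORDER BY city_count DESC
 LIMIT 5;
```

In this form the ehistory row should no longer be reported as DEPENDENT SUBQUERY, and its Extra column may show "Not exists", which is how MySQL reports that it stops probing export_history after the first matching row.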

Try this compound index.

CREATE INDEX exp_email_cityid ON export(email, city_id);

If that doesn't help, try an index with the columns in the opposite order:

CREATE INDEX exp_cityid_email ON export(city_id, email);
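
Since only one of the two compound indexes is likely to be useful, a reasonable routine is to create one, time the query, and remove the index if it made no difference. The statements below are a minimal sketch of that housekeeping; dropping exp_email_cityid is just an example of the cleanup step, not a claim that it is the one to discard.

```sql
-- Refresh index statistics so the optimizer sees current cardinalities.
ANALYZE TABLE export;

-- List the indexes now defined on export, including the compound ones.
SHOW INDEX FROM export;

-- After timing the main query, drop whichever compound index did not help:
-- every extra index adds cost to INSERT, UPDATE, and DELETE.
-- (exp_email_cityid is used here only as an example.)
DROP INDEX exp_email_cityid ON export;
```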

Pro tip: single-column indexes are not the same as a multi-column index created to match the filtering criteria in your query.
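
One way to see the difference is to compare plans with index hints. The sketch below assumes the exp_cityid_email index suggested above has been created; FORCE INDEX is used here only for the comparison, not as something to leave in production code.

```sql
-- Plan using the single-column index: city_id_index does not contain
-- email, so MySQL has to visit the table rows to evaluate the join.
EXPLAIN
SELECT COUNT(e.city_id) AS city_count, e.city_id
  FROM export e FORCE INDEX (city_id_index)
  LEFT JOIN export_history ehistory ON e.email = ehistory.email
 WHERE ehistory.id IS NULL
 GROUP BY e.city_id
 ORDER BY city_count DESC
 LIMIT 5;

-- Plan using the compound index: (city_id, email) holds every export
-- column the query reads, so it can act as a covering index.
EXPLAIN
SELECT COUNT(e.city_id) AS city_count, e.city_id
  FROM export e FORCE INDEX (exp_cityid_email)
  LEFT JOIN export_history ehistory ON e.email = ehistory.email
 WHERE ehistory.id IS NULL
 GROUP BY e.city_id
 ORDER BY city_count DESC
 LIMIT 5;
```

If the Extra column for table e shows "Using index" in the second plan, the rows were read from the index alone, which is the benefit a single-column index cannot provide for this query.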
