mysql - Why does MySQL choose a seemingly less efficient index?

Question

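I have inherited some databases and queries that have problems. They work against some large data sets and the reporting done on them.

I am trying to tune and tweak things to get some improvement.

What I've found is that I am not 100% clear on how MySQL decides which index to use.

Why does the first query listed below not use the index that query 2 uses? In query 2 I am doing what I assumed the query engine should be doing: take the small table, get the appropriate values, then apply them to search the bigger table using the appropriate index.

What am I doing wrong here? Or rather, what am I misunderstanding about how foreign keys, indexes and joins work here :)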

Here are the 2 tables in question

Table 1
~450 rows

CREATE TABLE `client_accounts_dim` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `client_id` int(10) unsigned NOT NULL,
 `service_provider_id` int(10) unsigned NOT NULL,
 `account_number` varchar(45) NOT NULL,
 `label` varchar(128) DEFAULT NULL,
 `service_provider_name` varchar(45) NOT NULL,
 `client_name` varchar(45) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `client_id` (`client_id`,`account_number`)
) ENGINE=InnoDB;

Table 2
~11,000,000 rows

CREATE TABLE `invoices_fact` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `invoice_number` varchar(45) NOT NULL COMMENT '    ',
 ...
 ...
 `tracking_number` varchar(45) DEFAULT NULL,
 `division_id` int(11) DEFAULT NULL,
 `client_accounts_dim_id` int(10) unsigned NOT NULL,
 `invoice_date_dim_id` bigint(20) DEFAULT NULL,
 `shipment_date_dim_id` bigint(20) NOT NULL,
 `received_date_dim_id` bigint(20) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `fk_invoice_details_client_accounts_dim1_idx` (`client_accounts_dim_id`),
 KEY `invoice_date_dim_id` (`invoice_date_dim_id`),
 KEY `shipment_date_dim_id` (`shipment_date_dim_id`,`client_accounts_dim_id`,`division_id`,`tracking_number`),
 CONSTRAINT `fk_invoice_details_client_accounts_dim1` FOREIGN KEY (`client_accounts_dim_id`) REFERENCES `client_accounts_dim` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB;

The first query, where I do a basic join

SELECT count(distinct tracking_number) as val, p.division_id as division_id 
FROM client_accounts_dim c, invoices_fact p 
WHERE c.id = p.client_accounts_dim_id
AND p.division_id IN (2,3,7)
AND c.client_id = 17
AND p.shipment_date_dim_id between 20120101 and 20121108
GROUP BY p.division_id;

Runs in 28 seconds
EXPLAIN yields

+----+-------------+-------+------+------------------------------------------------------------------+---------------------------------------------+---------+---------+------+-------------+
| id | select_type | table | type | possible_keys                                                    | key                                         | key_len | ref     | rows | Extra       |
+----+-------------+-------+------+------------------------------------------------------------------+---------------------------------------------+---------+---------+------+-------------+
|  1 | SIMPLE      | c     | ref  | PRIMARY,client_id                                                | client_id                                   | 4       | const   |   49 | Using index |
|  1 | SIMPLE      | p     | ref  | fk_package_details_client_accounts_dim1_idx,shipment_date_dim_id | fk_package_details_client_accounts_dim1_idx | 4       | c.id    |  913 | Using where |
+----+-------------+-------+------+------------------------------------------------------------------+---------------------------------------------+---------+---------+------+-------------+

The query where I do the join "manually", by first running a query to get the client_accounts_dim ids and then putting them in.

SELECT count(distinct tracking_number) as val, p.division_id as division_id 
FROM invoices_fact p
WHERE division_id in (2,3,7)
AND p.client_accounts_dim_id IN ( 232, 233, 234, 277, 235, 236, 279, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 278, 280, 262, 263, 264, 252, 256, 254, 259, 261, 257, 266, 276, 267, 255, 258, 274, 273, 272, 271, 269, 270, 268, 275, 253, 265, 260 )
AND p.shipment_date_dim_id between 20120101 and 20121108 
GROUP BY p.division_id;

Runs in 1.6 seconds
EXPLAIN yields:

+----+-------------+-------+-------+------------------------------------------------------------------+------------------------+---------+------+---------+--------------------------+
| id | select_type | table | type  | possible_keys                                                    | key                    | key_len | ref  | rows    | Extra                    |
+----+-------------+-------+-------+------------------------------------------------------------------+------------------------+---------+------+---------+--------------------------+
|  1 | SIMPLE      | p     | range | fk_package_details_client_accounts_dim1_idx,shipment_date_dim_id | shipment_date_dim_id   | 19      | NULL | 4991810 | Using where; Using index |
+----+-------------+-------+-------+------------------------------------------------------------------+------------------------+---------+------+---------+--------------------------+

score 1 · Accepted Answer

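MySQL should indeed be looking at the smallest table first, and it is - client_accounts_dim. You've given it the index client_id, so it can easily extract the information for client_id=17.

Then MySQL needs to join its id onto invoices_fact. You've given it fk_invoice_details_client_accounts_dim1_idx for this task. This all sounds very reasonable!

Now, two questions, one hard, one easy. The first: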

Once MySQL has found the rows in your index where client_accounts_dim.client_id=17, how does it get the id it needs to join on?

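The second:

Once MySQL has joined onto invoices_fact.client_accounts_dim_id, how does it apply the rest of the information in the WHERE clause?

For the first question, I have read that InnoDB puts the primary key into all secondary indexes, but I couldn't pin down a definitive statement that it will be used for your join. I would suggest making it an explicit composite index: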

client_accounts_dim (client_id, id)
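
As a minimal sketch, that index could be created as follows. The index name idx_client_id_id is made up for illustration; only the table and column names come from the question. Note that the EXPLAIN output in the question already shows "Using index" for table c with key client_id, which suggests the id is being read from the existing index, so this extra index may not change the plan.

```sql
-- Hypothetical index name; table and columns are taken from the question.
-- Makes the (client_id, id) pairing explicit instead of relying on InnoDB
-- appending the primary key to the secondary index.
ALTER TABLE client_accounts_dim
  ADD INDEX idx_client_id_id (client_id, id);
```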

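For the second question, once MySQL has found the join information in the index, it must read all the matching rows from disk to see which of them fall within the divisions and date range you specified. Another composite index to the rescue: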

invoices_fact (client_accounts_dim_id, division_id, shipment_date_dim_id)

Note: put columns 2 and 3 in the right order, with the lowest-cardinality column first.

Now MySQL can bother just your index to gather the full list of rows!
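
A sketch of the corresponding DDL, assuming the column order proposed above. The index name idx_acct_div_ship is hypothetical. On a table of roughly 11,000,000 rows, building the index may take a while, so it is probably worth trying on a copy first.

```sql
-- Hypothetical index name; columns and their order follow the answer.
-- Join column first, then the IN-list column, then the range column.
ALTER TABLE invoices_fact
  ADD INDEX idx_acct_div_ship
    (client_accounts_dim_id, division_id, shipment_date_dim_id);
```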

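Apart from the columns discussed above, the query appears to use only one other column - invoices_fact.tracking_number. If you add it to the index, MySQL can get everything the query needs from the index, without reading the underlying rows from disk.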

invoices_fact (client_accounts_dim_id, division_id, shipment_date_dim_id, tracking_number)

Note: tracking_number is a wide column, so it will make your index bigger, slow down writes, take up more disk space, and so on. You could test both variants.
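
One way to run that test, sketched under the assumption that the four-column index is created under the made-up name idx_acct_div_ship_track: add the index, then re-run EXPLAIN on the question's first query (written here with explicit JOIN syntax, which is equivalent to the comma join). If the optimizer picks the new index, the key column for table p should show it, and with the covering variant the Extra column would be expected to include "Using index".

```sql
-- Hypothetical index name; columns follow the covering variant in the answer.
ALTER TABLE invoices_fact
  ADD INDEX idx_acct_div_ship_track
    (client_accounts_dim_id, division_id, shipment_date_dim_id, tracking_number);

-- The question's first query, unchanged apart from the explicit JOIN.
EXPLAIN
SELECT COUNT(DISTINCT p.tracking_number) AS val, p.division_id AS division_id
FROM client_accounts_dim c
JOIN invoices_fact p ON p.client_accounts_dim_id = c.id
WHERE c.client_id = 17
  AND p.division_id IN (2, 3, 7)
  AND p.shipment_date_dim_id BETWEEN 20120101 AND 20121108
GROUP BY p.division_id;

-- To compare against the narrower index, drop this one and time the query again:
-- ALTER TABLE invoices_fact DROP INDEX idx_acct_div_ship_track;
```

Comparing actual run times matters as much as the plan, since the row estimates in EXPLAIN are only approximations.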

Hope this helps.
