I am running a query that joins several tables and filters on a date range, and I am trying to figure out how to optimize it further.

SELECT ACC.name AS account_name, CAMP.account_id AS account_id, CAMP.name AS campaign_name, CAMP.id AS campaign_id,
       ADG.id AS adgroup_id, ADG.name AS adgroup_name, KW.text AS keyword_name,
       SUM(SPENT.billed_clicks) AS billed_clicks, KW.id AS keyword_id, KW.status_id AS status_id
FROM account ACC, campaign CAMP, adgroup ADG, adgroup_keyword KW
INNER JOIN keyword_spent SPENT ON KW.id = SPENT.keyword_id
WHERE summary_date >= '2012-03-01' AND summary_date <= '2012-03-04'
  AND KW.adgroup_id = ADG.id AND ADG.campaign_id = CAMP.id AND CAMP.account_id = ACC.id
GROUP BY keyword_id

Running EXPLAIN on this produces the following result:

+----+-------------+-------+--------+----------------------------+--------------+---------+---------------------------------+--------+----------------------------------------------+
| id | select_type | table | type   | possible_keys              | key          | key_len | ref                             | rows   | Extra                                        |
+----+-------------+-------+--------+----------------------------+--------------+---------+---------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | SPENT | range  | summary_date               | summary_date | 3       | NULL                            | 752191 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | KW    | eq_ref | PRIMARY,FK1948D0E6ED3A5544 | PRIMARY      | 8       | clicksummarydb.SPENT.keyword_id |      1 |                                              | 
|  1 | SIMPLE      | ADG   | eq_ref | PRIMARY,FKBBC2083C29112FD0 | PRIMARY      | 8       | advertisedb.KW.adgroup_id       |      1 |                                              | 
|  1 | SIMPLE      | CAMP  | eq_ref | PRIMARY,FKF7A90110246F33C4 | PRIMARY      | 8       | advertisedb.ADG.campaign_id     |      1 |                                              | 
|  1 | SIMPLE      | ACC   | eq_ref | PRIMARY                    | PRIMARY      | 8       | advertisedb.CAMP.account_id     |      1 |                                              | 
+----+-------------+-------+--------+----------------------------+--------------+---------+---------------------------------+--------+----------------------------------------------+

The keyword_spent table contains more than 1.5 million rows. Here is the SHOW CREATE TABLE output for it:

 | keyword_spent | CREATE TABLE `keyword_spent` (
   `id` bigint(20) NOT NULL auto_increment,
   `summary_date` date NOT NULL,
   `adgroup_id` bigint(20) NOT NULL,
   `keyword_id` bigint(20) NOT NULL,
   `billed_clicks` int(11) default NULL,
   `un_billed_clicks` int(11) default NULL,
   `spent` decimal(20,5) default NULL,
   `last_click_recno` bigint(20) default NULL,
   `campaign_id` bigint(20) NOT NULL,
   `account_id` bigint(20) NOT NULL,
   `total_convs` bigint(20) unsigned default '0',
    PRIMARY KEY  (`id`),
   UNIQUE KEY `keyword_spent_uniq` (`summary_date`,`adgroup_id`,`keyword_id`),
   KEY `idx_account_id` (`account_id`),
   KEY `idx_kw_id` (`keyword_id`),
   KEY `adgroup_id` (`adgroup_id`),
   KEY `campaign_id` (`campaign_id`),
   KEY `summary_date` (`summary_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 | 

I don't understand why nearly 750,000 rows are being scanned when there are no more than 100,000 records in that date range.

Also, why is a filesort being done instead of using an index?

3 Answers

Try a composite index on the columns referenced in the predicates (the join column and the date filter):

CREATE INDEX keyword_spent_IX2 ON keyword_spent (keyword_id, summary_date)

-or-

CREATE INDEX keyword_spent_IX3 ON keyword_spent (summary_date, keyword_id)

-or- you could even create a covering index that contains all of the columns referenced in the query:

CREATE INDEX keyword_spent_IX4 ON keyword_spent (keyword_id, summary_date,
    billed_clicks, un_billed_clicks, spent, total_convs)
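
As a rough illustration of the covering-index idea, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. That is an assumption made so the example runs anywhere; the planners differ. The table is trimmed to the columns the question's query reads from keyword_spent, and the index name keyword_spent_cov is hypothetical. In MySQL, the corresponding sign would be "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, and the table is cut down to the
# columns that the question's query actually reads from keyword_spent.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE keyword_spent (
           id INTEGER PRIMARY KEY,
           summary_date TEXT NOT NULL,
           keyword_id INTEGER NOT NULL,
           billed_clicks INTEGER
       )"""
)
# Hypothetical index name. The date column leads so the range filter can use
# the index; the remaining columns make the index cover the query.
conn.execute(
    "CREATE INDEX keyword_spent_cov"
    " ON keyword_spent (summary_date, keyword_id, billed_clicks)"
)

plan = conn.execute(
    """EXPLAIN QUERY PLAN
       SELECT keyword_id, SUM(billed_clicks)
       FROM keyword_spent
       WHERE summary_date >= '2012-03-01' AND summary_date <= '2012-03-04'
       GROUP BY keyword_id"""
).fetchall()

# The human-readable description of each plan step is the last column.
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)
```

When the index covers the query, the plan mentions a covering index and the base table rows are never read. Whether that outweighs the extra index size and write cost on the real table is something to measure.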

The filesort operation is most likely due to the GROUP BY.

My preference is to use the JOIN ... ON syntax rather than the old-style comma syntax with join predicates mixed into the WHERE clause.

  FROM account ACC
  JOIN campaign CAMP ON CAMP.account_id = ACC.id
  JOIN adgroup ADG ON ADG.campaign_id = CAMP.id
  JOIN adgroup_keyword KW ON KW.adgroup_id = ADG.id
  JOIN keyword_spent SPENT ON SPENT.keyword_id = KW.id
 WHERE SPENT.summary_date >= '2012-03-01'
   AND SPENT.summary_date <= '2012-03-04'
 GROUP BY KW.id

You are grouping by only a subset of the non-aggregated columns in the SELECT list. Most other RDBMSs would raise an error for this; MySQL is more liberal.
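
To make the GROUP BY point concrete, here is a small sketch using Python's built-in sqlite3 module (SQLite, like MySQL, accepts the shorter form, which is why it is used here). The two-table schema and sample rows are invented for illustration and only borrow column names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data; only the column names come from the question.
conn.executescript(
    """
    CREATE TABLE adgroup_keyword (id INTEGER PRIMARY KEY, text TEXT, status_id INTEGER);
    CREATE TABLE keyword_spent (keyword_id INTEGER, billed_clicks INTEGER);
    INSERT INTO adgroup_keyword VALUES (1, 'shoes', 10), (2, 'boots', 20);
    INSERT INTO keyword_spent VALUES (1, 5), (1, 7), (2, 4);
    """
)

base_query = """
    SELECT KW.id, KW.text, KW.status_id, SUM(SPENT.billed_clicks)
    FROM adgroup_keyword KW
    JOIN keyword_spent SPENT ON SPENT.keyword_id = KW.id
"""

# Portable form: every non-aggregated column appears in GROUP BY.
strict_rows = conn.execute(
    base_query + " GROUP BY KW.id, KW.text, KW.status_id ORDER BY KW.id"
).fetchall()

# Permissive form: group by the key only and let the engine pick the other
# columns, which is safe here only because they depend on KW.id.
loose_rows = conn.execute(base_query + " GROUP BY KW.id ORDER BY KW.id").fetchall()

print(strict_rows)
print(loose_rows)
```

Both forms return the same rows here because text and status_id are determined by the primary key. Listing every column keeps the query portable to stricter engines.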

Answered 2012-07-06T17:47:16.707

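A filesort is not necessarily bad. As Baron Schwartz's blog post explains, a filesort does not necessarily have anything to do with files. It is simply the quicksort that is used when no suitable index is available.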

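One idea for optimizing: perhaps compute all of the aggregates in their own subquery and then join against that result? I am thinking of something like this (adjust as needed):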

SELECT ACC.name AS account_name,
CAMP.account_id AS account_id,
CAMP.name AS campaign_name,
CAMP.id AS campaign_id,
ADG.id AS adgroup_id,
ADG.name AS adgroup_name,
KW.text AS keyword_name,
KW.id AS keyword_id,
JOINED.billed_clicks AS billed_clicks,
JOINED.un_billed_clicks AS un_billed_clicks,
JOINED.total_clicks AS total_clicks,
JOINED.spent AS spent,
JOINED.total_convs AS total_convs
FROM account ACC
INNER JOIN campaign CAMP ON ACC.id = CAMP.account_id
INNER JOIN adgroup ADG ON CAMP.id = ADG.campaign_id
INNER JOIN adgroup_keyword KW ON ADG.id = KW.adgroup_id
INNER JOIN (SELECT
    SUM(billed_clicks) AS billed_clicks,
    SUM(un_billed_clicks) AS un_billed_clicks,
    SUM(billed_clicks) + SUM(un_billed_clicks) AS total_clicks,
    SUM(spent) AS spent,
    SUM(total_convs) AS total_convs,
    keyword_id
    FROM keyword_spent
    WHERE summary_date >= '2012-03-01' AND summary_date <= '2012-03-04'
    GROUP BY keyword_id
) JOINED ON JOINED.keyword_id = KW.id

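Hopefully I have understood this correctly. This solution has one benefit: the GROUP BY and the aggregates stay self-contained, so you don't have to worry about grouping by the other columns, which the original example never did.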
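
A minimal sketch of the pre-aggregation idea, runnable with Python's built-in sqlite3 module. SQLite is used as a stand-in for MySQL, and the trimmed two-table schema and sample rows are assumptions made for illustration. This variant also restricts the derived table to the question's date range, so rows outside it never reach the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data; only the table and column names come from the question.
conn.executescript(
    """
    CREATE TABLE adgroup_keyword (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE keyword_spent (
        summary_date TEXT, keyword_id INTEGER,
        billed_clicks INTEGER, un_billed_clicks INTEGER
    );
    INSERT INTO adgroup_keyword VALUES (1, 'shoes'), (2, 'boots');
    INSERT INTO keyword_spent VALUES
        ('2012-02-28', 1, 100, 0),  -- outside the date range, must be ignored
        ('2012-03-01', 1, 5, 1),
        ('2012-03-03', 1, 7, 2),
        ('2012-03-02', 2, 4, 0);
    """
)

rows = conn.execute(
    """
    SELECT KW.id, KW.text, JOINED.billed_clicks, JOINED.total_clicks
    FROM adgroup_keyword KW
    INNER JOIN (
        -- Aggregate once per keyword, restricted to the date range.
        SELECT keyword_id,
               SUM(billed_clicks) AS billed_clicks,
               SUM(billed_clicks) + SUM(un_billed_clicks) AS total_clicks
        FROM keyword_spent
        WHERE summary_date >= ? AND summary_date <= ?
        GROUP BY keyword_id
    ) JOINED ON JOINED.keyword_id = KW.id
    ORDER BY KW.id
    """,
    ("2012-03-01", "2012-03-04"),
).fetchall()

for row in rows:
    print(row)
```

The outer query needs no GROUP BY at all, because the derived table already yields one row per keyword. Whether MySQL executes this faster than the original form depends on the version and the available indexes, so it is worth checking with EXPLAIN.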

Answered 2012-07-06T17:52:32.397

First try an index on summary_date (which is used in the WHERE clause) followed by the keyword ID, and move the date range explicitly into the JOIN:

ON (SPENT.keyword_id = KW.id AND SPENT.summary_date BETWEEN ... AND ...)

Also, try creating a view that gives you the aggregated fields on SPENT. Ideally the optimizer should understand this better and save you some time.

CREATE VIEW SPENT AS SELECT
    keyword_id,
    SUM(SPENT.billed_clicks) AS billed_clicks,
    SUM(SPENT.un_billed_clicks) AS un_billed_clicks,
    SUM(SPENT.spent) AS spent,
    SUM(SPENT.total_convs) AS total_convs
FROM keyword_spent AS SPENT GROUP BY keyword_id;

This requires the index on keyword_id and summary_date to be in place first, and the JOIN against the VIEW should then be roughly equivalent to a SELECT over 100,000 rows.

Answered 2012-07-06T17:58:48.773