I have an analytics table (5M rows and growing) with the following structure:

CREATE TABLE Hits (
  id int(11) NOT NULL AUTO_INCREMENT,
  hit_date datetime NOT NULL,
  hit_day int(11) DEFAULT NULL,
  gender varchar(255) DEFAULT NULL,
  age_range_id int(11) DEFAULT NULL,
  klout_range_id int(11) DEFAULT NULL,
  frequency int(11) DEFAULT NULL,
  count int(11) DEFAULT NULL,
  location_id int(11) DEFAULT NULL,
  source_id int(11) DEFAULT NULL,
  target_id int(11) DEFAULT NULL,
  PRIMARY KEY (id)  -- assumed; MySQL requires an AUTO_INCREMENT column to be indexed
);

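Most queries against the table select a specific subset of columns between two datetimes and sum the count column across all matching rows. For example: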

SELECT target_id,
   SUM(CASE gender WHEN 'm' THEN count END) AS 'gender_male',
   SUM(CASE gender WHEN 'f' THEN count END) AS 'gender_female',
   SUM(CASE age_range_id WHEN 1 THEN count END) AS 'age_18 - 20',
   SUM(CASE target_id WHEN 1 THEN count END) AS 'target_test',
   SUM(CASE location_id WHEN 1 THEN count END) AS 'location_NY'
FROM Hits
WHERE (location_id = 1 OR location_id = 2)
  AND (target_id = 40 OR target_id = 22)
  AND CAST(hit_date AS date) BETWEEN '2012-5-4' AND '2012-5-10'
GROUP BY target_id

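The interesting part of querying this table is that the WHERE clause can include any permutation of Hits column names and values, since those are what we filter on. The query above gets the number of males and females between the ages of 18 and 20 (age_range_id 1) in NY who belong to a target called "test". However, there are over 8 age groups, 10 klout ranges, 45 locations, 10 sources, etc. (all foreign key references).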

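I currently have one index on hit_date and another on target_id. What is the best way to index this table properly? Having a composite index on all column fields seems inherently wrong.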

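Is there any other way to run this query without using a subquery to sum all the counts? I did some research, and this seems to be the best way to get the data set I need, but is there a more efficient way to handle this query?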

1 Answer

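Here is your optimized query. The idea is to get rid of the ORs and the CAST() function on hit_date so that MySQL can use a composite index covering each subset of the data. You need a composite index on (location_id, target_id, hit_date), in that order.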

-- The outer query re-aggregates, because two branches can return the same id.
SELECT id,
   SUM(gender_male) AS gender_male,
   SUM(gender_female) AS gender_female,
   SUM(`age_18 - 20`) AS `age_18 - 20`,
   SUM(target_test) AS target_test,
   SUM(location_NY) AS location_NY
FROM
(
-- location 1, target 40
SELECT target_id AS id,
   SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male,
   SUM(CASE gender WHEN 'f' THEN 1 END) AS gender_female,
   SUM(CASE age_range_id WHEN 1 THEN 1 END) AS `age_18 - 20`,
   SUM(CASE target_id WHEN 1 THEN 1 END) AS target_test,
   SUM(CASE location_id WHEN 1 THEN 1 END) AS location_NY
FROM Hits
WHERE (location_id = 1)
  AND (target_id = 40)
  AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id

UNION ALL

-- location 2, target 40
SELECT target_id AS id,
   SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male,
   SUM(CASE gender WHEN 'f' THEN 1 END) AS gender_female,
   SUM(CASE age_range_id WHEN 1 THEN 1 END) AS `age_18 - 20`,
   SUM(CASE target_id WHEN 1 THEN 1 END) AS target_test,
   SUM(CASE location_id WHEN 1 THEN 1 END) AS location_NY
FROM Hits
WHERE (location_id = 2)
  AND (target_id = 40)
  AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id

UNION ALL

-- location 1, target 22
SELECT target_id AS id,
   SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male,
   SUM(CASE gender WHEN 'f' THEN 1 END) AS gender_female,
   SUM(CASE age_range_id WHEN 1 THEN 1 END) AS `age_18 - 20`,
   SUM(CASE target_id WHEN 1 THEN 1 END) AS target_test,
   SUM(CASE location_id WHEN 1 THEN 1 END) AS location_NY
FROM Hits
WHERE (location_id = 1)
  AND (target_id = 22)
  AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id

UNION ALL

-- location 2, target 22
SELECT target_id AS id,
   SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male,
   SUM(CASE gender WHEN 'f' THEN 1 END) AS gender_female,
   SUM(CASE age_range_id WHEN 1 THEN 1 END) AS `age_18 - 20`,
   SUM(CASE target_id WHEN 1 THEN 1 END) AS target_test,
   SUM(CASE location_id WHEN 1 THEN 1 END) AS location_NY
FROM Hits
WHERE (location_id = 2)
  AND (target_id = 22)
  AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id
) a
GROUP BY id
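
As a rough sketch of the index this answer relies on, the statements below create the composite index and then inspect the plan for one branch of the UNION. The index name idx_hits_loc_target_date is a hypothetical choice for illustration; only the column order comes from the answer.

```sql
-- Hypothetical index name; the column order follows the answer:
-- the two equality columns first, the range column (hit_date) last.
CREATE INDEX idx_hits_loc_target_date
    ON Hits (location_id, target_id, hit_date);

-- Inspect the plan for a single branch of the UNION ALL query.
EXPLAIN
SELECT target_id,
   SUM(CASE gender WHEN 'm' THEN 1 END) AS gender_male
FROM Hits
WHERE location_id = 1
  AND target_id = 40
  AND hit_date BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59'
GROUP BY target_id;
```

If the optimizer picks the index, one would expect the key column of the EXPLAIN output to show the index name, typically with a range access type. The actual plan depends on the data distribution and the MySQL version.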

If your selection is so large that this brings no improvement, you might as well keep scanning all rows as you already do.

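Note that I enclosed the aliases in backticks rather than single quotes, which are deprecated for this purpose. I also changed your CASE clauses to use 1 instead of count. That counts matching rows; if the count column holds a per-row tally that you want totalled, as the question's description suggests, keep THEN count in each CASE.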
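
Regarding the removal of CAST() on hit_date, another way to write the date filter is a half-open range, sketched below under the assumption that the intent is "every hit from 2012-05-04 through the end of 2012-05-10". It leaves hit_date unwrapped, so an index on the column stays usable, and it does not depend on the time precision of the column. The alias hits_in_range is made up for this example.

```sql
-- Half-open range: includes 2012-05-04 00:00:00, excludes 2012-05-11 00:00:00.
SELECT COUNT(*) AS hits_in_range
FROM Hits
WHERE location_id = 1
  AND target_id = 40
  AND hit_date >= '2012-05-04'
  AND hit_date <  '2012-05-11';
```

For a plain datetime column without fractional seconds, this should select the same rows as the BETWEEN '2012-05-04 00:00:00' AND '2012-05-10 23:59:59' form used in the answer.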

answered 2012-05-10T21:28:59.367