
Query goal:

Show the racial breakdown by district.

Query:

SELECT school_data_schools_outer.district_id, 
       school_data_race_ethnicity_raw_outer.year,  
       school_data_race_ethnicity_raw_outer.race,
       ROUND( 
           SUM( school_data_race_ethnicity_raw_outer.count) /
                (SELECT SUM(count)
                   FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_inner
             INNER JOIN school_data_schools as school_data_schools_inner 
                  USING (school_id)
                  WHERE school_data_schools_outer.district_id = school_data_schools_inner.district_id 
                    AND school_data_race_ethnicity_raw_outer.year = school_data_race_ethnicity_raw_inner.year) * 100, 2)
      FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_outer
INNER JOIN school_data_schools as school_data_schools_outer USING (school_id)
  GROUP BY school_data_schools_outer.district_id, 
           school_data_race_ethnicity_raw_outer.year, 
           school_data_race_ethnicity_raw_outer.race

mysql> explain SELECT school_data_schools_outer.district_id, school_data_race_ethnicity_raw_outer.year, school_data_race_ethnicity_raw_outer.race,ROUND(SUM(school_data_race_ethnicity_raw_outer.count)/( SELECT SUM(count) FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_inner INNER JOIN school_data_schools as school_data_schools_inner USING (school_id) WHERE school_data_schools_outer.district_id = school_data_schools_inner.district_id and school_data_race_ethnicity_raw_outer.year = school_data_race_ethnicity_raw_inner.year ) * 100,2) FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_outer INNER JOIN school_data_schools as school_data_schools_outer USING (school_id) GROUP BY school_data_schools_outer.district_id, school_data_race_ethnicity_raw_outer.year, school_data_race_ethnicity_raw_outer.race;
+----+--------------------+--------------------------------------+--------+----------------------------+---------+---------+----------------------------------------------------------------------+-------+---------------------------------+
| id | select_type        | table                                | type   | possible_keys              | key     | key_len | ref                                                                  | rows  | Extra                           |
+----+--------------------+--------------------------------------+--------+----------------------------+---------+---------+----------------------------------------------------------------------+-------+---------------------------------+
|  1 | PRIMARY            | school_data_race_ethnicity_raw_outer | ALL    | school_id,school_id_2      | NULL    | NULL    | NULL                                                                 | 84012 | Using temporary; Using filesort |
|  1 | PRIMARY            | school_data_schools_outer            | eq_ref | PRIMARY                    | PRIMARY | 257     | rocdocs_main_drupal_7.school_data_race_ethnicity_raw_outer.school_id |     1 |                                 |
|  2 | DEPENDENT SUBQUERY | school_data_race_ethnicity_raw_inner | ref    | school_id,year,school_id_2 | year    | 4       | func                                                                 |  8402 |                                 |
|  2 | DEPENDENT SUBQUERY | school_data_schools_inner            | eq_ref | PRIMARY                    | PRIMARY | 257     | rocdocs_main_drupal_7.school_data_race_ethnicity_raw_inner.school_id |     1 | Using where                     |
+----+--------------------+--------------------------------------+--------+----------------------------+---------+---------+----------------------------------------------------------------------+-------+---------------------------------+
4 rows in set (0.00 sec)

mysql>

mysql> describe school_data_race_ethnicity_raw;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| id        | int(11)      | NO   | PRI | NULL    | auto_increment |
| school_id | varchar(255) | NO   | MUL | NULL    |                |
| year      | int(11)      | NO   | MUL | NULL    |                |
| race      | varchar(255) | NO   |     | NULL    |                |
| count     | int(11)      | NO   |     | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

mysql> describe school_data_schools;
+-------------+----------------+------+-----+---------+-------+
| Field       | Type           | Null | Key | Default | Extra |
+-------------+----------------+------+-----+---------+-------+
| school_id   | varchar(255)   | NO   | PRI | NULL    |       |
| grade_level | varchar(255)   | NO   |     | NULL    |       |
| district_id | varchar(255)   | NO   |     | NULL    |       |
| school_name | varchar(255)   | NO   |     | NULL    |       |
| address     | varchar(255)   | NO   |     | NULL    |       |
| city        | varchar(255)   | NO   |     | NULL    |       |
| lat         | decimal(20,10) | NO   |     | NULL    |       |
| lon         | decimal(20,10) | NO   |     | NULL    |       |
+-------------+----------------+------+-----+---------+-------+
8 rows in set (0.00 sec)

Note: I have also tried:

select sds.school_id, 
  detail.year, 
  detail.race,
  ROUND((detail.count / summary.total) * 100 ,2) as percent 
FROM school_data_race_ethnicity_raw as detail
inner join school_data_schools as sds USING (school_id)
inner join (
  select sds2.district_id, year, sum(count) as total
  from school_data_race_ethnicity_raw
  inner join school_data_schools as sds2 USING (school_id)
  group by sds2.district_id, year
  ) as summary on summary.district_id = sds.district_id 
    and summary.year = detail.year

2 Answers


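This is slow because:

  1. You are not using an index on school_data_race_ethnicity_raw_outer, so MySQL scans every one of its roughly 84,000 rows.
  2. You are using a correlated subquery, which means the expensive calculation has to run once per row, i.e. about 84,000 times.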

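The best approach is not to use a correlated subquery at all. If you must keep it, then to make it run faster you need covering indexes, so that the whole inner query (and the other parts, through indexes of their own) can be answered from the index alone. For an excellent tutorial on indexing, check out the linked content. It taught me a lot!

Right now your inner query only uses the year index on school_data_race_ethnicity_raw, so it has to read about 8,000 rows for each of the 84,000 calculations to find the rest of what it needs. A better index will make this faster. For example, create a composite index on school_data_race_ethnicity_raw and you should find that it helps: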

-- district_id is a column of school_data_schools, not of this table, so it cannot be part of this index
CREATE INDEX inner_composite ON school_data_race_ethnicity_raw (year, school_id, count);

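This lets MySQL take everything it needs from the index: first the field used in the WHERE clause, then the join field, then the field you want in the SELECT. You should see the index show up in the "key" column of the EXPLAIN output. Also, if you got it right, you will see "Using index" in the rightmost (Extra) column, which means no table access took place, and that can be orders of magnitude faster.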
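As a rough way to check this, the inner aggregate can be run through EXPLAIN on its own, with literal values standing in for the correlated ones. This is only a sketch: it assumes a composite index on (year, school_id, count) already exists, and the year 2011 and district id 'D1' are made-up placeholders rather than values from the question's data.

```sql
-- Assumes a composite index on (year, school_id, count) has been created.
-- 2011 and 'D1' are placeholder values used only for illustration.
EXPLAIN
SELECT SUM(r.count)
  FROM school_data_race_ethnicity_raw AS r
 INNER JOIN school_data_schools AS s USING (school_id)
 WHERE r.year = 2011
   AND s.district_id = 'D1';
```

If the optimizer picks the composite index, the row for r should name it in the key column and show "Using index" under Extra. The row for s will still look up district_id through the primary key of school_data_schools.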

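You can experiment quick-and-dirty style by adding plenty of indexes on the columns your query mentions and watching which one gets picked up in the key column. If the result is not what you want, read through your query to see which other columns of that table are in use, then add a new index that also includes those columns on the right-hand side and see whether it works better. Once you have found an index that works, remember to drop the ones that are not used.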
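The experiment-then-clean-up loop could look roughly like this. The index name trial_year_school is hypothetical and only serves the illustration.

```sql
-- List every index on the table; Seq_in_index gives the column order.
SHOW INDEX FROM school_data_race_ethnicity_raw;

-- Add an experimental index (trial_year_school is a made-up name).
CREATE INDEX trial_year_school
    ON school_data_race_ethnicity_raw (year, school_id);

-- Re-run EXPLAIN on the slow query here and read the key column.

-- Drop the experimental index again if EXPLAIN never chose it.
DROP INDEX trial_year_school ON school_data_race_ethnicity_raw;
```

Every extra index slows down writes and takes space, which is why the unused ones are worth removing.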

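MySQL does not let you index the SUM of a column directly, which would be the fastest approach, so unless you want to move to another database (a good idea if you can), this will always be somewhat slow.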
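One workaround, sketched here as an assumption rather than something the question's schema already contains, is to keep the per-district, per-year totals in a small summary table and join to it. The table name district_year_totals is hypothetical, and the table has to be refreshed whenever the raw counts change.

```sql
-- district_year_totals is a hypothetical summary table, not part of the original schema.
-- VARCHAR(255) mirrors school_data_schools.district_id; shorten it if the
-- primary key exceeds your storage engine's key length limit.
CREATE TABLE district_year_totals (
    district_id VARCHAR(255) NOT NULL,
    year        INT          NOT NULL,
    total       BIGINT       NOT NULL,
    PRIMARY KEY (district_id, year)
);

-- Compute each district/year total once instead of once per outer row.
INSERT INTO district_year_totals (district_id, year, total)
SELECT s.district_id, r.year, SUM(r.count)
  FROM school_data_race_ethnicity_raw AS r
 INNER JOIN school_data_schools AS s USING (school_id)
 GROUP BY s.district_id, r.year;

-- Percentage of each race per district and year, using the stored totals.
SELECT s.district_id, r.year, r.race,
       ROUND(SUM(r.count) / t.total * 100, 2) AS percent
  FROM school_data_race_ethnicity_raw AS r
 INNER JOIN school_data_schools AS s USING (school_id)
 INNER JOIN district_year_totals AS t
         ON t.district_id = s.district_id
        AND t.year = r.year
 GROUP BY s.district_id, r.year, r.race, t.total;
```

The final query looks each total up through the summary table's primary key, so the correlated subquery disappears entirely.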

answered 2012-09-06T22:11:38.167

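This should be all you need to aggregate your data into race counts by district. I am not sure why you are doing so much math on the raw data: it is not necessary for your goal, and it forces some crazy subqueries.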

SELECT SUM(students.count) AS studentCount, schools.district_id, students.race
FROM school_data_schools schools,
school_data_race_ethnicity_raw students
WHERE schools.school_id = students.school_id
GROUP BY schools.district_id, students.race

You will probably also want an index on school_data_race_ethnicity_raw.school_id (on its own, not as part of a multi-column key).

Edit: I did not realize the OP was looking for a percentage breakdown rather than just totals.

SELECT ((studentCount / districtTotal) * 100) AS percentage, district_id, race
FROM (
  -- Table aliases are case sensitive on many MySQL installs, so "schools" is spelled the same everywhere
  SELECT SUM(students.count) AS studentCount, schools.district_id, students.race,
    (SELECT SUM(inStudents.count)
     FROM school_data_schools inSchools,
       school_data_race_ethnicity_raw inStudents
     WHERE inSchools.school_id = inStudents.school_id
     AND inSchools.district_id = schools.district_id
     GROUP BY inSchools.district_id) AS districtTotal
  FROM school_data_schools schools,
    school_data_race_ethnicity_raw students
  WHERE schools.school_id = students.school_id
  GROUP BY schools.district_id, students.race
) table1

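This will run quickly, but you still need to make sure the index on school_data_race_ethnicity_raw.school_id is not part of a multi-column index. You can see it running here; my test case is small, but it does seem to check out.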

answered 2012-09-06T00:16:23.427