
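I have written the following PostgreSQL query, and it works. However, it seems very slow, sometimes taking up to 10 seconds to return a result. I'm sure something in my statement is causing this.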

Can anyone help identify why this query is slow?

SELECT DISTINCT ON (school_classes.class_id,attendance_calendar.school_date)
  school_classes.class_id, school_classes.class_name, school_classes.grade_id
, school_gradelevels.linked_calendar, attendance_calendars.calendar_id
, attendance_calendar.school_date, attendance_calendar.minutes
, teacher_join_classes_subjects.staff_id, staff.first_name, staff.last_name  

FROM school_classes 
INNER JOIN school_gradelevels ON school_gradelevels.id=school_classes.grade_id 
INNER JOIN teacher_join_classes_subjects ON teacher_join_classes_subjects.class_id=school_classes.class_id 
INNER JOIN staff ON staff.staff_id=teacher_join_classes_subjects.staff_id 
INNER JOIN attendance_calendars ON attendance_calendars.title=school_gradelevels.linked_calendar 
INNER JOIN attendance_calendar ON attendance_calendar.calendar_id=attendance_calendars.calendar_id 

WHERE teacher_join_classes_subjects.syear='2013' 
AND staff.syear='2013' 
AND attendance_calendars.syear='2013' 
AND teacher_join_classes_subjects.does_attendance='Y' 
AND teacher_join_classes_subjects.subject_id IS NULL 
AND attendance_calendar.school_date<CURRENT_DATE 

AND attendance_calendar.school_date NOT IN (

SELECT com.school_date FROM attendance_completed com
WHERE  com.class_id=school_classes.class_id
AND   (com.period_id='101' AND attendance_calendar.minutes>='151' OR
       com.period_id='95'  AND attendance_calendar.minutes='150') )

I replaced the NOT IN with the following:

AND NOT EXISTS (
    SELECT com.school_date
    FROM attendance_completed com
    WHERE com.class_id=school_classes.class_id
    AND com.school_date=attendance_calendar.school_date
    AND (com.period_id='101' AND attendance_calendar.minutes>='151' OR
         com.period_id='95'  AND attendance_calendar.minutes='150') )

Result of EXPLAIN ANALYZE:

Unique  (cost=2998.39..2998.41 rows=3 width=85) (actual time=10751.111..10751.118 rows=1 loops=1)
  ->  Sort  (cost=2998.39..2998.40 rows=3 width=85) (actual time=10751.110..10751.110 rows=2 loops=1)
        Sort Key: school_classes.class_id, attendance_calendar.school_date
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Join  (cost=2.03..2998.37 rows=3 width=85) (actual time=6409.471..10751.045 rows=2 loops=1)
              Hash Cond: ((teacher_join_classes_subjects.class_id = school_classes.class_id) AND (school_gradelevels.id = school_classes.grade_id))
              Join Filter: (NOT (SubPlan 1))
              ->  Nested Loop  (cost=0.00..120.69 rows=94 width=81) (actual time=2.468..1187.397 rows=26460 loops=1)
                    Join Filter: (attendance_calendars.calendar_id = attendance_calendar.calendar_id)
                    ->  Nested Loop  (cost=0.00..42.13 rows=1 width=70) (actual time=0.087..3.247 rows=735 loops=1)
                          Join Filter: ((attendance_calendars.title)::text = (school_gradelevels.linked_calendar)::text)
                          ->  Nested Loop  (cost=0.00..40.80 rows=1 width=277) (actual time=0.077..1.005 rows=245 loops=1)
                                ->  Nested Loop  (cost=0.00..39.61 rows=1 width=27) (actual time=0.064..0.572 rows=49 loops=1)
                                      ->  Seq Scan on teacher_join_classes_subjects  (cost=0.00..10.48 rows=4 width=14) (actual time=0.022..0.143 rows=49 loops=1)
                                            Filter: ((subject_id IS NULL) AND (syear = 2013::numeric) AND ((does_attendance)::text = 'Y'::text))
                                      ->  Index Scan using staff_pkey on staff  (cost=0.00..7.27 rows=1 width=20) (actual time=0.006..0.007 rows=1 loops=49)
                                            Index Cond: (staff.staff_id = teacher_join_classes_subjects.staff_id)
                                            Filter: (staff.syear = 2013::numeric)
                                ->  Seq Scan on attendance_calendars  (cost=0.00..1.18 rows=1 width=250) (actual time=0.003..0.006 rows=5 loops=49)
                                      Filter: (attendance_calendars.syear = 2013::numeric)
                          ->  Seq Scan on school_gradelevels  (cost=0.00..1.15 rows=15 width=11) (actual time=0.001..0.005 rows=15 loops=245)
                    ->  Seq Scan on attendance_calendar  (cost=0.00..55.26 rows=1864 width=18) (actual time=0.003..1.129 rows=1824 loops=735)
                          Filter: (attendance_calendar.school_date < ...)
              ->  Hash  (cost=1.41..1.41 rows=41 width=18) (actual time=0.040..0.040 rows=41 loops=1)
                    ->  Seq Scan on school_classes  (cost=0.00..1.41 rows=41 width=18) (actual time=0.006..0.015 rows=41 loops=1)
              SubPlan 1
                ->  Seq Scan on attendance_completed com  (cost=0.00..958.28 rows=5 width=4) (actual time=0.228..5.411 rows=17 loops=1764)
                      Filter: ((class_id = $0) AND (((period_id = 101::numeric) AND ($1 >= 151::numeric)) OR ((period_id = 95::numeric) AND ($1 = 150::numeric))))

1 Answer


NOT EXISTS is a good choice. It is almost always better than NOT IN. More details here. I simplified your query a bit (it generally looks fine):

SELECT DISTINCT ON (c.class_id, a.school_date)
       c.class_id, c.class_name, c.grade_id
      ,g.linked_calendar, aa.calendar_id
      ,a.school_date, a.minutes
      ,t.staff_id, s.first_name, s.last_name  
FROM   school_classes                c
JOIN   teacher_join_classes_subjects t  USING (class_id)
JOIN   staff                         s  USING (staff_id)
JOIN   school_gradelevels            g  ON g.id = c.grade_id 
JOIN   attendance_calendars          aa ON aa.title = g.linked_calendar 
JOIN   attendance_calendar           a  ON a.calendar_id = aa.calendar_id 
WHERE  t.syear = 2013
AND    s.syear = 2013
AND    aa.syear = 2013
AND    t.does_attendance = 'Y'   -- looks like it should be boolean!
AND    t.subject_id IS NULL 
AND    a.school_date < CURRENT_DATE 
AND NOT EXISTS (
   SELECT 1
   FROM   attendance_completed x
   WHERE  x.class_id = c.class_id
   AND    x.school_date = a.school_date
   AND   (x.period_id = 101 AND a.minutes >= 151 OR  -- actually numbers?
          x.period_id =  95 AND a.minutes  = 150)
   )
ORDER BY c.class_id, a.school_date, ???
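
As an aside on why NOT EXISTS is the safer choice: a minimal sketch, assuming two hypothetical temp tables (demo_calendar and demo_completed are made up for illustration and are not part of your schema). When the subquery can return a NULL, NOT IN never evaluates to true, while NOT EXISTS behaves the way most people expect.

```sql
-- Hypothetical demo tables, not part of the original schema
CREATE TEMP TABLE demo_calendar  (school_date date);
CREATE TEMP TABLE demo_completed (school_date date);

INSERT INTO demo_calendar  VALUES ('2013-10-01'), ('2013-10-02');
INSERT INTO demo_completed VALUES ('2013-10-01'), (NULL);

-- NOT IN: the NULL in the subquery result makes the predicate
-- false or NULL for every row, so nothing qualifies
SELECT school_date
FROM   demo_calendar
WHERE  school_date NOT IN (SELECT school_date FROM demo_completed);

-- NOT EXISTS: the NULL row simply never matches,
-- so 2013-10-02 is reported as not completed
SELECT c.school_date
FROM   demo_calendar c
WHERE  NOT EXISTS (
   SELECT 1
   FROM   demo_completed x
   WHERE  x.school_date = c.school_date
   );
```

Whether attendance_completed.school_date can actually be NULL in your data is something only you can check; the sketch only shows the difference in semantics.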

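What seems to be missing from your query is the ORDER BY that should accompany your DISTINCT ON. Add more ORDER BY items in place of the ???. If there are duplicates to pick from, you probably want to define which one gets picked.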
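
To illustrate how the extra ORDER BY item decides which duplicate survives, here is a small sketch on a hypothetical temp table (demo_attendance is invented for this example). Using staff_id as the tiebreaker is only an assumption; pick whatever column expresses the row you actually want.

```sql
-- Hypothetical demo table, not part of the original schema
CREATE TEMP TABLE demo_attendance (
   class_id    int
 , school_date date
 , staff_id    int
);

INSERT INTO demo_attendance VALUES
   (1, '2013-10-01', 7)
 , (1, '2013-10-01', 3)   -- duplicate (class_id, school_date)
 , (2, '2013-10-01', 5);

-- The leading ORDER BY items must match the DISTINCT ON expressions;
-- the trailing item (staff_id) decides which row is kept per group.
SELECT DISTINCT ON (class_id, school_date)
       class_id, school_date, staff_id
FROM   demo_attendance
ORDER  BY class_id, school_date, staff_id;   -- lowest staff_id wins
```

For class 1 this keeps the row with staff_id 3. With `staff_id DESC` as the last item it would keep the row with staff_id 7 instead. Without any tiebreaker, the surviving row is arbitrary.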

Numeric literals don't need single quotes, and boolean values should be stored as boolean.
You may want to revisit the chapter on data types.
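
A possible way to convert a 'Y'/'N' text flag into a real boolean, sketched on a hypothetical temp table (demo_tjcs stands in for teacher_join_classes_subjects). It assumes the column only holds 'Y', 'N' or NULL; check your actual data before trying anything like this on the real table.

```sql
-- Hypothetical stand-in for teacher_join_classes_subjects
CREATE TEMP TABLE demo_tjcs (
   class_id        int
 , does_attendance varchar(1)
);

INSERT INTO demo_tjcs VALUES (1, 'Y'), (2, 'N'), (3, NULL);

-- Convert in place: 'Y' becomes true, any other non-null value false,
-- NULL stays NULL
ALTER TABLE demo_tjcs
   ALTER COLUMN does_attendance TYPE boolean
   USING (does_attendance = 'Y');

-- The filter then needs no string comparison at all
SELECT class_id
FROM   demo_tjcs
WHERE  does_attendance;
```

If the real column has a default value, it may have to be dropped before the type change and re-created afterwards.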

Answered 2013-10-31T17:32:14.343