Background

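I have two tables in MySQL that hold different types of feedback items. I built a query that combines these tables with a FULL OUTER JOIN (which in MySQL is actually written as two joins and a union) and computes some average ratings. This query appears to work perfectly: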

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  LEFT JOIN feedback_service AS s USING(name)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  RIGHT JOIN feedback_service AS s USING(name)
  WHERE l.id IS NULL
  GROUP BY name)
ORDER BY name;

(This is somewhat simplified for readability, but that makes no difference here.)

Problem

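Next I tried to add filtering by date (i.e. only consider feedback items created after a certain date). With my SQL skills and the research I did, I was able to come up with this: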

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  LEFT JOIN feedback_service AS s USING(name)
  WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  RIGHT JOIN feedback_service AS s USING(name)
  WHERE l.id IS NULL
    AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
  GROUP BY name)
ORDER BY name;

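This almost works: the results I get look correct. However, some feedback items are missing. For example, with the date set to one month ago, I counted feedback for 21 people in the database, but this query returns only 19 people. Worst of all, I can't find anything the missing items have in common.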

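Am I doing something wrong in this query? I think the WHERE clause does the date filtering after the JOIN, and ideally I would probably do it before. Then again, I don't know whether that is what causes my problem, and I don't know how to write this query differently.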

2 Answers

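I accepted Johan's answer because he explained these things well to me, and the answer is useful even in a more general sense. However, I thought I would also post the first solution I arrived at. It uses subqueries: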

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM (
    SELECT * FROM feedback_linguistic WHERE createdTime >= '" & date & "'
  ) AS l
  LEFT JOIN (
    SELECT * FROM feedback_service WHERE createdTime >= '" & date & "'
  ) AS s USING(name)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM (
    SELECT * FROM feedback_linguistic WHERE createdTime >= '" & date & "'
  ) AS l
  RIGHT JOIN (
    SELECT * FROM feedback_service WHERE createdTime >= '" & date & "'
  ) AS s USING(name)
  WHERE l.id IS NULL
  GROUP BY name)
ORDER BY name;

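The results of this query are correct. However, the solution doesn't look ideal, because in my experience subqueries are sometimes slow. Then again, I haven't done any performance analysis, so the subqueries may not be a bottleneck here. In any case, it runs fast enough in my application.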

Answered 2011-10-29T08:48:46.810

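A full outer join is a combination of 3 joins:

1- An inner join between A and B
2- A left exclusion join between A and B
3- A right exclusion join between A and B

Note that the combination of the inner join and the left exclusion join is a left outer join, so you usually rewrite the query as left outer join + right exclusion join.
For debugging purposes, however, it is useful to union all 3 joins and add a tag showing which join produced which row: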

  /*inner join*/
  (SELECT
     'inner' as join_type 
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  INNER JOIN feedback_service s ON (l.name = s.name) 
  WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) 
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) 
  GROUP BY l.name) 
UNION ALL
  (SELECT
     'left exclusion' as join_type 
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  LEFT JOIN feedback_service s ON (l.name = s.name) 
  WHERE s.id IS NULL
    /*AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) */
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) 
  GROUP BY l.name) 
UNION ALL
  (SELECT 
     'right exclusion' as join_type
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual 
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  RIGHT JOIN feedback_service s ON (s.name = l.name) 
  WHERE l.id IS NULL
    AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) 
    /*AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) */
  GROUP BY s.name) 
ORDER BY listname; 
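
The three-way decomposition above can be checked on toy data. The following is only a minimal sketch, not the poster's setup: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, uses invented rows for alice, bob and carol, leaves out the date filter, and writes the right exclusion join as a LEFT JOIN with the tables swapped so that it does not depend on RIGHT JOIN support.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feedback_linguistic (
        id INTEGER PRIMARY KEY, name TEXT, overallQuality REAL);
    CREATE TABLE feedback_service (
        id INTEGER PRIMARY KEY, name TEXT, overallSatisfaction REAL);
    -- Invented sample rows: alice is in both tables, bob only in the
    -- linguistic table, carol only in the service table.
    INSERT INTO feedback_linguistic (name, overallQuality)
        VALUES ('alice', 4), ('alice', 2), ('bob', 5);
    INSERT INTO feedback_service (name, overallSatisfaction)
        VALUES ('alice', 3), ('carol', 1);
""")

query = """
SELECT 'inner' AS join_type, l.name AS listname,
       AVG(l.overallQuality) AS avgLingQual,
       AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic l
INNER JOIN feedback_service s ON l.name = s.name
GROUP BY l.name
UNION ALL
SELECT 'left exclusion', l.name, AVG(l.overallQuality), NULL
FROM feedback_linguistic l
LEFT JOIN feedback_service s ON l.name = s.name
WHERE s.id IS NULL
GROUP BY l.name
UNION ALL
-- Right exclusion, written as a LEFT JOIN with the tables swapped.
SELECT 'right exclusion', s.name, NULL, AVG(s.overallSatisfaction)
FROM feedback_service s
LEFT JOIN feedback_linguistic l ON l.name = s.name
WHERE l.id IS NULL
GROUP BY s.name
ORDER BY listname
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each name appears exactly once, tagged with the join that produced it, which makes it easier to see which branch a missing person should have come from.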

"I think the WHERE clause does the date filtering after the JOIN, and ideally I would probably do it before."

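If you want the filtering to happen before the join, put the condition in the join clause. In an outer join, a condition in the join clause only restricts which rows can match, whereas a condition in WHERE discards the whole joined row afterwards.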
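
To illustrate the difference between filtering in WHERE and filtering in the join clause, here is a small sketch under stated assumptions: Python's built-in sqlite3 in place of MySQL, invented rows, a hypothetical cutoff date of 2011-10-01 passed as a bound parameter (:cutoff) instead of being concatenated into the string, and the right exclusion join written as a LEFT JOIN with the tables swapped. dave has recent linguistic feedback but only old service feedback, and erin the reverse; these are the kind of people the WHERE version can lose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feedback_linguistic (
        id INTEGER PRIMARY KEY, name TEXT,
        overallQuality REAL, createdTime TEXT);
    CREATE TABLE feedback_service (
        id INTEGER PRIMARY KEY, name TEXT,
        overallSatisfaction REAL, createdTime TEXT);
    -- Invented sample rows; dates are ISO strings so they compare as text.
    INSERT INTO feedback_linguistic (name, overallQuality, createdTime) VALUES
        ('alice', 5, '2011-10-05'), ('dave', 4, '2011-10-10'),
        ('erin', 2, '2011-08-01');
    INSERT INTO feedback_service (name, overallSatisfaction, createdTime) VALUES
        ('alice', 4, '2011-10-06'), ('dave', 1, '2011-08-15'),
        ('erin', 3, '2011-10-12');
""")
params = {"cutoff": "2011-10-01"}  # hypothetical cutoff date

# Date conditions in WHERE, as in the question: a matched but too-old row
# on the other side removes the whole joined row.
where_filtered = """
SELECT l.name AS name, AVG(l.overallQuality), AVG(s.overallSatisfaction)
FROM feedback_linguistic l
LEFT JOIN feedback_service s ON s.name = l.name
WHERE (s.createdTime >= :cutoff OR s.createdTime IS NULL)
  AND (l.createdTime >= :cutoff OR l.createdTime IS NULL)
GROUP BY l.name
UNION ALL
SELECT s.name, NULL, AVG(s.overallSatisfaction)
FROM feedback_service s
LEFT JOIN feedback_linguistic l ON l.name = s.name
WHERE l.id IS NULL
  AND (s.createdTime >= :cutoff OR s.createdTime IS NULL)
GROUP BY s.name
ORDER BY name
"""

# Date condition for the optional side moved into ON: too-old rows simply
# do not match, so the row from the preserved side survives with NULLs.
on_filtered = """
SELECT l.name AS name, AVG(l.overallQuality), AVG(s.overallSatisfaction)
FROM feedback_linguistic l
LEFT JOIN feedback_service s
       ON s.name = l.name AND s.createdTime >= :cutoff
WHERE l.createdTime >= :cutoff
GROUP BY l.name
UNION ALL
SELECT s.name, NULL, AVG(s.overallSatisfaction)
FROM feedback_service s
LEFT JOIN feedback_linguistic l
       ON l.name = s.name AND l.createdTime >= :cutoff
WHERE l.id IS NULL AND s.createdTime >= :cutoff
GROUP BY s.name
ORDER BY name
"""

where_rows = conn.execute(where_filtered, params).fetchall()
on_rows = conn.execute(on_filtered, params).fetchall()
print("WHERE filter:", where_rows)
print("ON filter:   ", on_rows)
```

On this sample data the WHERE version returns only alice, while the ON version also returns dave (linguistic average only) and erin (service average only). The condition on the preserved table of each branch stays in WHERE, because putting it in ON would not remove that table's rows from an outer join.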

Answered 2011-10-27T11:56:17.447