mysql - Most Efficient Filter Location of Subquery (GROUP BY) In JOIN

Question

SELECT * FROM foo f
INNER JOIN (
    SELECT bar_id, MAX(revision_id) FROM bar b
    GROUP BY bar_id
) t ON t.bar_id = f.bar_id

OK, so here's the question: let's say there are millions of records in these tables and I want the query to be as efficient as possible.

Is MySQL going to pull all of the records for the bar table and then filter them on the ON statement at the join level instead of inside the sub query? Or is there a way to filter the items inside the sub query with just SQL by itself before the JOIN filter?

It seems querying all of the records to filter them would be inefficient and I have not thought of a way yet to get around this.

I have tried this but the subquery cannot see the foo table:

SELECT * FROM foo f
INNER JOIN (
    SELECT bar_id, MAX(revision_id) FROM bar b
    WHERE b.bar_id = f.bar_id
    GROUP BY bar_id
) t ON t.bar_id = f.bar_id

Is there a way to pass the id down to the subquery? I just like to do things the optimal way, and I'm sure there is a way to do this.

Thanks for all responses.

score 0 · Accepted Answer

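"Is MySQL going to pull all of the records for the bar table and then filter them on the ON statement at the join level instead of inside the sub query?"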

Most likely, it will execute the subquery in full before performing the join. If you want to be sure, look at the execution plan shown by EXPLAIN.
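As a sketch of that check, prefix the query from the question with EXPLAIN. The max_revision_id alias is added here only for readability. If the output contains a row with select_type DERIVED and the outer query reads from a table shown as <derivedN>, the subquery is being materialized before the join. The actual plan depends on the MySQL version, the indexes, and the data, so this is something to verify on your own server rather than an expected result.

```sql
-- Ask MySQL for the execution plan of the original query instead of running it.
EXPLAIN
SELECT * FROM foo f
INNER JOIN (
    SELECT bar_id, MAX(revision_id) AS max_revision_id FROM bar b
    GROUP BY bar_id
) t ON t.bar_id = f.bar_id;
```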

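There is one special case where this approach might even be beneficial: if bar is large but bar_id takes only a few distinct values, and many rows from foo reference those same few values, then computing the maximum revision for each id up front, before joining the result to the rows of foo, might not be too bad.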

"Or is there a way to filter the items inside the sub query with just SQL by itself before the JOIN filter?"

You can avoid the subquery altogether:

SELECT f.*, MAX(b.revision_id)
FROM foo f INNER JOIN bar b ON b.bar_id = f.bar_id
GROUP BY f.foo_id

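I assume that each row of foo can be uniquely identified by its foo_id; you might have to use multiple columns there, or introduce a new key. The result will therefore contain one row for each row of foo, but only if there is at least one matching row in bar as well. All matching rows from bar are aggregated in that MAX call, so you get the maximum revision_id among them.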
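To make that behaviour concrete, here is a small sketch with made-up data. The table definitions and values are assumptions for illustration only; the question does not show the real schema. foo_id is declared as the primary key, which is what lets MySQL accept f.* alongside GROUP BY f.foo_id when ONLY_FULL_GROUP_BY is enabled.

```sql
-- Hypothetical minimal schema; the real tables will have more columns.
CREATE TABLE foo (foo_id INT PRIMARY KEY, bar_id INT);
CREATE TABLE bar (bar_id INT, revision_id INT);

INSERT INTO foo VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO bar VALUES (10, 1), (10, 2), (30, 5);

-- One output row per foo row that has at least one match in bar.
SELECT f.*, MAX(b.revision_id) AS max_revision_id
FROM foo f INNER JOIN bar b ON b.bar_id = f.bar_id
GROUP BY f.foo_id;
```

With this data the query returns the foo rows with foo_id 1 and 2, each with max_revision_id 2. The row with foo_id 3 is dropped because no bar row has bar_id 20.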

"I have tried this but the subquery cannot see the foo table: […]"

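It is a good thing that this does not work. The closest thing that does work is a dependent subquery, which has to be executed repeatedly, once for each row of foo. That is a performance killer. If in doubt, try it with your real data and simply compare execution times over a sufficiently large number of runs.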
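For comparison, a dependent (correlated) subquery that MySQL does accept could look like the sketch below. It assumes the same foo and bar tables and moves the subquery into the select list, where it is allowed to reference f. It is not an exact equivalent of the inner join: foo rows without a match in bar are still returned, with NULL as max_revision_id. It is shown only to illustrate the form being warned against.

```sql
-- The inner SELECT references f.bar_id, so it is evaluated per foo row.
SELECT f.*,
       (SELECT MAX(b.revision_id)
        FROM bar b
        WHERE b.bar_id = f.bar_id) AS max_revision_id
FROM foo f;
```

Running EXPLAIN on this statement should list the inner query with select_type DEPENDENT SUBQUERY, which is a quick way to spot this pattern in a plan.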

Conclusion: try to avoid subqueries, and try even harder to avoid dependent subqueries.
