mysql - MySQL vs. SQL Server Express performance comparison

Question

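I have a somewhat complex query over roughly 100K rows.

The query runs in 13 seconds in SQL Server Express (running on my dev box).

The same query, with the same indexes and tables, takes over 15 minutes to run on MySQL 5.1 (running on my production machine, which is much more powerful and was tested with 100% of its resources available). Sometimes the query crashes the machine with an out-of-memory error.

What am I doing wrong in MySQL? Why does it take so long?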

select e8.*
from table_a e8
inner join (
    select max(e6.id) as id, e6.category, e6.entity, e6.service_date
    from (
        select e4.* 
        from table_a e4
        inner join (
            select max(e2.id) as id, e3.rank, e2.entity, e2.provider_id, e2.service_date
            from table_a e2
            inner join (
                select min(e1.rank) as rank, e1.entity, e1.provider_id, e1.service_date
                from table_a e1
                where e1.site_id is not null
                group by e1.entity, e1.provider_id, e1.service_date 
            ) as e3
            on e2.rank= e3.rank
            and e2.entity = e3.entity
            and e2.provider_id = e3.provider_id
            and e2.service_date = e3.service_date
            and e2.rank= e3.rank
            group by e2.entity, e2.provider_id, e2.service_date, e3.rank
        ) e5
        on e4.id = e5.id
        and e4.rank= e5.rank                            
    ) e6
    group by e6.category, e6.entity, e6.service_date 
) e7
on e8.id = e7.id and e7.category = e8.category

score 2 · Accepted Answer

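I originally tried to post this answer to your deleted question, which did not indicate that the problem was with MySQL. I would still go ahead and use SQL Server to refactor the query with CTEs, and then convert back to nested queries (if any remain). Sorry about the formatting: Jeff Atwood sent me the originally posted text and I had to reformat it again.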

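This is hard to do without data, expected results, and good names, but I would convert all the nested queries into CTEs, stack them up, name them meaningfully, and refactor, starting by excluding the columns you are not using. Removing columns will not by itself improve performance, because the optimizer is pretty smart, but it will put you in a position to improve the query, possibly by factoring out some or all of the CTEs. I am not sure what your code is doing, but you may find the new RANK()-type functions useful, because you appear to be using a back-tracking kind of pattern with all these self-joins.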
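As a rough sketch of what the RANK()-style approach could look like, here is a version that uses ROW_NUMBER() in place of the self-joins. It targets SQL Server 2005 or later; MySQL 5.1 has neither CTEs nor window functions, so it will not run there as written. The CTE names (best_per_provider, best_per_category) are made up for illustration, and the rewrite is only equivalent to the original under the assumptions listed in the comments, so compare results before relying on it.

```sql
-- Sketch only, not a drop-in replacement.
-- Assumptions (not stated in the question):
--   * id is unique in table_a;
--   * rank, entity, provider_id, service_date and category are never NULL
--     (the original equality joins silently drop NULL keys);
--   * the row that wins each group has a non-null site_id (the original e5
--     step does not filter e2 on site_id, so results can differ otherwise).
WITH best_per_provider AS (
    SELECT e.id,
           e.category,
           e.entity,
           e.service_date,
           ROW_NUMBER() OVER (
               PARTITION BY e.entity, e.provider_id, e.service_date
               ORDER BY e.rank ASC, e.id DESC   -- lowest rank first, then highest id
           ) AS rn
    FROM table_a e
    WHERE e.site_id IS NOT NULL
),
best_per_category AS (
    SELECT b.id,
           ROW_NUMBER() OVER (
               PARTITION BY b.category, b.entity, b.service_date
               ORDER BY b.id DESC               -- highest id per category group
           ) AS rn
    FROM best_per_provider b
    WHERE b.rn = 1
)
SELECT t.*
FROM table_a t
INNER JOIN best_per_category c
    ON c.id = t.id
WHERE c.rn = 1;
```

The first CTE stands in for e3 through e6, and the second for e7. Each one reads its input once instead of joining table_a back to an aggregate of itself.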

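So start here, with the CTE version that follows. I have already looked at improvements to e7 for you. The unused columns in e7 could indicate either a defect or incomplete thinking about the grouping possibilities, but if those columns are truly unnecessary, that may trickle all the way back through your logic in e6, e5, and e3. If the grouping in e7 is correct, you can eliminate everything except max(id) from the results and from the join. I cannot see why you would have multiple MAX(id) values per category, because that would multiply your results when you join. So MAX(id) must be unique within a category, in which case the category is redundant in the join.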

WITH e3 AS (
    select min(e1.rank) as rank,
           e1.entity,
           e1.provider_id,
           e1.service_date
    from table_a e1
    where e1.site_id is not null
    group by e1.entity, e1.provider_id, e1.service_date
)

,e5 AS (
    select max(e2.id) as id,
           e3.rank,
           e2.entity,
           e2.provider_id,
           e2.service_date
    from table_a e2
    inner join e3
        on e2.rank = e3.rank
        and e2.entity = e3.entity
        and e2.provider_id = e3.provider_id
        and e2.service_date = e3.service_date
        and e2.rank = e3.rank
    group by e2.entity, e2.provider_id, e2.service_date, e3.rank
)

,e6 AS (
    select e4.* -- switch from * to only the columns you are actually using
    from table_a e4
    inner join e5
        on e4.id = e5.id
        and e4.rank = e5.rank
)

,e7 AS (
    select max(e6.id) as id, e6.category -- unused, e6.entity, e6.service_date
    from e6
    group by e6.category, e6.entity, e6.service_date
    -- This instead
    -- select max(e6.id) as id
    -- from e6
    -- group by e6.category, e6.entity, e6.service_date
)

select e8.*
from table_a e8
inner join e7
    on e8.id = e7.id
    and e7.category = e8.category
    -- THIS INSTEAD on e8.id = e7.id
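
Dropping category from the final join is only safe if id identifies exactly one row of table_a. The question does not say whether id is a primary key, so that is an assumption worth checking first. A minimal check that should work on both MySQL and SQL Server (the alias copies is arbitrary):

```sql
-- Lists any id value that appears on more than one row.
-- An empty result means id is unique, so joining on id alone is enough.
SELECT id, COUNT(*) AS copies
FROM table_a
GROUP BY id
HAVING COUNT(*) > 1;
```

If this returns rows, the category condition in the join is doing real work and should stay.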

score 1 · Accepted Answer

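With efficient indexes available, 100,000 rows should not take 13 seconds. I suspect the difference is due to SQL Server having a much more robust query optimizer than MySQL. What MySQL has is closer to a SQL parser than an optimizer.

For starters, you need to provide much more information: the full schema of every table involved, and a full list of the indexes on each.

Then give some idea of what the data is about and what the query is intended to produce, something on the order of a use case.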
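
For what it is worth, the schema and index details can be pulled straight from each server. The statements below are one way to do it, using the table name table_a from the question; run each part on the server named in its comment.

```sql
-- MySQL: full table definition (columns, keys, storage engine),
-- followed by one row per indexed column.
SHOW CREATE TABLE table_a;
SHOW INDEX FROM table_a;

-- SQL Server: column and constraint summary, then the index list,
-- via the built-in system stored procedures.
EXEC sp_help 'table_a';
EXEC sp_helpindex 'table_a';
```

Posting both outputs side by side would show whether the two tables really have the same indexes.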

score 1 · Accepted Answer

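It would be interesting to run an explain plan on both and see what the differences are. I am not sure whether this is an apples-to-oranges comparison, but I would be curious.

I do not know if this will help, but this was the first hit when searching for "mysql query optimizer".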
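
A sketch of how the two plans could be captured, using only the innermost aggregate from the question as the statement under test. The alias min_rank is introduced here to avoid aliasing a column as rank; swap in the full query once the small case makes sense.

```sql
-- MySQL: prefix the statement with EXPLAIN to see, per table,
-- the chosen index, join type, and estimated row count.
EXPLAIN
SELECT MIN(e1.rank) AS min_rank, e1.entity, e1.provider_id, e1.service_date
FROM table_a e1
WHERE e1.site_id IS NOT NULL
GROUP BY e1.entity, e1.provider_id, e1.service_date;

-- SQL Server (SSMS or sqlcmd; GO is the client's batch separator):
-- SHOWPLAN_TEXT prints the estimated plan instead of running the query.
SET SHOWPLAN_TEXT ON;
GO
SELECT MIN(e1.rank) AS min_rank, e1.entity, e1.provider_id, e1.service_date
FROM table_a e1
WHERE e1.site_id IS NOT NULL
GROUP BY e1.entity, e1.provider_id, e1.service_date;
GO
SET SHOWPLAN_TEXT OFF;
GO
```

Full table scans or temporary tables on the MySQL side that do not appear on the SQL Server side would be the first thing to look for.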

score 0 · Accepted Answer

Here is another one that might be worthwhile.

Answered 2009-01-02T02:20:53.477

score 0 · Accepted Answer

The only open-source database I know of that has CTEs is Firebird ( http://www.firebirdsql.org/rlsnotesh/rlsnotes210.html#rnfb210-cte )

I think Postgres will have them in 8.4.
