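I am querying the fact table "foo_success" in a star schema; the table has about 6 million rows. It contains only (integer) references to the dimension tables, nothing else. We use MyISAM as the storage engine.
Query: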
SELECT
hierarchy.level0name,
hierarchy.level1name,
hierarchy.level0,
hierarchy.level1,
date.date,
address.city,
user.emailAddress,
foo_object.name,
foo_object.type,
user_group.groupId,
COUNT(user.id) AS count_user_id,
SUM(foo_object_statistic.passes) AS sum_foo_object_statistic_passes,
SUM(foo_object_statistic.starts) AS sum_foo_object_statistic_starts,
SUM(foo_object_statistic.calls) AS sum_foo_object_statistic_calls
FROM
foo_success,
user,
user_group,
address,
hierarchy,
foo_object,
foo_object_statistic,
date
WHERE (foo_success.userDimensionId = user.id)
AND (foo_success.userGroupDimensionId = user_group.id)
AND (foo_success.addressDimensionId = address.id)
AND (foo_success.hierarchyDimensionId = hierarchy.id)
AND (foo_success.fooObjectDimensionId = foo_object.id)
AND (foo_success.fooObjectStatisticDimensionId = foo_object_statistic.id)
AND (foo_success.dateDimensionId=date.id)
AND hierarchy.level0 = 'XYZ'
AND hierarchy.level1 IS NOT NULL
AND hierarchy.level2 IS NOT NULL
AND hierarchy.level3 IS NOT NULL
AND hierarchy.level4 IS NOT NULL
AND hierarchy.level5 IS NOT NULL
AND hierarchy.level6 IS NULL
AND hierarchy.level7 IS NULL
GROUP BY hierarchy.level0, foo_object.fooObjectId
LIMIT 0, 25;
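What I have tried so far:
- This is the plain (implicit) join version; it is exactly as fast as the INNER JOIN alternative.
- Every field used in a join or in a condition is indexed.
- I ran EXPLAIN on this query and found that the cost (number of rows examined) is 128596 for the user table and 77 for foo_success.
- I tried removing the dependency on the user table, which made the number of rows examined in the fact table foo_success exceed 6 million.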
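To illustrate the first point (the implicit comma join and the explicit INNER JOIN form are the same query), here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. It assumes a heavily reduced, hypothetical schema: only two of the seven dimensions, invented rows, and a hypothetical composite index named `idx_success_hier_obj`. It shows only that both forms return the same rows; it says nothing about how MyISAM would plan or time them. The sketch selects only grouped columns and adds ORDER BY so the comparison is deterministic, whereas the original query also selects non-grouped columns, which MySQL accepts only when ONLY_FULL_GROUP_BY is disabled.

```python
import sqlite3

# Hypothetical, heavily reduced version of the star schema from the question:
# only two of the seven dimensions are kept, and all rows are invented.
SCHEMA = """
CREATE TABLE hierarchy  (id INTEGER PRIMARY KEY, level0 TEXT, level1 TEXT);
CREATE TABLE foo_object (id INTEGER PRIMARY KEY, fooObjectId INTEGER, name TEXT);
CREATE TABLE foo_success (
    id INTEGER PRIMARY KEY,
    hierarchyDimensionId INTEGER,
    fooObjectDimensionId INTEGER
);
-- Composite index on the fact table's foreign keys (the name is hypothetical).
CREATE INDEX idx_success_hier_obj
    ON foo_success (hierarchyDimensionId, fooObjectDimensionId);
INSERT INTO hierarchy  VALUES (1, 'XYZ', 'a'), (2, 'XYZ', NULL), (3, 'ABC', 'b');
INSERT INTO foo_object VALUES (1, 100, 'x'), (2, 200, 'y');
INSERT INTO foo_success VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 2, 1), (5, 3, 2);
"""

# Implicit ("comma") join, in the style of the original query.
COMMA_JOIN = """
SELECT hierarchy.level0, foo_object.fooObjectId, COUNT(*) AS n
FROM foo_success, hierarchy, foo_object
WHERE foo_success.hierarchyDimensionId = hierarchy.id
  AND foo_success.fooObjectDimensionId = foo_object.id
  AND hierarchy.level0 = 'XYZ'
  AND hierarchy.level1 IS NOT NULL
GROUP BY hierarchy.level0, foo_object.fooObjectId
ORDER BY hierarchy.level0, foo_object.fooObjectId
"""

# The same query written with explicit INNER JOIN ... ON clauses.
INNER_JOIN = """
SELECT h.level0, o.fooObjectId, COUNT(*) AS n
FROM foo_success AS s
INNER JOIN hierarchy  AS h ON s.hierarchyDimensionId = h.id
INNER JOIN foo_object AS o ON s.fooObjectDimensionId = o.id
WHERE h.level0 = 'XYZ'
  AND h.level1 IS NOT NULL
GROUP BY h.level0, o.fooObjectId
ORDER BY h.level0, o.fooObjectId
"""


def run(sql: str) -> list:
    """Build the toy schema in an in-memory database and return all rows of `sql`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    comma_rows = run(COMMA_JOIN)
    inner_rows = run(INNER_JOIN)
    print(comma_rows == inner_rows)  # True: both forms are the same query
    print(inner_rows)                # [('XYZ', 100, 2), ('XYZ', 200, 1)]
```

Because the optimizer treats both spellings as the same inner join, identical timings are expected; any speedup has to come from the plan (join order, which index drives the scan), which is what EXPLAIN reveals.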
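The query takes about 1.5 minutes to complete, which is far from what I would expect of a data-warehouse star schema optimized for read speed. Is there any way to optimize this monster?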