
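In a recent question, StevieG showed me how to solve my problem using a pivot table. The new problem is that I have to check some conditions against the pivoted table. Let's take the final query: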

SELECT 
  c.id, 
  GROUP_CONCAT(if(d.name = 'p1', d.value, NULL)) AS 'p1', 
  GROUP_CONCAT(if(d.name = 'p2', d.value, NULL)) AS 'p2', 
  GROUP_CONCAT(if(d.name = 'p3', d.value, NULL)) AS 'p3', 
  GROUP_CONCAT(if(d.name = 'p4', d.value, NULL)) AS 'p4', 
  GROUP_CONCAT(if(d.name = 'p5', d.value, NULL)) AS 'p5', 
  GROUP_CONCAT(if(d.name = 'p6', d.value, NULL)) AS 'p6'
FROM container c
JOIN data d ON c.id = d.container
GROUP BY c.id

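Obviously I can't simply add a WHERE clause (for example, to check p5>30), because p5 is an aggregate that doesn't exist until after the GROUP BY. I've found two ways to get around this. The first is to wrap the query in a derived table: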

SELECT * FROM (
    SELECT 
      c.id, 
      GROUP_CONCAT(if(d.name = 'p1', d.value, NULL)) AS 'p1', 
      GROUP_CONCAT(if(d.name = 'p2', d.value, NULL)) AS 'p2', 
      GROUP_CONCAT(if(d.name = 'p3', d.value, NULL)) AS 'p3', 
      GROUP_CONCAT(if(d.name = 'p4', d.value, NULL)) AS 'p4', 
      GROUP_CONCAT(if(d.name = 'p5', d.value, NULL)) AS 'p5', 
      GROUP_CONCAT(if(d.name = 'p6', d.value, NULL)) AS 'p6'
    FROM container c
    JOIN data d ON c.id = d.container
    GROUP BY c.id
) AS pivoted WHERE p5 > 30

The other way I found is to add a HAVING clause:

SELECT 
  c.id, 
  GROUP_CONCAT(if(d.name = 'p1', d.value, NULL)) AS 'p1', 
  GROUP_CONCAT(if(d.name = 'p2', d.value, NULL)) AS 'p2', 
  GROUP_CONCAT(if(d.name = 'p3', d.value, NULL)) AS 'p3', 
  GROUP_CONCAT(if(d.name = 'p4', d.value, NULL)) AS 'p4', 
  GROUP_CONCAT(if(d.name = 'p5', d.value, NULL)) AS 'p5', 
  GROUP_CONCAT(if(d.name = 'p6', d.value, NULL)) AS 'p6'
FROM container c
JOIN data d ON c.id = d.container
GROUP BY c.id
HAVING p5>30
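
One detail worth checking in either variant: GROUP_CONCAT returns a string, so p5>30 relies on MySQL's implicit string-to-number conversion. Below is a sketch of a HAVING that compares numerically. It assumes data.value holds numeric text and that each container has at most one p5 row; the DECIMAL(20, 4) precision is an arbitrary choice.

```sql
SELECT
  c.id,
  GROUP_CONCAT(IF(d.name = 'p5', d.value, NULL)) AS p5
FROM container c
JOIN data d ON c.id = d.container
GROUP BY c.id
-- Compare as a number instead of relying on implicit string conversion.
-- MAX() simply picks the single p5 value per container (assumed unique).
HAVING MAX(IF(d.name = 'p5', CAST(d.value AS DECIMAL(20, 4)), NULL)) > 30;
```

This only makes the comparison explicit. It still groups every row before filtering, so it is unlikely to change the timings.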

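The problem is performance. I'm working with a test database of 50,000 entries, but production could reach 1 million. The first query (the one without the p5>30 check) takes 0.60 seconds to run 1,000 times on my development machine (with no cache), but the second and third take more than 5 minutes to complete.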

I know that an implicit derived table is generated with no index on its data, but what options do I have to optimize this?


2 Answers


Since data(container, name) is unique, you don't need GROUP_CONCAT. How about this:

SELECT 
  c.id, 
  d_p1.value AS 'p1', 
  d_p2.value AS 'p2', 
  d_p3.value AS 'p3', 
  d_p4.value AS 'p4', 
  d_p5.value AS 'p5'
FROM container AS c
LEFT JOIN data AS d_p1 ON (d_p1.container = c.id AND d_p1.name = 'p1')
LEFT JOIN data AS d_p2 ON (d_p2.container = c.id AND d_p2.name = 'p2')
LEFT JOIN data AS d_p3 ON (d_p3.container = c.id AND d_p3.name = 'p3')
LEFT JOIN data AS d_p4 ON (d_p4.container = c.id AND d_p4.name = 'p4')
LEFT JOIN data AS d_p5 ON (d_p5.container = c.id AND d_p5.name = 'p5')
WHERE d_p5.value > 30

With an index on data(container, name), your query should run in a few seconds.

If data.name is longer than a few characters (say, 5), you should probably use a surrogate (integer) key instead of data.name.
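
As a rough sketch of the two suggestions above: the index name idx_data_container_name, the lookup table data_name, and the column name_id are made up for illustration, and the column types are guesses because the real schema isn't shown.

```sql
-- Composite index so each LEFT JOIN can look up one row by (container, name).
-- UNIQUE reflects the assumption that a container has at most one row per name.
ALTER TABLE data
  ADD UNIQUE INDEX idx_data_container_name (container, name);

-- Inspect the plan: ideally the d_p5 alias lists this index in the "key" column.
EXPLAIN
SELECT c.id, d_p5.value AS p5
FROM container AS c
LEFT JOIN data AS d_p5 ON (d_p5.container = c.id AND d_p5.name = 'p5')
WHERE d_p5.value > 30;

-- Hypothetical surrogate key: store a small integer instead of the name string.
CREATE TABLE data_name (
  id   SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(64) NOT NULL UNIQUE
);

-- New column on data, to be backfilled from data_name before it replaces name.
ALTER TABLE data
  ADD COLUMN name_id SMALLINT UNSIGNED NULL;
```

Once name_id is populated, the composite index would go on (container, name_id) instead, which keeps the index entries short.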

answered 2012-10-25T14:34:26.533

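I'd take an approach close to Yak's, but if you are only looking for entries where "p5.value" is greater than some threshold, I would restructure the query to first fetch only the entries that qualify on P5, as a "pre-query". If you have 100,000 records and only 20,000 of them have a "P5.value" above your limit of 30, fetch just those first, then join to the rest. Also, make sure the "data" table has an index on (name, value), as well as an index on (container, name).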
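
The (name, value) index could be created roughly like this. The index name is hypothetical, and the sketch assumes value is a numeric or short string column; a TEXT column would need a prefix length.

```sql
-- Lets the pre-query locate the name = 'p5' rows and then scan value > 30.
CREATE INDEX idx_data_name_value ON data (name, value);

-- Inspect how the pre-query alone is executed before wiring in the joins.
EXPLAIN
SELECT JustP5.container, JustP5.value
FROM data AS JustP5
WHERE JustP5.name = 'p5'
  AND JustP5.value > 30;
```

If value is stored as a string, comparing it with the number 30 forces a conversion on every row, and MySQL may then use only the name part of the index. The (container, name) index is the one that serves the LEFT JOINs in the outer query.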

The pre-query will already have "joined" the qualifying P5 value for each container; the other values are then picked up by the joins.

select STRAIGHT_JOIN
      PreQuery.QualifiedContainer ID,
      coalesce( d_p1.Value, ' ' ) p1,
      coalesce( d_p2.Value, ' ' ) p2,
      coalesce( d_p3.Value, ' ' ) p3,
      coalesce( d_p4.Value, ' ' ) p4,
      PreQuery.P5Value  p5,
      coalesce( d_p6.Value, ' ' ) p6
   from
      ( select 
              JustP5.Container as QualifiedContainer,
              JustP5.Value as P5Value
           from
              data JustP5
           where
                  JustP5.Name = 'p5'
              AND JustP5.Value > 30 
           group by
              JustP5.Container ) as PreQuery

         LEFT JOIN data AS d_p1 
            ON PreQuery.QualifiedContainer = d_p1.container
           AND d_p1.name = 'p1'

         LEFT JOIN data AS d_p2
            ON PreQuery.QualifiedContainer = d_p2.container
           AND d_p2.name = 'p2'

         LEFT JOIN data AS d_p3
            ON PreQuery.QualifiedContainer = d_p3.container
           AND d_p3.name = 'p3'

         LEFT JOIN data AS d_p4
            ON PreQuery.QualifiedContainer = d_p4.container
           AND d_p4.name = 'p4'

         LEFT JOIN data AS d_p6
            ON PreQuery.QualifiedContainer = d_p6.container
           AND d_p6.name = 'p6'

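Based on the other question you referenced, I don't think a "group by" is needed, since for a given container there is only one instance of a given name/value pair. If I'm wrong about that, all I would do is change COALESCE() to GROUP_CONCAT() and add GROUP BY PreQuery.QualifiedContainer.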
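
A sketch of that fallback, shortened to p1 and p6 for brevity (p2 through p4 follow the same pattern). It assumes the name/value columns live in the data table. DISTINCT is my addition, not part of the change described above: with several rows per name, the LEFT JOINs multiply rows and GROUP_CONCAT would otherwise repeat values.

```sql
SELECT STRAIGHT_JOIN
      PreQuery.QualifiedContainer AS id,
      GROUP_CONCAT(DISTINCT d_p1.value) AS p1,
      PreQuery.P5Value AS p5,
      GROUP_CONCAT(DISTINCT d_p6.value) AS p6
   FROM
      ( SELECT JustP5.container AS QualifiedContainer,
               GROUP_CONCAT(JustP5.value) AS P5Value
           FROM data AS JustP5
          WHERE JustP5.name = 'p5'
            AND JustP5.value > 30
          GROUP BY JustP5.container ) AS PreQuery

         LEFT JOIN data AS d_p1
            ON PreQuery.QualifiedContainer = d_p1.container
           AND d_p1.name = 'p1'

         LEFT JOIN data AS d_p6
            ON PreQuery.QualifiedContainer = d_p6.container
           AND d_p6.name = 'p6'
   -- P5Value is listed too so the query also runs under ONLY_FULL_GROUP_BY.
   GROUP BY PreQuery.QualifiedContainer, PreQuery.P5Value;
```

Note that DISTINCT also collapses genuinely repeated values, so drop it if duplicates are meaningful in your data and aggregate each name in its own subquery instead.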

answered 2012-10-26T18:33:37.213