mysql - Optimizing the addition of cumulative (running total?) columns

Question

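I'm new to SQL, and this forum has been my lifeline so far. Thank you for creating and sharing on this great platform.

I'm currently working with a large data set and would appreciate some guidance.

The table (existing_table) has 4 million rows and looks like this: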

id  date   sales_a   sales_b   sales_c   sales_d   sales_e

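Note that multiple rows can share the same date.

What I want to do is add 5 more columns to this table (cumulative_sales_a, cumulative_sales_b, etc.) holding the cumulative sales of a, b, c, etc. up to a given date (this will be grouped by date). I used the following code to do this: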

create table new_cumulative  
select t.id, t.date, t.sales_a, t.sales_b, t.sales_c, t.sales_d, t.sales_e,   
(select sum(x.sales_a) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_a,  
(select sum(x.sales_b) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_b,  
(select sum(x.sales_c) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_c,  
(select sum(x.sales_d) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_d,  
(select sum(x.sales_e) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_e  
from existing_table t  
group by t.id, t.date;

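Before running this query, I created an index on the "id" column.

Although I got the output I wanted, the query took almost 11 hours to complete.

I'd like to know whether I'm doing something wrong here and whether there is a better (faster) way to run this kind of query.

Thanks for your help.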

score 0 · Accepted Answer

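This looks like a good place for MySQL user variables. In this case, I would first pre-aggregate everything by the "ID" and "Date" you expect, which removes the duplicates and leaves a single row per ID per day holding that day's totals. That result is then ordered by ID and date to prepare it for the next part, where it is joined to the "@sqlvariables" derived table.

Now the rows are simply processed in order. The totals keep accumulating for each ID until a new ID appears, at which point the counters are reset to zero before the respective "sales" are added. After each "record" is processed, @lastID is set to the ID just processed, so that the next row can be compared against it to decide whether to continue on the same ID or force a reset to zero.

To help optimize the inner "PreAgg"regate query, make sure there is an index on (ID, Date). It should be super fast for you.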
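A minimal sketch of the index suggested above. The index name `idx_existing_id_date` is made up for illustration; the table and column names are taken from the question.

```sql
-- Composite index so that both the GROUP BY (id, date) and the
-- per-id date-range lookups can be served from the index order.
-- The index name is hypothetical; any unused name works.
CREATE INDEX idx_existing_id_date
    ON existing_table (id, `date`);
```

Because `id` is the leading column, this index also covers lookups on `id` alone, so the single-column index on `id` mentioned in the question is probably redundant once this one exists.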

SELECT
      PreAgg.ID,
      PreAgg.`Date`,
      PreAgg.SalesA,
      PreAgg.SalesB,
      PreAgg.SalesC,
      PreAgg.SalesD,
      PreAgg.SalesE,
      -- Same ID as the previous row: keep accumulating; new ID: restart from 0.
      -- Note the comparison uses "=", not the assignment operator ":=".
      @CumulativeA := if( @lastID = PreAgg.ID, @CumulativeA, 0 ) + PreAgg.SalesA as CumulativeA,
      @CumulativeB := if( @lastID = PreAgg.ID, @CumulativeB, 0 ) + PreAgg.SalesB as CumulativeB,
      @CumulativeC := if( @lastID = PreAgg.ID, @CumulativeC, 0 ) + PreAgg.SalesC as CumulativeC,
      @CumulativeD := if( @lastID = PreAgg.ID, @CumulativeD, 0 ) + PreAgg.SalesD as CumulativeD,
      @CumulativeE := if( @lastID = PreAgg.ID, @CumulativeE, 0 ) + PreAgg.SalesE as CumulativeE,
      -- Remember the ID just processed; this must come after the comparisons above.
      @lastID := PreAgg.ID as dummyPlaceholder
   from 
      ( select 
              t.id, 
              t.`date`, 
              SUM( t.sales_a ) SalesA, 
              SUM( t.sales_b ) SalesB, 
              SUM( t.sales_c ) SalesC,
              SUM( t.sales_d ) SalesD,
              SUM( t.sales_e ) SalesE
           from
              existing_table t
           group by
              t.id,
              t.`date`
           order by
              t.id,
              t.`date` ) PreAgg,
      ( select 
              @lastID := 0,
              @CumulativeA := 0,
              @CumulativeB := 0,
              @CumulativeC := 0,
              @CumulativeD := 0,
              @CumulativeE := 0 ) sqlvars
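
On MySQL 8.0 or later, the same pre-aggregate-then-accumulate idea can probably be expressed with window functions instead of user variables (assigning user variables inside expressions is deprecated in 8.0). This is only a sketch under that version assumption and is not part of the original answer. The window name `w` and the output aliases are illustrative.

```sql
-- Assumes MySQL 8.0+ (window functions are not available in 5.7 and earlier).
SELECT
    t.id,
    t.`date`,
    -- daily totals, one row per (id, date)
    SUM(t.sales_a) AS sales_a,
    SUM(t.sales_b) AS sales_b,
    SUM(t.sales_c) AS sales_c,
    SUM(t.sales_d) AS sales_d,
    SUM(t.sales_e) AS sales_e,
    -- running totals of the daily totals, restarting for each id
    SUM(SUM(t.sales_a)) OVER w AS cumulative_sales_a,
    SUM(SUM(t.sales_b)) OVER w AS cumulative_sales_b,
    SUM(SUM(t.sales_c)) OVER w AS cumulative_sales_c,
    SUM(SUM(t.sales_d)) OVER w AS cumulative_sales_d,
    SUM(SUM(t.sales_e)) OVER w AS cumulative_sales_e
FROM existing_table AS t
GROUP BY t.id, t.`date`
WINDOW w AS (PARTITION BY t.id ORDER BY t.`date`);
```

The inner `SUM` collapses duplicate (id, date) rows and the outer `SUM ... OVER w` accumulates those daily totals in date order within each id, which matches the roles of the PreAgg subquery and the @Cumulative variables above.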

score 0 · Accepted Answer

Some queries are inherently expensive and take a long time to execute. In this particular case, you can avoid using 5 subqueries:

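SELECT a.*,
       SUM(b.sales_a) AS cumulative_sales_a,
       SUM(b.sales_b) AS cumulative_sales_b, ...
FROM 
(
 select t.id, t.`date`, t.sales_a, t.sales_b, t.sales_c, t.sales_d, t.sales_e
 from existing_table t  
 GROUP BY t.id,t.`date`
)a
LEFT JOIN 
(
  -- daily totals per (id, date)
  select x.id, x.`date`, sum(x.sales_a) as sales_a,
  sum(x.sales_b) as sales_b, ...
  FROM existing_table x 
  GROUP BY x.id, x.`date`
)b ON (b.id = a.id AND b.`date` <=a.`date`)
-- each row of a matches every day up to and including its own date in b,
-- so those daily totals have to be summed to get the running total
GROUP BY a.id, a.`date`, a.sales_a, a.sales_b, a.sales_c, a.sales_d, a.sales_e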

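This is also an expensive query, but it should get a better execution plan than the original. Also, I'm not sure whether

select t.id, t.`date`, t.sales_a, t.sales_b, t.sales_c, t.sales_d, t.sales_e
 from existing_table t  
 GROUP BY t.id,t.`date`

gives you what you want. For example, if you have 5 records with the same id and date, it will take the values of the other fields (sales_a, sales_b, etc.) from an arbitrary one of those 5 records...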

score 0 · Accepted Answer

You can combine all of the mini-selects with sum into a single query, that is, turn

(select sum(x.sales_a) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_a,  
(select sum(x.sales_b) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_b,  
(select sum(x.sales_c) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_c,  
(select sum(x.sales_d) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_d,  
(select sum(x.sales_e) from existing_table x where x.id = t.id and x.date <= t.date) as cumulative_sales_e

into

select sum(..),sum(..),sum(...),sum(..),sum(..)
from existing_table x 
where x.id=t.id and x.date<=t.date
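
A scalar subquery in the select list can return only one column, so the merged five-sum select cannot simply replace the five subqueries where they stand. One possible way to apply the idea, sketched here as an assumption rather than as what the answer's author had in mind, is to move the `x.id = t.id and x.date <= t.date` condition into a join and compute all five sums in a single pass. The alias `d` is illustrative.

```sql
-- d: the distinct (id, date) pairs that need a running total
-- x: every row of the same id on or before that date
SELECT
    d.id,
    d.`date`,
    SUM(x.sales_a) AS cumulative_sales_a,
    SUM(x.sales_b) AS cumulative_sales_b,
    SUM(x.sales_c) AS cumulative_sales_c,
    SUM(x.sales_d) AS cumulative_sales_d,
    SUM(x.sales_e) AS cumulative_sales_e
FROM (SELECT DISTINCT id, `date` FROM existing_table) AS d
JOIN existing_table AS x
  ON x.id = d.id
 AND x.`date` <= d.`date`
GROUP BY d.id, d.`date`;
```

This still reads each id's history once per date, so it is likely to stay expensive on 4 million rows, but it scans the matching rows once per (id, date) instead of five times.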
