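MySQL seems unable to optimize a SELECT that joins against a GROUP BY subquery, and the query ends up taking a very long time. There must be a known optimization for such a common scenario.

Let's assume we want to return all orders from the database, each with a flag indicating whether it is the customer's first order.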

CREATE TABLE orders (`order` int, customer int, `date` date);

Retrieving the first order of each customer is very fast.

SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer;

However, it becomes very slow once we join this with the full set of orders using a subquery:

SELECT `order`, first_order FROM orders LEFT JOIN ( 
  SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders ON orders.`order`=first_orders.first_order;

I hope there is a simple trick we are missing, because otherwise it is about 1000 times faster to do the following:

CREATE TEMPORARY TABLE tmp_first_order AS 
  SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer;
CREATE INDEX tmp_boost ON tmp_first_order (first_order);

SELECT `order`, first_order FROM orders LEFT JOIN tmp_first_order 
  ON orders.`order`=tmp_first_order.first_order;

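Edit
Inspired by option 3 proposed by @ruakh, there is indeed a less ugly workaround using INNER JOIN and UNION. It has acceptable performance and does not require a temporary table. However, it is somewhat specific to our case, and I would like to know whether a more general optimization exists.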

SELECT `order`, 'YES' as `first` FROM orders INNER JOIN ( 
    SELECT min(`order`) as first_order FROM orders GROUP BY customer
  ) AS first_orders_1 ON orders.`order`=first_orders_1.first_order
UNION
SELECT `order`, 'NO' as `first` FROM orders INNER JOIN ( 
    SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer
  ) AS first_orders_2 ON first_orders_2.customer = orders.customer 
    AND orders.`order` > first_orders_2.first_order;

2 Answers


Here are a few things you could try:

  1. Remove customer from the subquery's field list, since it isn't doing anything there:

    SELECT `order`,
           first_order
      FROM orders
      LEFT
      JOIN ( SELECT MIN(`order`) AS first_order
               FROM orders
              GROUP
                 BY customer
           ) AS first_orders
        ON orders.`order` = first_orders.first_order
    ;
    
  2. Conversely, add customer to the ON clause, so that it actually does something for you:

    SELECT `order`,
           first_order
      FROM orders
      LEFT
      JOIN ( SELECT customer,
                    MIN(`order`) AS first_order
               FROM orders
              GROUP
                 BY customer
           ) AS first_orders
        ON orders.customer = first_orders.customer
       AND orders.`order` = first_orders.first_order
    ;
    
  3. Same as the previous one, but use an INNER JOIN instead of a LEFT JOIN, and convert the original ON clause into a CASE expression:

    SELECT `order`,
           CASE WHEN first_order = `order` THEN first_order END AS first_order
      FROM orders
     INNER
      JOIN ( SELECT customer,
                    MIN(`order`) AS first_order
               FROM orders
              GROUP
                 BY customer
           ) AS first_orders
        ON orders.customer = first_orders.customer
    ;
    
  4. Replace the whole JOIN approach with an uncorrelated IN subquery inside a CASE expression:

    SELECT `order`,
           CASE WHEN `order` IN
                      ( SELECT MIN(`order`)
                          FROM orders
                         GROUP
                            BY customer
                      )
                THEN `order`
            END AS first_order
      FROM orders
    ;
    
  5. Replace the whole JOIN approach with a correlated EXISTS subquery inside a CASE expression:

    SELECT `order`,
           CASE WHEN NOT EXISTS
                      ( SELECT 1
                          FROM orders AS o2
                         WHERE o2.customer = o1.customer
                           AND o2.`order` < o1.`order`
                      )
                THEN `order`
            END AS first_order
      FROM orders AS o1
    ;
    

(It's quite possible that some of the above will actually perform worse, but I think they're all worth trying.)
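
Several of the variants above, in particular the correlated EXISTS in option 5 and the GROUP BY customer subquery, are likely to depend heavily on indexing, which the question's CREATE TABLE does not show. As a rough sketch, assuming no index on (customer, `order`) exists yet (the index name idx_customer_order is made up for illustration), one could add a composite index and compare the plans with EXPLAIN:

```sql
-- Assumption: orders has no index covering (customer, `order`) yet.
-- idx_customer_order is a hypothetical name chosen for this sketch.
CREATE INDEX idx_customer_order ON orders (customer, `order`);

-- Inspect the plan of a candidate query (here option 5) before and
-- after creating the index; repeat for the other variants.
EXPLAIN
SELECT o1.`order`,
       CASE WHEN NOT EXISTS
                  ( SELECT 1
                      FROM orders AS o2
                     WHERE o2.customer = o1.customer
                       AND o2.`order` < o1.`order`
                  )
            THEN o1.`order`
        END AS first_order
  FROM orders AS o1;
```

Whether the optimizer actually picks the index depends on the MySQL version and the data distribution, so the EXPLAIN output is the thing to check rather than assume.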

answered 2012-12-22T17:22:29.350

I would expect this to be faster, since it uses a variable instead of the LEFT JOIN:

SELECT
  `order`,
  If(@previous_customer<>(@previous_customer:=`customer`),
    `order`,
    NULL
  ) AS first_order
FROM orders
JOIN ( SELECT @previous_customer := -1 ) x
ORDER BY customer, `order`;

This is what my example on SQL Fiddle returns:

CUSTOMER    ORDER    FIRST_ORDER
1           1        1
1           2        (null)
1           3        (null)
2           4        4
2           5        (null)
3           6        6
4           7        7
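
The fiddle's schema is not reproduced in this answer. A setup consistent with the output above, assuming the column types from the question's CREATE TABLE and leaving the `date` column unset because its values are not shown, might look like this:

```sql
-- Assumed schema, taken from the question (reserved words quoted).
CREATE TABLE orders (`order` INT, customer INT, `date` DATE);

-- Rows inferred from the CUSTOMER / ORDER columns of the output above.
INSERT INTO orders (`order`, customer) VALUES
  (1, 1), (2, 1), (3, 1),
  (4, 2), (5, 2),
  (6, 3),
  (7, 4);
```
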
answered 2012-12-22T17:25:46.283