I am working on MySQL 5.1.

I would like to optimize a query that does the following:

  • Input: a `users` table with id and name (100,000 rows)
  • Output: for each letter, the id of the first user and the number of users

Example:

id | name
1  | Bob
2  | Albert
3  | bernard

Output:

letter | id | count
     A | 2  | 1
     B | 1  | 2

Letter A has 1 user (Albert); letter B has 2 users (bernard and Bob), and the first one in alphabetical order is bernard.
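To make the example easy to reproduce, the sample data could be loaded as in the sketch below. The column types and the index name `idx_users_name` are assumptions; the real table definition is not shown here.

```sql
-- Minimal reproduction of the example data (types and index name are assumed).
CREATE TABLE users (
  id   INT UNSIGNED NOT NULL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  KEY idx_users_name (name)
);

INSERT INTO users (id, name) VALUES
  (1, 'Bob'),
  (2, 'Albert'),
  (3, 'bernard');
```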

I have a working query. It returns every letter (plus a "no letter" bucket), together with the first user and the count.

SELECT formatted_letter, id, COUNT(1)
FROM (
  SELECT
    CASE WHEN name REGEXP '[A-Za-z].*'
           THEN UPPER(SUBSTR(name, 1, 1))
         ELSE '@'
    END as formatted_letter, id, name
  FROM `users`
    ... (some joins and conditions)
  ORDER BY name
) AS A
GROUP BY formatted_letter

This works and returns the correct values, but the query is very slow (9 seconds when selecting 25,000 users).

Is there another way to optimize this query?

Things I have tried:

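  • One big UNION with a branch per letter: this was the worst (36 seconds).
  • Adding a 'formatted_letter' column to get rid of the CASE/WHEN part: slightly better, it now takes 8 seconds.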

Indexes exist on the user id, on the user name, and on every column used in the joins and conditions.
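For reference, the second attempt could be set up roughly as follows. This is only a sketch: the column type, the default value, and the index name `idx_users_letter_name` are assumptions, and the joins and conditions of the real query are left out.

```sql
-- Sketch only: store the letter once instead of evaluating CASE/REGEXP per row.
-- MySQL 5.1 has no generated columns, so the stored value must be refreshed
-- (by the application or by triggers) whenever name changes.
ALTER TABLE users
  ADD COLUMN formatted_letter CHAR(1) NOT NULL DEFAULT '@';

UPDATE users
SET formatted_letter = CASE
      WHEN name REGEXP '[A-Za-z].*' THEN UPPER(SUBSTR(name, 1, 1))
      ELSE '@'
    END;

-- Hypothetical index name; covers the grouping column and the sort column.
CREATE INDEX idx_users_letter_name ON users (formatted_letter, name);
```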

2 Answers

A possible idea:

SELECT FirstLetter, MAX(name), SUM(NameCount)
FROM
(
    SELECT substr(name, 1, 1) AS FirstLetter, MIN(name) AS name, COUNT(*) AS NameCount
    FROM company
    GROUP BY FirstLetter
    UNION
    SELECT 'A' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'B' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'C' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'D' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'E' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'F' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'G' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'H' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'I' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'J' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'K' AS FirstLetter, "" AS name, 0 AS NameCount
    UNION
    SELECT 'L' AS FirstLetter, "" AS name, 0 AS NameCount
) sub1
GROUP BY FirstLetter

(I got bored of typing out a UNION branch for every possible letter to fill in the gaps.)

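This does work, but I am not sure how it performs on a table of your size (on a random table/field I have, with about 140k records, it took under a second).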

EDIT - OK, another attempt.

Your basic query boils down to this (ignoring the gap filling):

SELECT CASE WHEN name REGEXP '[A-Za-z].*' THEN UPPER(SUBSTR(name, 1, 1)) ELSE '@' END as formatted_letter, MIN(id) AS id, COUNT(*) AS NameCount
FROM users
GROUP BY formatted_letter

On its own this should be quite efficient. Try it and let us know how long it takes.

If that is fast, adding the zero-count records with UNION should only add a nominal amount of time.

Trying this on a random table with 140k records takes about 1 second for me (and the name field is not even indexed).
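One caveat: MIN(id) returns the lowest id in each group, which is not necessarily the user whose name sorts first. If the alphabetically first user is what is wanted, a possible sketch is to build the list of ids ordered by name with GROUP_CONCAT and keep only its first element. The anchored pattern '^[A-Za-z]' is an assumption about the intent (the unanchored pattern matches a letter anywhere in the name), and this variant has not been timed on real data.

```sql
SELECT
  CASE WHEN name REGEXP '^[A-Za-z]'      -- assumed intent: name starts with a letter
         THEN UPPER(SUBSTR(name, 1, 1))
       ELSE '@'
  END AS formatted_letter,
  -- ids of the group sorted by name (id breaks ties); keep only the first one
  CAST(SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY name, id), ',', 1) AS UNSIGNED) AS id,
  COUNT(*) AS NameCount
FROM users
GROUP BY formatted_letter;
```

GROUP_CONCAT output is cut off at group_concat_max_len (with a warning), which should not matter here because only the first element of the list is used.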

Adding the UNION SELECTs does not add any noticeable time to the query:

SELECT formatted_letter, MAX(id), SUM(NameCount)
FROM
(
    SELECT CASE WHEN name REGEXP '[A-Za-z].*' THEN UPPER(SUBSTR(name, 1, 1)) ELSE '@' END as formatted_letter, MIN(id) AS id, COUNT(*) AS NameCount
    FROM users
    GROUP BY formatted_letter
    UNION
    SELECT 'A' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'B' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'C' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'D' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'E' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'F' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'G' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'H' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'I' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'J' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'K' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'L' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'M' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'N' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'O' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'P' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'Q' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'R' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'S' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'T' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'U' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'V' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'W' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'X' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'Y' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT 'Z' AS formatted_letter, "" AS id, 0 AS NameCount
    UNION SELECT '@' AS formatted_letter, "" AS id, 0 AS NameCount
) Sub1
GROUP BY formatted_letter

If this takes 36 seconds or so on your machine, then something strange is going on.
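As an alternative to spelling out 27 UNION branches, the zero-count rows could come from a small lookup table. The sketch below assumes a hypothetical `letters` table that is not part of the original schema, that it uses the same character set and collation as users.name, and that the pattern is anchored ('^[A-Za-z]') so every bucket is one of A-Z or '@'.

```sql
-- Hypothetical helper table: one row per bucket that must appear in the output.
CREATE TABLE letters (letter CHAR(1) NOT NULL PRIMARY KEY);

INSERT INTO letters (letter) VALUES
  ('@'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),
  ('I'),('J'),('K'),('L'),('M'),('N'),('O'),('P'),('Q'),
  ('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z');

SELECT l.letter AS formatted_letter,
       s.id,                                   -- NULL when the letter has no users
       COALESCE(s.NameCount, 0) AS NameCount   -- 0 when the letter has no users
FROM letters AS l
LEFT JOIN (
  SELECT CASE WHEN name REGEXP '^[A-Za-z]'
                THEN UPPER(SUBSTR(name, 1, 1))
              ELSE '@'
         END AS formatted_letter,
         MIN(id) AS id,
         COUNT(*) AS NameCount
  FROM users
  GROUP BY formatted_letter
) AS s ON s.formatted_letter = l.letter
ORDER BY l.letter;
```

Unlike the UNION version, empty letters come back with a NULL id rather than an empty string.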

answered 2013-10-02T13:04:06.987

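What do you mean by "no letter"? Also, if you showed the rest of the FROM clause (the other joins and conditions), it could probably be optimized too. At a minimum, do you have an index on name alone, or at least one with name in the first position?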

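Also, I would kill the inner ORDER BY name clause, since it has no real bearing on the final output given that you group by formatted_letter anyway. Add an ORDER BY formatted_letter to the outer query instead; that query returns only 26 + '@' records, so the sort is instant.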
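Applied to the query from the question, that suggestion would look roughly like the sketch below. The joins and conditions elided in the question are left out. MIN(id) and the alias `user_count` are assumptions added here: without an aggregate on id, MySQL is free to return the id of any row in each group.

```sql
SELECT formatted_letter, MIN(id) AS id, COUNT(1) AS user_count
FROM (
  SELECT
    CASE WHEN name REGEXP '[A-Za-z].*'
           THEN UPPER(SUBSTR(name, 1, 1))
         ELSE '@'
    END AS formatted_letter, id
  FROM `users`
  -- inner ORDER BY name removed, as suggested above
) AS A
GROUP BY formatted_letter
ORDER BY formatted_letter;  -- sorts only the grouped rows, not the whole table
```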

answered 2013-10-02T12:54:42.870