8

Is there an aggregate function that returns any value from a group? I could use MIN or MAX, but would rather avoid the overhead if possible, given that it's a text field.

My situation is an error log summary. The errors are grouped by the type of error and an example of the error text is displayed for each group. It doesn't matter which error message is used as the example.

SELECT
    ref_code,
    log_type,
    error_number,
    COUNT(*) AS count,
    MIN(data) AS example
FROM data
GROUP BY
    ref_code,
    log_type,
    error_number

What can I replace MIN(data) with so that I don't have to compare hundreds of thousands of varchar(2000) values?

4

3 Answers

4

You can combine MIN with KEEP, like this:

MIN(data) keep (dense_rank first order by rowid) AS EXAMPLE

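The idea is that the database engine will order the rows by ROWID rather than by the VARCHAR(2000) values, which in theory should be faster. You can also replace ROWID with the primary key and check whether that is faster.

answered 2013-01-09T03:08:30.510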
3

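Going by the suggested answers, it seems that MIN(data) (or MAX(data)) is the fastest way to get what I want. I was trying to over-optimize unnecessarily.

When I have access to that database I'll try any other answers that come up, but in the meantime this one stays at the top.

Thanks for everyone's efforts!

answered 2013-01-10T01:09:17.790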
2

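Well, since you asked about OVER with PARTITION BY and ORDER BY, here is a version that does a GROUP BY but also uses ROW_NUMBER() with OVER, PARTITION BY and ORDER BY. It numbers the first row it encounters for a given combination of ref_code, log_type, error_number as row 1 (with whatever data value that row happens to have). The numbering then restarts at 1 at the next distinct combination of ref_code, log_type, error_number it finds (again with whatever data value happens to be there). So you can simply take the data field of row 1 as the representative data field for a given ref_code, log_type, error_number.

It is still lacking something. It would be more elegant without the two passes (one for the aggregation, one for row_number()); however, it may still perform quite well. I'll have to think about it some more to see whether I can eliminate the double pass.

But it avoids any comparison of the large data field, and it is one way of doing what you asked: pulling one representative sample of the data field for each combination of the aggregated fields.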

SELECT
    t.ref_code,
    t.log_type,
    t.error_number,
    t.count,
    d.data
FROM
(
    SELECT
        ref_code,
        log_type,
        error_number,
        COUNT(*) as count
    FROM data
    GROUP BY
        ref_code,
        log_type,
        error_number
) t
INNER JOIN 
(
    SELECT
        ref_code,
        log_type,
        error_number,
        data,
        ROW_NUMBER() OVER
        (
            PARTITION BY
                ref_code,
                log_type,
                error_number
            ORDER BY
                ref_code,
                log_type,
                error_number
        ) as row_number
    FROM data
) d on
    d.ref_code = t.ref_code and
    d.log_type = t.log_type and
    d.error_number = t.error_number and
    row_number = 1
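
To make the numbering described above concrete, here is a small sketch that runs only the inner ROW_NUMBER() subquery on a handful of made-up rows. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption; it needs SQLite 3.25 or newer for window functions), and the sample rows and the `rn` alias are hypothetical.

```python
import sqlite3

# Hypothetical sample rows; SQLite stands in for Oracle in this sketch.
rows = [
    ("A", "ERR", 1, "disk full on /u01"),
    ("A", "ERR", 1, "disk full on /u02"),
    ("A", "ERR", 2, "timeout after 30s"),
    ("B", "WARN", 7, "retrying connection"),
    ("B", "WARN", 7, "retrying connection (2)"),
    ("B", "WARN", 7, "retrying connection (3)"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data "
    "(ref_code TEXT, log_type TEXT, error_number INTEGER, data TEXT)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)

# Same shape as the inner subquery above: the numbering restarts at 1
# for every distinct (ref_code, log_type, error_number) combination.
numbered = conn.execute("""
    SELECT ref_code, log_type, error_number, data,
           ROW_NUMBER() OVER (
               PARTITION BY ref_code, log_type, error_number
               ORDER BY ref_code, log_type, error_number
           ) AS rn
    FROM data
""").fetchall()

for row in numbered:
    print(row)
```

Because the ORDER BY inside the window only repeats the partition columns, which row in each group gets number 1 is arbitrary. That is acceptable here, since any message will do as the example.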

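One final caveat: I don't have Oracle available to try this on. I did put it together by reading the Oracle documentation, though.

After thinking some more about how to eliminate the GROUP BY, which I was only using for the COUNT(*), I added the following. I don't know whether it will be any faster, though.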

SELECT *
FROM
(
    SELECT
        ref_code,
        log_type,
        error_number,
        data,
        ROW_NUMBER() OVER
        (
            PARTITION BY
                ref_code,
                log_type,
                error_number
            ORDER BY
                ref_code,
                log_type,
                error_number
        ) as row_number,
        COUNT(*) OVER
        (
            PARTITION BY
                ref_code,
                log_type,
                error_number
            ORDER BY
                ref_code,
                log_type,
                error_number
        ) as count 

    FROM data
) t
WHERE row_number = 1
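
As a rough sanity check of the single-pass idea, the sketch below compares a windowed query of the same shape against a plain GROUP BY on a few made-up rows. It again uses Python's sqlite3 module in place of Oracle (an assumption; SQLite 3.25 or newer is required), the sample rows and the aliases `rn` and `cnt` are hypothetical, and the ORDER BY inside COUNT(*) OVER is left out because it does not change the per-partition count. It says nothing about performance on Oracle.

```python
import sqlite3

# Hypothetical sample rows; SQLite stands in for Oracle in this sketch.
rows = [
    ("A", "ERR", 1, "disk full on /u01"),
    ("A", "ERR", 1, "disk full on /u02"),
    ("A", "ERR", 2, "timeout after 30s"),
    ("B", "WARN", 7, "retrying connection"),
    ("B", "WARN", 7, "retrying connection (2)"),
    ("B", "WARN", 7, "retrying connection (3)"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data "
    "(ref_code TEXT, log_type TEXT, error_number INTEGER, data TEXT)"
)
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)

# Reference result: per-group counts from an ordinary GROUP BY.
grouped = conn.execute("""
    SELECT ref_code, log_type, error_number, COUNT(*)
    FROM data
    GROUP BY ref_code, log_type, error_number
    ORDER BY ref_code, log_type, error_number
""").fetchall()

# Single pass: keep row 1 of each partition together with the partition size.
windowed = conn.execute("""
    SELECT ref_code, log_type, error_number, cnt, data
    FROM (
        SELECT ref_code, log_type, error_number, data,
               ROW_NUMBER() OVER (
                   PARTITION BY ref_code, log_type, error_number
                   ORDER BY ref_code, log_type, error_number
               ) AS rn,
               COUNT(*) OVER (
                   PARTITION BY ref_code, log_type, error_number
               ) AS cnt
        FROM data
    ) t
    WHERE rn = 1
    ORDER BY ref_code, log_type, error_number
""").fetchall()

# The counts agree with GROUP BY; the example text is whichever row came first.
print([r[:4] for r in windowed] == grouped)
for row in windowed:
    print(row)
```
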
answered 2013-01-09T05:58:03.000