First off, I'm running on DB2 for i5/OS V5R4. I have ROW_NUMBER(), RANK(), and common table expressions. I do not have TOP n PERCENT or LIMIT OFFSET.

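The actual data set I'm working with is hard to explain, so let's say I have a table of weather history whose columns are (city, temperature, timestamp). I want to compare the median to the average for each group (city).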

This is the cleanest way I've found to get the median for a whole-table aggregation. I adapted it from an IBM Redbook:

WITH base_t AS
( SELECT temperature, row_number() over (order by temperature) AS rownum FROM t ),
count_t AS
( SELECT COUNT(temperature) + 1 AS base_count FROM base_t ),
median_t AS
( SELECT temperature FROM base_t, count_t
  WHERE rownum in (FLOOR(base_count/2e0), CEILING(base_count/2e0)) )
SELECT DECIMAL(AVG(temperature),10,2) AS median FROM median_t
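
To make the row-number arithmetic in that query concrete, here is a small Python sketch of the same idea. The function names are made up for illustration and nothing here is DB2-specific. With n rows, FLOOR((n+1)/2) and CEILING((n+1)/2) name the same row when n is odd and the two middle rows when n is even, so averaging the selected rows gives the median.

```python
import math


def median_row_numbers(n):
    """1-based row numbers selected by FLOOR((n+1)/2) and CEILING((n+1)/2)."""
    if n < 1:
        raise ValueError("need at least one row")
    base_count = n + 1  # mirrors COUNT(temperature) + 1 in count_t
    return sorted({math.floor(base_count / 2), math.ceil(base_count / 2)})


def median_by_row_number(values):
    """Average the middle row(s) of the sorted values, as median_t does."""
    ordered = sorted(values)  # mirrors ROW_NUMBER() OVER (ORDER BY ...)
    picked = [ordered[r - 1] for r in median_row_numbers(len(ordered))]
    return sum(picked) / len(picked)


print(median_row_numbers(5))  # odd count: one middle row
print(median_row_numbers(4))  # even count: two middle rows
print(median_by_row_number([20, 70, 70, 80]))
```

For five rows only row 3 is selected, and for four rows rows 2 and 3 are selected.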

This works well for getting back a single row, but it seems to fall apart for grouping. Conceptually, this is what I want:


SELECT city, AVG(temperature), MEDIAN(temperature) FROM ...

City          | Mean temp | Median temp
=======================================
'Minneapolis' | 60        | 64
'Milwaukee'   | 65        | 66
'Muskegon'    | 70        | 61

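There is probably an answer that will make me look stupid, but I'm having a mental block, and this isn't the top thing on my list right now. It seems like it should be possible, but I can't use anything extremely complex, because it's a large table and I want to be able to customize which columns are being aggregated.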

1 Answer

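In SQL Server, aggregate functions such as count(*) can be partitioned and computed without a GROUP BY. I took a quick look at the referenced Redbook, and it looks like DB2 has the same feature. If it doesn't, then this won't work: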

create table TemperatureHistory 
    (City varchar(20)
    , Temperature decimal(5, 2)
    , DateTaken datetime)

insert into TemperatureHistory values ('Minneapolis', 61, '20090101')
insert into TemperatureHistory values ('Minneapolis', 59, '20090102')

insert into TemperatureHistory values ('Milwaukee', 65, '20090101')
insert into TemperatureHistory values ('Milwaukee', 65, '20090102')
insert into TemperatureHistory values ('Milwaukee', 100, '20090103')

insert into TemperatureHistory values ('Muskegon', 80, '20090101')
insert into TemperatureHistory values ('Muskegon', 70, '20090102')
insert into TemperatureHistory values ('Muskegon', 70, '20090103')
insert into TemperatureHistory values ('Muskegon', 20, '20090104')

; with base_t as
    (select city
        , Temperature
        , row_number() over (partition by city order by temperature) as RowNum
        , (count(*) over (partition by city)) + 1 as CountPlusOne 
    from TemperatureHistory)
select City
    , avg(Temperature) as MeanTemp
    , avg(case 
        when RowNum in (FLOOR(CountPlusOne/2.0), CEILING(CountPlusOne/2.0)) 
            then Temperature
            else null end) as MedianTemp
from base_t 
group by City
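
As a sanity check on the logic (not on DB2 itself), the same per-city computation can be modeled in plain Python using the sample rows above. The function name and data layout are assumptions made for this sketch. It numbers the rows within each city, keeps the rows whose number matches the floor or ceiling of (count + 1) / 2, and averages them.

```python
from collections import defaultdict
from math import ceil, floor

# Same sample data as the INSERT statements above (dates omitted).
rows = [
    ("Minneapolis", 61), ("Minneapolis", 59),
    ("Milwaukee", 65), ("Milwaukee", 65), ("Milwaukee", 100),
    ("Muskegon", 80), ("Muskegon", 70), ("Muskegon", 70), ("Muskegon", 20),
]


def mean_and_median_by_city(rows):
    """Return {city: (mean, median)} using the row-number technique."""
    by_city = defaultdict(list)  # plays the role of PARTITION BY city
    for city, temperature in rows:
        by_city[city].append(temperature)

    result = {}
    for city, temps in by_city.items():
        temps.sort()  # ORDER BY temperature
        count_plus_one = len(temps) + 1
        wanted = {floor(count_plus_one / 2), ceil(count_plus_one / 2)}
        # Keep only the middle row(s); the SQL does this with CASE ... ELSE NULL,
        # relying on AVG ignoring NULLs.
        picked = [t for row_num, t in enumerate(temps, start=1) if row_num in wanted]
        result[city] = (sum(temps) / len(temps), sum(picked) / len(picked))
    return result


for city, (mean_temp, median_temp) in mean_and_median_by_city(rows).items():
    print(f"{city}: mean={mean_temp:.2f} median={median_temp:.2f}")
```

On this data the median comes out as 60 for Minneapolis, 65 for Milwaukee, and 70 for Muskegon, while the means are 60, about 76.67, and 60.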
answered 2009-08-08T00:22:57.010