
I've searched the internet, but I haven't found an answer that matches my situation.

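I'm trying to calculate exact lower and upper quartiles in SQL Server. I know SQL Server has a built-in function that helps with quartiles, NTILE, but it isn't enough for my case: it only assigns each row to one of four buckets and never interpolates between rows.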

Given the following table values (note that the real table contains more products and prices than shown here):

AveragePrice  ProductNumber  Year
45.7820       2              2015
46.0142       2              2016
59.0133       2              2017
60.1707       2              2018
62.6600       2              2019

I'm running the following query:

SELECT 
    AveragePrice
    ,NTILE(4) OVER (
        PARTITION BY ProductNumber ORDER BY AveragePrice
        ) AS Quartile
FROM products

This gives the following result:

AveragePrice  Quartile
45.7820       1
46.0142       1
59.0133       2
60.1707       3
62.6600       4

For full context, the overall query looks like this:

SELECT ProductNumber
    ,MIN(AveragePrice) AS Minimum
    ,MAX(CASE WHEN Quartile = 1 THEN AveragePrice END) AS Quartile_1
    ,MAX(CASE WHEN Quartile = 3 THEN AveragePrice END) AS Quartile_3
    ,MAX(AveragePrice) AS Maximum
    ,COUNT(Quartile) AS [Number of items]
FROM (
    SELECT ProductNumber
        ,AveragePrice
        -- number the buckets in price order within each product
        ,NTILE(4) OVER (
            PARTITION BY ProductNumber ORDER BY AveragePrice
            ) AS Quartile
    FROM #temp_products
    ) Vals
GROUP BY ProductNumber
ORDER BY ProductNumber

But when I calculate the quartiles by hand, the first quartile should be 45.8981 (in this particular case, the average of the first and second rows), not 46.0142.

The third quartile should be 61.41535 (in this particular case, the average of the fourth and fifth rows), not 60.1707.
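One convention that reproduces both hand-calculated values takes each quartile as the median of one half of the sorted prices, leaving the middle value (59.0133) out of both halves. Writing the sorted prices as $x_1 \le \dots \le x_5$:

```latex
\begin{aligned}
Q_1 &= \frac{x_1 + x_2}{2} = \frac{45.7820 + 46.0142}{2} = 45.8981\\
Q_3 &= \frac{x_4 + x_5}{2} = \frac{60.1707 + 62.6600}{2} = 61.41535
\end{aligned}
```

The NTILE query instead takes the largest price in buckets 1 and 3, i.e. $x_2$ and $x_4$, which is why it returns 46.0142 and 60.1707.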

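To be clear: this is part of a stored procedure in which several price groups are calculated and aggregated into groups that hold average prices. I need to calculate the lower and upper quartiles of these average prices, grouped by product number. The result set should contain the product number, the lower quartile and the upper quartile. Can anyone help me or point me in the right direction?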


2 Answers


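NTILE() can split rows in unexpected ways: when the row count isn't a multiple of 4, the leftover rows go to the first buckets, so the bucket edges don't line up with the quartile positions. I'd rather group with integer division on the row rank. This solution works for any number of values and uses a weighted average where needed.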
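The rank positions tested in the query below match the inclusive linear-interpolation definition of a percentile (the same definition Excel's QUARTILE.INC uses). For a fraction $p$ of $N$ sorted values $x_1 \le \dots \le x_N$:

```latex
\begin{aligned}
h &= 1 + (N-1)\,p, \qquad k = \lfloor h \rfloor,\\
Q(p) &= x_k + (h - k)\,(x_{k+1} - x_k),\\
h(1/4) &= \tfrac{N+3}{4}, \quad h(1/2) = \tfrac{N+1}{2}, \quad h(3/4) = \tfrac{3N+1}{4}.
\end{aligned}
```

Since $h - k$ can only be 0, 0.25, 0.5 or 0.75, the interpolation reduces to four CASE branches, e.g. $x_k + 0.25\,(x_{k+1} - x_k) = (3x_k + x_{k+1})/4$. For $N = 5$ it gives $h(1/4) = 2$, so $Q_1 = x_2 = 46.0142$, which is a different convention from the question's 45.8981.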

LEAD is the window function that fetches the value from the next row.
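A minimal sketch of what LEAD returns on the question's sample data (the VALUES list stands in for the real table): each row gets the next higher AveragePrice within the same product, and the last row of each product gets the default, here NULL.

```sql
-- LEAD(expression, offset, default): value from the row `offset` rows later
-- in the window order; `default` is used when no such row exists.
SELECT ProductNumber,
       AveragePrice,
       LEAD(AveragePrice, 1, NULL) OVER (PARTITION BY ProductNumber
                                         ORDER BY AveragePrice) AS NextPrice
FROM (VALUES (45.7820, 2), (46.0142, 2), (59.0133, 2), (60.1707, 2), (62.6600, 2))
     AS p (AveragePrice, ProductNumber)
ORDER BY ProductNumber, AveragePrice;
```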

select *
    ,[Q] = case when [rank] in ((N+3)/4, (N+1)/2, (3*N+1)/4) then
                case [decimal]
                when 0    then AveragePrice
                when 0.25 then /*weighted avg*/(3*AveragePrice +   LEAD(AveragePrice,1,null) over(PARTITION BY ProductNumber ORDER BY AveragePrice)) / 4
                when 0.5  then /*simple avg*/  (  AveragePrice +   LEAD(AveragePrice,1,null) over(PARTITION BY ProductNumber ORDER BY AveragePrice)) / 2
                when 0.75 then /*weighted avg*/(  AveragePrice + 3*LEAD(AveragePrice,1,null) over(PARTITION BY ProductNumber ORDER BY AveragePrice)) / 4
                end
           end
from (
    select *
        ,[group4]  = ([rank]-1)*4 / N
        -- fractional part of the Q1/Q2/Q3 position, set only on the row at that position
        ,[decimal] = case [rank]
                        when /*Q1*/  (N+3)/4 then   (N+3)/4.0 - FLOOR(  (N+3)/4.0)
                        when /*Q2*/  (N+1)/2 then   (N+1)/2.0 - FLOOR(  (N+1)/2.0)
                        when /*Q3*/(3*N+1)/4 then (3*N+1)/4.0 - FLOOR((3*N+1)/4.0)
                     end
    from (
        select *
            -- 1-based position of the row within its product, ordered by price
            ,[rank] = ROW_NUMBER() over(PARTITION BY ProductNumber ORDER BY AveragePrice)
            -- row count per product (partitioned, so several products can be handled in one pass)
            ,[N]    = COUNT(*) over(PARTITION BY ProductNumber)
        from (values(45.7820,2,2015),(46.0142,2,2016),(59.0133,2,2017),(60.1707,2,2018),(62.6600,2,2019)) v(AveragePrice,ProductNumber,Year)
    ) r
) a
AveragePrice  ProductNumber  Year  rank  N  group4  decimal   Q
45.7820       2              2015  1     5  0       NULL      NULL
46.0142       2              2016  2     5  0       0.000000  46.014200
59.0133       2              2017  3     5  1       0.000000  59.013300
60.1707       2              2018  4     5  2       0.000000  60.170700
62.6600       2              2019  5     5  3       NULL      NULL
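As a cross-check: SQL Server 2012 and later provide PERCENTILE_CONT, which, as far as I know, interpolates on the same 1 + (N-1)p position, so on this sample it should return the same values as the non-NULL entries in the last column above (46.0142, 59.0133, 60.1707). A sketch using the same VALUES data; note that this is the inclusive convention, not the one behind the question's hand calculation.

```sql
-- Built-in interpolated percentiles per product (SQL Server 2012+).
-- PERCENTILE_CONT is only available as a window function, so DISTINCT
-- collapses the repeated per-row results to one row per product.
SELECT DISTINCT
       ProductNumber,
       PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY AveragePrice)
           OVER (PARTITION BY ProductNumber) AS Q1,
       PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY AveragePrice)
           OVER (PARTITION BY ProductNumber) AS Median,
       PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY AveragePrice)
           OVER (PARTITION BY ProductNumber) AS Q3
FROM (VALUES (45.7820, 2, 2015), (46.0142, 2, 2016), (59.0133, 2, 2017),
             (60.1707, 2, 2018), (62.6600, 2, 2019)) AS a (AveragePrice, ProductNumber, [Year]);
```

PERCENTILE_CONT returns float, so compare its output with a small tolerance rather than exact equality.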
answered 2021-11-04T22:58:46.680

OK, inspired by this post, I managed to build a query that actually calculates exact quartiles:

  -- leading ; because this runs inside a stored procedure and the statement before a WITH must be terminated
    ;WITH quartile_data AS (
    
        SELECT Price,
             ProductNumber
        -- Year is named so the column list matches the three VALUES columns; it is not used below
        FROM (values(29.4785,2,2015),(30.0000,2,2016),(33.4762,2,2017),(35.2917,2,2018),(35.8731,2,2018),(36.2475,2,2018),(37.9790,2,2018),(39.5846,2,2018),(67.4443,2,2018))sales(Price,ProductNumber,[Year])
     
    )
   --Aggregate into a single record for each group, using MAX to select the non-null 
   --detail value for each column
   -- ISNULL check to include groups with three values as well
    SELECT ProductNumber, 
        (Max(Q1NextVal) - MAX(Q1Val)) * Max(Q1Frac) + Max(Q1Val) as [Q1],
        (Max(MidVal1) + Max(MidVal2)) / 2 [Median],
        (ISNULL(Max(Q3NextVal),0) - MAX(Q3Val)) * Max(Q3Frac) + Max(Q3Val) as [Q3]
    
    -- save the result into a temp table
    INTO #my_temp_table
    FROM (
--Expose the detail values for only the records at the index values 
--generated by the summary subquery. All other values are left as NULL.
        SELECT detail.ProductNumber, 
            CASE WHEN RowNum = Q1Idx THEN Price ELSE NULL END Q1Val,
            CASE WHEN RowNum = Q1Idx + 1 THEN Price ELSE NULL END Q1NextVal,
            CASE WHEN RowNum = Q3Idx THEN Price ELSE NULL END Q3Val,
            CASE WHEN RowNum = Q3Idx + 1 THEN Price ELSE NULL END Q3NextVal,
            Q1Frac,
            Q3Frac,
            CASE WHEN RowNum = MidPt1 THEN Price ELSE NULL END MidVal1,
            CASE WHEN RowNum = MidPt2 THEN Price ELSE NULL END MidVal2
        FROM
               --Calculate a row number sorted by measure for each group.
            (SELECT *,  ROW_NUMBER() OVER (PARTITION BY ProductNumber ORDER BY Price) RowNum
            FROM  quartile_data) AS detail
    
        INNER JOIN (
    --Summarize to find index numbers and fractions we need to use to locate 
    --the values at the quartile points.
    -- The modulus operator picks the correct number to add depending on whether the row count is even or odd
            SELECT ProductNumber, 
                FLOOR((COUNT(*) + IIF((COUNT(*) % 2 = 0), 2,1)) / 4.0) Q1Idx,
                ((COUNT(*) + IIF((COUNT(*) % 2 = 0), 2,1)) / 4.0) - FLOOR((COUNT(*) + IIF((COUNT(*) % 2 = 0), 2,1)) / 4.0) Q1Frac,
                (COUNT(*) + 1) / 2 AS MidPt1,
                (COUNT(*) + 2) / 2 AS MidPt2,
                  FLOOR((COUNT(*) * 3 + IIF((COUNT(*) % 2 = 0), 2,3)) / 4.0) Q3Idx,
                ((COUNT(*) * 3 + IIF((COUNT(*) % 2 = 0), 2,3)) / 4.0) - FLOOR((COUNT(*) * 3 + IIF((COUNT(*) % 2 = 0), 2,3)) / 4.0) Q3Frac
            FROM  quartile_data
            GROUP BY ProductNumber
            HAVING COUNT(*) > 1
        
        ) AS summary ON detail.ProductNumber  = summary.ProductNumber
    
    ) AS step_two
    GROUP BY ProductNumber
   -- Include only groups with more than 2 rows
    HAVING count(*) > 2
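To see what the summary subquery's index formulas produce, here is a sketch that evaluates the same Q1Idx/Q1Frac/Q3Idx/Q3Frac expressions for a few example group sizes n (the n values are arbitrary illustrations). The formulas place Q1 and Q3 at the median of the lower and upper half of the sorted rows, with the middle row left out when n is odd; for n = 9 that gives positions 2.5 and 7.5.

```sql
-- Evaluate the answer's index/fraction formulas for sample group sizes n.
-- Q1Idx/Q3Idx: row to start from; Q1Frac/Q3Frac: share of the gap to the next row.
SELECT n,
       FLOOR((n + IIF(n % 2 = 0, 2, 1)) / 4.0) AS Q1Idx,
       (n + IIF(n % 2 = 0, 2, 1)) / 4.0
           - FLOOR((n + IIF(n % 2 = 0, 2, 1)) / 4.0) AS Q1Frac,
       FLOOR((n * 3 + IIF(n % 2 = 0, 2, 3)) / 4.0) AS Q3Idx,
       (n * 3 + IIF(n % 2 = 0, 2, 3)) / 4.0
           - FLOOR((n * 3 + IIF(n % 2 = 0, 2, 3)) / 4.0) AS Q3Frac
FROM (VALUES (3), (4), (5), (8), (9)) AS s (n)
ORDER BY n;
```

For n = 3 the Q3 index equals n and its fraction is 0, which is why the main query wraps Q3NextVal in ISNULL.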

With the following prices: 29.4785 30.0000 33.4762 35.2917 35.8731 36.2475 37.9790 39.5846 67.4443

the query returns the correct values: Q1 = 31.7381000000 and Q3 = 38.7818000000
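Both values sit halfway between two sorted prices (positions 2.5 and 7.5 for n = 9), so they can be checked by hand:

```latex
\begin{aligned}
Q_1 &= x_2 + 0.5\,(x_3 - x_2) = 30.0000 + 0.5\,(33.4762 - 30.0000) = 31.7381\\
Q_3 &= x_7 + 0.5\,(x_8 - x_7) = 37.9790 + 0.5\,(39.5846 - 37.9790) = 38.7818
\end{aligned}
```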

Verified with this online tool.

answered 2021-11-08T13:28:52.800