I am trying to use row_number to calculate the median, lower quartile, and upper quartile for a box plot. However, because of ties, my row_number ordering is off.

Here is some sample data:

CREATE TABLE EStats    
(
    PersonID            VARCHAR(30)     NOT NULL,
    Grade               VARCHAR(25)     NOT NULL,
    CourseDate          Date            NOT NULL
);

INSERT INTO EStats
(
    PersonID, Grade, CourseDate
)

VALUES
    ('100', '91', '2010-03-01'),
    ('101', '96', '2010-03-01'),
    ('102', '88', '2010-03-01'),
    ('103', '92', '2010-03-01'),
    ('104', '81', '2010-03-01'),
    ('105', '85', '2010-03-01'),
    ('106', '91', '2010-03-01'),
    ('107', '89', '2010-03-01'),
    ('108', '99', '2010-03-01'),
    ('109', '88', '2010-03-01'),
    ('110', '81', '2011-03-02'),
    ('111', '77', '2011-03-02'),
    ('112', '88', '2011-03-02'),
    ('113', '76', '2011-03-02'),
    ('114', '69', '2011-03-02'),
    ('115', '70', '2011-03-02'),
    ('116', '75', '2011-03-02'),
    ('117', '88', '2011-03-02'),
    ('118', '76', '2011-03-02'),
    ('119', '95', '2012-03-01'),
    ('120', '96', '2012-03-01'),
    ('121', '90', '2012-03-01'),
    ('122', '80', '2012-03-01'),
    ('123', '85', '2012-03-01'),
    ('124', '94', '2012-03-01'),
    ('125', '89', '2012-03-01'),
    ('126', '97', '2012-03-01'),
    ('127', '94', '2012-03-01'),
    ('128', '72', '2012-03-01'),
    ('129', '88', '2012-03-01'),
    ('130', '91', '2012-03-01')

Here is one of my inner queries, which shows that the ordering is not working:

SELECT
    CourseDate,
    Grade,
    ROW_NUMBER() OVER (
        PARTITION BY LEFT(CourseDate, 4)
        ORDER BY Grade ASC) AS RowAsc,
    ROW_NUMBER() OVER (
        PARTITION BY LEFT(CourseDate, 4)
        ORDER BY Grade DESC) AS RowDesc
FROM EStats

Note that for CourseDate 2010-03-01, RowAsc comes out as follows:

10
9
8
6
7
5
3
4
2
1

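However, I need every row to have a consecutive number so that I can calculate the median when there is an even number of rows. (Rank and dense_rank do not work because they leave "gaps".)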

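Actually, here is the whole thing. Again, I am trying to calculate the median, lower quartile, upper quartile, minimum, and maximum for a box plot chart. Any help is greatly appreciated!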

WITH Q3 AS
(
    SELECT
        CourseDate,
        AVG(CAST(Grade AS Numeric)) AS Median

    FROM
    (
        SELECT
            CourseDate,
            Grade,
            ROW_NUMBER() OVER (
                PARTITION BY LEFT(CourseDate, 4)
                ORDER BY Grade ASC) AS RowAsc,
            ROW_NUMBER() OVER (
                PARTITION BY LEFT(CourseDate, 4)
                ORDER BY Grade DESC) AS RowDesc
        FROM EStats
    )x
    WHERE 
        RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
    GROUP BY CourseDate
    --ORDER BY CourseDate
),

Q2 AS
(
    SELECT
        x.CourseDate,
        AVG(CAST(Grade AS Numeric)) AS LowerQuartile

    FROM
    (
        SELECT
            Estats.CourseDate,
            Estats.Grade,
            ROW_NUMBER() OVER (
                PARTITION BY LEFT(EStats.CourseDate, 4)
                ORDER BY Grade ASC) AS RowAsc,
            ROW_NUMBER() OVER (
                PARTITION BY LEFT(Estats.CourseDate, 4)
                ORDER BY Grade DESC) AS RowDesc
        FROM EStats JOIN Q3 on EStats.CourseDate = Q3.CourseDate
        WHERE EStats.Grade < Q3.Median 
    )x
    WHERE
        RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
    GROUP BY x.CourseDate
),

Q4 AS
(
    SELECT
        x.CourseDate,
        AVG(CAST(Grade AS Numeric)) AS UpperQuartile

    FROM
    (
        SELECT
            Estats.CourseDate,
            Estats.Grade,
            ROW_NUMBER() OVER (
                PARTITION BY LEFT(EStats.CourseDate, 4)
                ORDER BY Grade ASC) AS RowAsc,
            ROW_NUMBER() OVER (
                PARTITION BY LEFT(Estats.CourseDate, 4)
                ORDER BY Grade DESC) AS RowDesc
        FROM EStats JOIN Q3 on EStats.CourseDate = Q3.CourseDate
        WHERE EStats.Grade > Q3.Median 
    )x
    WHERE
        RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
    GROUP BY x.CourseDate
)

SELECT Q3.CourseDate, Q3.Median AS Median, Q2.LowerQuartile, Q4.UpperQuartile, MIN(EStats.Grade) AS Min, MAX(EStats.Grade) AS Max
FROM Q3
    JOIN Q2 ON Q3.CourseDate = Q2.CourseDate
    JOIN Q4 ON Q3.CourseDate = Q4.CourseDate
    JOIN EStats ON Q3.CourseDate = EStats.CourseDate
GROUP BY Q3.CourseDate, Q3.Median, Q2.LowerQuartile, Q4.UpperQuartile
ORDER BY Q3.CourseDate

1 Answer

Try this to get the median:

select avg(case when seqnum*2 = totnum+1 then col
                when seqnum*2 in (totnum, totnum + 2) then col
            end)
from (select t.*, row_number() over (order by col) as seqnum,
             count(*) over () as totnum
      from t
     ) t

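It looks cryptic, but the idea is to average the two middle values when the row count is even and to take the single middle value when it is odd. If you are using SQL Server, remember that it uses integer division, so cast col to a decimal type first if it is an integer. You can actually simplify the above to: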

select avg(case when seqnum*2 in (totnum, totnum+1, totnum+2) then col end)

This works because an odd total count can only match totnum+1, while an even total count matches the other two values.
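To get one median per year, as in the question, one option is to add the same PARTITION BY to both window functions and group by that key. The following is only a minimal sketch under stated assumptions: it runs the query through Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions) as a stand-in for the asker's database, it stores the grades as integers rather than VARCHAR, and the table name `grades` and column `yr` are made up for illustration. The grade values are the ones from the question's sample data.

```python
import sqlite3

# Grades from the question's sample data, keyed by course year.
grades_by_year = {
    "2010": [91, 96, 88, 92, 81, 85, 91, 89, 99, 88],          # 10 rows (even)
    "2011": [81, 77, 88, 76, 69, 70, 75, 88, 76],              # 9 rows (odd)
    "2012": [95, 96, 90, 80, 85, 94, 89, 97, 94, 72, 88, 91],  # 12 rows (even)
}

# Hypothetical table layout: one row per grade, year stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (yr TEXT NOT NULL, grade INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO grades (yr, grade) VALUES (?, ?)",
    [(yr, g) for yr, gs in grades_by_year.items() for g in gs],
)

# Same condition as the simplified query above, applied per year.
# Multiplying by 1.0 keeps the average from being computed on integers.
MEDIAN_SQL = """
SELECT yr,
       AVG(CASE WHEN seqnum * 2 IN (totnum, totnum + 1, totnum + 2)
                THEN grade * 1.0 END) AS median
FROM (
    SELECT yr, grade,
           ROW_NUMBER() OVER (PARTITION BY yr ORDER BY grade) AS seqnum,
           COUNT(*) OVER (PARTITION BY yr) AS totnum
    FROM grades
) AS s
GROUP BY yr
ORDER BY yr
"""

medians = dict(conn.execute(MEDIAN_SQL).fetchall())
for yr, median in medians.items():
    print(yr, median)
conn.close()
```

With the sample data this prints 90.0 for 2010, 76.0 for 2011, and 90.5 for 2012. Ties are harmless here because only one row_number is used: tied rows carry the same grade, so whichever of them lands in a middle position contributes the same value to the average.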

Answered 2013-01-23T00:33:07.083