This should be simple, but as a newcomer to SQL I'm really struggling. I was advised to use PERCENTILE_CONT with continuous (non-discrete) data.

The data in question has two columns: (1) the ID for a list of patients, and (2) the average number of events per year.

Working from some code I found online, this is what I have:

SELECT ID,
percentile_cont (0.25) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_25,
percentile_cont (0.50) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_50,
percentile_cont (0.75) WITHIN GROUP
(ORDER BY PPPY ASC) OVER(PARTITION BY ID) as percentile_75
FROM AE_COUNT;

This just seems to report the same PPPY value in each of the percentile columns.

Any idea where I'm going wrong?

2 Answers

Assuming you want the percentiles over the entire table, try this:

SELECT DISTINCT
       percentile_cont(0.25) WITHIN GROUP (ORDER BY PPPY ASC) OVER () AS percentile_25,
       percentile_cont(0.50) WITHIN GROUP (ORDER BY PPPY ASC) OVER () AS percentile_50,
       percentile_cont(0.75) WITHIN GROUP (ORDER BY PPPY ASC) OVER () AS percentile_75
FROM AE_COUNT;

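Removing the PARTITION BY clause runs the calculation over the entire table. I also removed the ID column from the SELECT list and made the query DISTINCT, so that the identical values the window function returns on every row collapse into a single row.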

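I also want to point out that you said the second column is the average number of events per year. I don't know what you need the percentiles for, but be aware that computing percentiles of the averages of a group of sets does not give the same result as computing percentiles of the union of those sets.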
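To make that caveat concrete, here is a minimal Python sketch. It assumes the usual linear-interpolation definition of PERCENTILE_CONT (fractional row position p * (n - 1) over the sorted values); the helper `percentile_cont` and the patient data are hypothetical and are not taken from the question.

```python
import math

def percentile_cont(values, p):
    """Linear-interpolation percentile, modeled on SQL's PERCENTILE_CONT."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = p * (len(xs) - 1)                    # fractional 0-based row position
    lo, hi = math.floor(pos), math.ceil(pos)   # neighbouring rows
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

# Hypothetical yearly event counts for three patients (illustration only).
events_by_patient = {
    "A": [0, 0, 12],
    "B": [1, 2, 3],
    "C": [10, 10, 10],
}

# One average per patient, which is the shape of the PPPY column.
averages = [sum(v) / len(v) for v in events_by_patient.values()]
# The union of all yearly counts across patients.
pooled = [x for v in events_by_patient.values() for x in v]

median_of_averages = percentile_cont(averages, 0.50)
median_of_pooled = percentile_cont(pooled, 0.50)
print(median_of_averages, median_of_pooled)
```

With this made-up data the median of the per-patient averages is 4.0, while the median of the pooled yearly counts is 3.0, so the two approaches answer different questions.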

answered 2018-07-26T16:16:53.617

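PERCENTILE_CONT() is either a window function or an aggregate function. If you want a single row summarizing all of the data, use it as an aggregate function: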

SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_25,
       percentile_cont(0.50) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_50,
       percentile_cont(0.75) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_75
FROM AE_COUNT;

If you want the values per patient, you can use:

SELECT id,
       percentile_cont(0.25) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_25,
       percentile_cont(0.50) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_50,
       percentile_cont(0.75) WITHIN GROUP (ORDER BY PPPY ASC) as percentile_75
FROM AE_COUNT
GROUP BY id;

However, a patient may have only a few rows, so these values may be identical for any given patient.
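A small Python sketch shows why this happens. It assumes PERCENTILE_CONT interpolates linearly at row position p * (n - 1); the helper `percentile_cont` and the sample values are hypothetical. With a single row in a group, every percentile is just that row's value, which would also explain the original query echoing PPPY if each ID appears only once.

```python
import math

def percentile_cont(values, p):
    """Linear-interpolation percentile, modeled on SQL's PERCENTILE_CONT."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = p * (len(xs) - 1)                    # fractional 0-based row position
    lo, hi = math.floor(pos), math.ceil(pos)   # neighbouring rows
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

quartiles = (0.25, 0.50, 0.75)

# A group with one row: position is always 0, so every percentile is that value.
one_row = [percentile_cont([3.5], p) for p in quartiles]

# A group with two rows: the percentiles interpolate between the two values.
two_rows = [percentile_cont([2.0, 4.0], p) for p in quartiles]

print(one_row)
print(two_rows)
```

The one-row group yields 3.5 for all three quartiles, whereas the two-row group yields 2.5, 3.0 and 3.5.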

answered 2018-07-26T16:18:53.180