
I have generated the following result set:

"degree_easy","degree_hard","easy_percent","hard_percent"
1,5,0.166667,0.833333
1,5,0.166667,0.833333
1,6,0.142857,0.857143
1,8,0.111111,0.888889

The result set above was generated by the following query:

select * from (
    select degree_one as degree_easy, 
        (degree_two + degree_three) as degree_hard,
        (degree_one::real/(degree_one::real + degree_two::real + degree_three::real)) 
            as easy_percent, 
        ((degree_two::real + degree_three::real)/(degree_one::real + degree_two::real +
            degree_three::real)) as hard_percent FROM recommendation_degree
    ) as dc 
where dc.degree_easy >= 1 and dc.degree_hard >= 1
order by dc.easy_percent ASC, dc.hard_percent ASC
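Because degree_hard is defined as degree_two + degree_three, both percentage columns share the denominator degree_easy + degree_hard, so easy_percent and hard_percent always add up to 1. Here is a minimal Python sketch of the same arithmetic for one row. The function name easy_hard_split and the sample split of degree_two/degree_three are hypothetical; only the formulas come from the query. Python floats are double precision, whereas the query casts to real.

```python
import math

def easy_hard_split(degree_one, degree_two, degree_three):
    """Reproduce the query's computed columns for one recommendation_degree row."""
    degree_easy = degree_one
    degree_hard = degree_two + degree_three
    total = degree_easy + degree_hard  # shared denominator for both percentages
    return degree_easy, degree_hard, degree_easy / total, degree_hard / total

# Hypothetical row: degree_one=1, degree_two=2, degree_three=3
easy, hard, easy_pct, hard_pct = easy_hard_split(1, 2, 3)
print(easy, hard, round(easy_pct, 6), round(hard_pct, 6))
assert math.isclose(easy_pct + hard_pct, 1.0)  # the two columns are complementary
```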

What I want to do now is calculate percentiles.

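I'm not sure which of the columns above is more meaningful, but suppose I want to use degree_easy and degree_hard (or at least one of them) to calculate percentiles. How can I use the ntile function in Postgres to do this?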

What is the best practice for producing output like this:

percentile, number_of_users
25, 4
50, 10
75, 20
99, 20

1 Answer


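ntile can tell you whether a row is in the last 25% of an ordered list. But it does not support weights: it splits the rows into groups with the same number of rows (as nearly as the row count allows), and every row counts as one regardless of its value. For ntile to be meaningful, all groups must be the same size.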
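To see why ntile ignores weights, here is a small Python sketch of how ntile(n) assigns bucket numbers to rows that are already ordered. It is a model of the documented behaviour (buckets as equal in size as possible), assuming the usual convention that leftover rows go to the lowest-numbered buckets; it is not Postgres code, and the function name is hypothetical.

```python
def ntile(num_buckets, n_rows):
    """Model SQL ntile(num_buckets) over n_rows already-ordered rows.

    Only the row count matters: bucket sizes differ by at most one, and the
    first (n_rows % num_buckets) buckets each receive one extra row.
    """
    base, extra = divmod(n_rows, num_buckets)
    buckets = []
    for bucket in range(1, num_buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets

points = [1, 1, 1, 2, 2, 3]  # ordered values; ntile never looks at them
for value, bucket in zip(points, ntile(4, len(points))):
    print(value, bucket)
```

With these six rows, the three 1-point rows are split across buckets 1 and 2, because ntile only counts rows and never compares their values.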

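You can calculate weights with the analytic function sum ... over. The running sum (the sum over all rows whose value is equal to or below the current row's) is: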

sum(col1) over (order by col1)

The sum over the whole table is:

sum(col1) over ()
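A Python model of these two window sums, assuming Postgres's default frame when a window has only ORDER BY (RANGE from the start of the partition to the current row). In that frame, rows with equal sort keys are peers and share one running sum. The function name is hypothetical; this only illustrates the semantics.

```python
from itertools import groupby

def running_and_total(values):
    """Model sum(v) over (order by v) and sum(v) over () for each row.

    Tied values are peers, so all of them get the same running sum.
    """
    ordered = sorted(values)
    total = sum(ordered)          # sum(v) over ()
    rows = []
    running = 0
    for _, group in groupby(ordered):
        peers = list(group)
        running += sum(peers)     # add the whole peer group at once
        rows.extend((v, running, total) for v in peers)
    return rows

for row in running_and_total([1, 1, 1, 2, 2, 3]):
    print(row)  # (value, running sum, total)
```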

You can calculate the percentile by comparing the running sum with the total. A simplified example:

create table people (id serial, points int);
-- 3 people with 1 point, 2 people with 2 points, 1 person with 3 points
-- total 6 people and 10 points
insert into people (points) values (1), (1), (1), (2), (2), (3);

select  *
,       case
        -- compare the running sum of points (ties included) with the grand total
        when sum(points) over (order by points) > 0.75 * sum(points) over () then '100%'
        when sum(points) over (order by points) > 0.5 * sum(points) over () then '75%'
        when sum(points) over (order by points) > 0.25 * sum(points) over () then '50%'
        else '25%'
        end as Percentile
from    people
order by id;

Which prints:

ID    POINTS  PERCENTILE
1     1       50%
2     1       50%
3     1       50%
4     2       75%
5     2       75%
6     3       100%

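The people with 1 point add up to 3 points, which is 30% of the total. That is more than 25% but not more than 50%, which puts them in the 50% percentile. The people with 2 points bring the running total to 7 (70%), which puts them in the 75% percentile. The person with 3 points brings the running total to 10 (100%), which puts him at the top.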
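The same arithmetic can be checked outside the database. This Python sketch is a model of the query above, not part of the answer: it recomputes each row's running sum (tied values share one, as in the default window frame) and applies the same thresholds as the CASE expression. The function name is hypothetical.

```python
from itertools import groupby

def percentile_labels(points):
    """Label each value the way the CASE expression does, in ascending order."""
    ordered = sorted(points)
    total = sum(ordered)
    labels = []
    running = 0
    for _, group in groupby(ordered):
        peers = list(group)
        running += sum(peers)  # peers share one running sum
        if running > 0.75 * total:
            label = "100%"
        elif running > 0.5 * total:
            label = "75%"
        elif running > 0.25 * total:
            label = "50%"
        else:
            label = "25%"
        labels.extend((v, running, label) for v in peers)
    return labels

for value, running, label in percentile_labels([1, 1, 1, 2, 2, 3]):
    print(value, running, label)
```

Comparing the running sum against fixed fractions of the total, rather than counting rows, is what makes higher-valued rows weigh more.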

Example at SQL Fiddle.

answered 2014-08-21T18:46:30.603