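I want to 1) identify gaps in my dataset, meaning values that fall below the lower bound of two standard deviations from the group mean, and 2) evaluate the average/median/standard deviation of the percentage shares of the remaining values within a given group.
As a beginner/intermediate Oracle SQL user, I would like to hear from the experts about two competing query approaches for achieving this goal:
Using Oracle's "famous" MODEL clause: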
select intv, avg, stddev, med
from (
  select *
  from test
  MODEL
    PARTITION BY (INTV)
    -- flag = 1 marks values below (group mean - 2 * group stddev)
    DIMENSION BY (
      ROWN,
      CASE WHEN (VAL < AVG(VAL) OVER (PARTITION BY INTV) - 2*STDDEV(VAL) OVER (PARTITION BY INTV)) THEN 1 ELSE 0 END flag
    )
    -- prt is the share of val among all rows with the same rown
    MEASURES (VAL, val/sum(val) over (partition by rown) prt, 0 avg, 0 stddev, 0 med)
    RULES (
      -- aggregate only over the unflagged cells and store the results in cell [0,0]
      avg[0,0] = AVG(PRT)[ANY, flag<>1],
      stddev[0,0] = STDDEV(PRT)[ANY, flag<>1],
      med[0,0] = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY PRT)[ANY, flag<>1]
    )
)
where rown = 0
versus a standard analytic query:
select
  intv,
  avg(prt) avg,
  stddev(prt) stddev,
  percentile_cont(0.5) WITHIN GROUP (ORDER BY prt desc) med
from (
  select
    a.*,
    -- prt is the share of val among all rows with the same rown
    val/sum(val) over (partition by rown) prt,
    -- flag = 1 marks values below (group mean - 2 * group stddev)
    case when avg(val) over (partition by intv) - 2*stddev(val) over (partition by intv) > val then 1 else 0 end flag
  from test a
)
where flag = 0
group by intv
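Unfortunately, I do not have access to my large dataset at the moment, but the table these statistics are usually computed on contains several million rows. I set up a small test schema in a fiddle as follows: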
create table test (intv number, val number, rown number);
insert all
into test (intv, val, rown) values (1,5,1)
into test (intv, val, rown) values (1,4,2)
into test (intv, val, rown) values (1,4,3)
into test (intv, val, rown) values (1,5,4)
into test (intv, val, rown) values (1,6,5)
into test (intv, val, rown) values (1,2,6)
into test (intv, val, rown) values (1,5,7)
into test (intv, val, rown) values (1,4,8)
into test (intv, val, rown) values (1,5,9)
into test (intv, val, rown) values (2,10,1)
into test (intv, val, rown) values (2,12,2)
into test (intv, val, rown) values (2,13,3)
into test (intv, val, rown) values (2,15,4)
into test (intv, val, rown) values (2,13,5)
into test (intv, val, rown) values (2,12,6)
into test (intv, val, rown) values (2,19,7)
into test (intv, val, rown) values (2,18,8)
into test (intv, val, rown) values (2,13,9)
select * from dual;
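To make the first goal concrete, here is a minimal sketch (not one of the two candidate queries) that lists the per-group lower bound next to each row of the sample table, together with the resulting flag. The alias lower_bound is just an illustrative name I chose; the threshold expression is the same one both queries above use.

```sql
-- Sketch only: show each row, its group's lower bound (mean - 2 * stddev),
-- and whether the row would be treated as a gap (flag = 1).
select
  intv,
  rown,
  val,
  avg(val) over (partition by intv)
    - 2 * stddev(val) over (partition by intv) as lower_bound,
  case
    when val < avg(val) over (partition by intv)
               - 2 * stddev(val) over (partition by intv)
    then 1
    else 0
  end as flag
from test
order by intv, rown;
```

Rows with flag = 1 are the ones that both queries exclude before computing the average, standard deviation, and median of prt.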
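Which approach do you think is more efficient, and why? What advantages does each approach have over the other?
I am looking forward to your answers. Best regards!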