sql - How to Dense Rank Sets of data

Question

I am trying to compute a dense rank that groups sets of data together. My table has ID, GRP_SET, SUB_SET, and INTERVAL, which simply represents a date field. When records are inserted for an ID, they are inserted as a GRP_SET of 3 rows, numbered by SUB_SET. As you can see, the interval can change slightly before a set finishes inserting.

Here is some example data and the DRANK column represents what ranking I'm trying to get.

with q as (
select 1 id, 'a' GRP_SET, 1 as SUB_SET, 123 as interval, 1 as DRANK from dual union all
select 1, 'a', 2, 123, 1 from dual union all
select 1, 'a', 3, 124, 1 from dual union all
select 1, 'b', 1, 234, 2 from dual union all
select 1, 'b', 2, 235, 2 from dual union all
select 1, 'b', 3, 235, 2 from dual union all
select 1, 'a', 1, 331, 3 from dual union all
select 1, 'a', 2, 331, 3 from dual union all
select 1, 'a', 3, 331, 3 from dual)

select * from q

Example Data

ID GRP_SET SUB_SET INTERVAL DRANK
1  a       1       123      1
1  a       2       123      1
1  a       3       124      1
1  b       1       234      2
1  b       2       235      2
1  b       3       235      2
1  a       1       331      3
1  a       2       331      3
1  a       3       331      3

Here is the query I have. It gets close, but I seem to need something like this:

Partition By: ID
Order within partition by: ID, Interval
Change Rank when: ID, GRP_SET (change)

select
   id, GRP_SET, SUB_SET, interval,
   DENSE_RANK() over (partition by ID order by id, GRP_SET) as DRANK_TEST
from q
Order by
   id, interval

score 2 · Accepted Answer

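Use the `MODEL` clause

Behold, you are pushing your requirements beyond what is easy to express in "ordinary" SQL. Luckily, you're using Oracle, which has the MODEL clause, a feature whose mystery is exceeded only by its power (there is an excellent white paper on it). You should write: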

SELECT
   id, grp_set, sub_set, interval, drank
FROM (
  SELECT id, grp_set, sub_set, interval, 1 drank
  FROM q
)
MODEL PARTITION BY (id)
      DIMENSION BY (row_number() OVER (PARTITION BY id ORDER BY interval, sub_set) rn)
      MEASURES (grp_set, sub_set, interval, drank)
      RULES (
        drank[any] = NVL(drank[cv(rn) - 1] + 
                         DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
      )

Proof on SQLFiddle

Explanation:

SELECT
   id, grp_set, sub_set, interval, drank
FROM (
  -- Here, we initialise your "dense rank" to 1
  SELECT id, grp_set, sub_set, interval, 1 drank
  FROM q
)

-- Then we partition the data set by ID (that's your requirement)
MODEL PARTITION BY (id)

-- We generate row numbers for all rows, ordered by interval and sub_set
-- within each id, so that we can then access rows in that particular order
      DIMENSION BY (row_number() OVER (PARTITION BY id ORDER BY interval, sub_set) rn)

-- These are the columns that we want to generate from the MODEL clause
      MEASURES (grp_set, sub_set, interval, drank)

-- And the rules are simple: Each "dense rank" value is equal to the
-- previous "dense rank" value + 1, if the grp_set value has changed
      RULES (
        drank[any] = NVL(drank[cv(rn) - 1] + 
                         DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
      )

Of course, this only works if there are no interleaved events, i.e. if `a` is the only grp_set between intervals 123 and 124.
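For comparison, the same rule (bump the rank whenever grp_set differs from the previous row) can also be sketched with plain analytic functions. This is not part of the MODEL solution above. The `is_new_set` column name is made up for illustration, and the sketch assumes that (interval, sub_set) is unique within each id, so that the row order is deterministic. It is subject to the same interleaving caveat.

```sql
WITH q AS (
  SELECT 1 AS id, 'a' AS grp_set, 1 AS sub_set, 123 AS interval FROM dual UNION ALL
  SELECT 1, 'a', 2, 123 FROM dual UNION ALL
  SELECT 1, 'a', 3, 124 FROM dual UNION ALL
  SELECT 1, 'b', 1, 234 FROM dual UNION ALL
  SELECT 1, 'b', 2, 235 FROM dual UNION ALL
  SELECT 1, 'b', 3, 235 FROM dual UNION ALL
  SELECT 1, 'a', 1, 331 FROM dual UNION ALL
  SELECT 1, 'a', 2, 331 FROM dual UNION ALL
  SELECT 1, 'a', 3, 331 FROM dual
)
SELECT id, grp_set, sub_set, interval,
       -- Running count of "set starts" seen so far = the desired rank
       SUM(is_new_set) OVER (
         PARTITION BY id
         ORDER BY interval, sub_set
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS drank
FROM (
  SELECT q.*,
         -- 1 when grp_set differs from the previous row (or there is no
         -- previous row, because the comparison with NULL is not true)
         CASE
           WHEN grp_set = LAG(grp_set) OVER (PARTITION BY id
                                             ORDER BY interval, sub_set)
           THEN 0
           ELSE 1
         END AS is_new_set
  FROM q
)
ORDER BY id, interval, sub_set;
```

On the sample data the flags come out as 1, 0, 0, 1, 0, 0, 1, 0, 0, so the running sum yields 1, 1, 1, 2, 2, 2, 3, 3, 3, which matches the DRANK column in the question.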

score 2 · Accepted Answer

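This might work for you. The complicating factor is that you want intervals 123 and 124 to have the same "DENSE RANK", and likewise intervals 234 and 235. So, for the purpose of ordering in the DENSE_RANK() function, we'll truncate them to the nearest 10 (rounding down):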

SELECT id, grp_set, sub_set, interval, drank
     , DENSE_RANK() OVER ( PARTITION BY id ORDER BY TRUNC(interval, -1), grp_set ) AS drank_test
  FROM q

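See the SQL Fiddle demo here.

If you want intervals to be closer together before they are grouped, you can multiply the value before truncating. This groups them in buckets of 3 (but maybe you don't need them that granular):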

SELECT id, grp_set, sub_set, interval, drank
     , DENSE_RANK() OVER ( PARTITION BY id ORDER BY TRUNC(interval*10/3, -1), grp_set ) AS drank_test
  FROM q
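To see which rows end up sharing a rank, it can help to look at the truncated sort keys on their own. The sketch below uses the five distinct intervals from the question plus two hypothetical values, 129 and 130, which are not in the original data and are included only to show a bucket boundary. The aliases `key_by_10` and `key_by_3` are illustrative names.

```sql
SELECT interval,
       TRUNC(interval, -1)          AS key_by_10,  -- rounds down to a multiple of 10
       TRUNC(interval * 10 / 3, -1) AS key_by_3    -- equals 10 * FLOOR(interval / 3)
FROM (
  SELECT 123 AS interval FROM dual UNION ALL
  SELECT 124 FROM dual UNION ALL
  SELECT 129 FROM dual UNION ALL  -- hypothetical, not in the question
  SELECT 130 FROM dual UNION ALL  -- hypothetical, not in the question
  SELECT 234 FROM dual UNION ALL
  SELECT 235 FROM dual UNION ALL
  SELECT 331 FROM dual
)
ORDER BY interval;

-- INTERVAL  KEY_BY_10  KEY_BY_3
--      123        120       410
--      124        120       410
--      129        120       430
--      130        130       430
--      234        230       780
--      235        230       780
--      331        330      1100
```

Note that the buckets are fixed ranges rather than "close enough" comparisons: 129 and 130 are adjacent but land in different buckets of 10, so a set whose inserts straddle a boundary would be split into two ranks.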
