sql - Oracle SQL sequential group number assignment

Question

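I have a result set which, for simplicity, I'll refer to as a table "Tab" with three columns: Category, SubCategory and Date, ordered by Category and then by Date. This data set is a grid, and I want to perform further processing on top of it. My problem is uniquely identifying (or sequentially labelling) groups within the data set. The SQL below shows what I'm after (either GID1 or GID2 would do), derived from the first 3 columns. I've tried group_id, grouping_id, rank and dense_rank, and I'm either missing a trick with one of them or attempting something very awkward. The order of the GIDs doesn't matter; what matters is that the group numbers are assigned based on the sorted data (Category, then Date).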

 CREATE TABLE Tab
        ("Category" varchar2(1), "SubCategory" varchar2(7), "Date" int, "GID1" int, "GID2" int);

    INSERT ALL 
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'bannana', 20120101, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'grape', 20120102, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120103, 1, 1)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120104, 1, 1)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'bannana', 20120105, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120106, 2, 2)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120107, 2, 2)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'apple', 20120108, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('A', 'pear', 20120109, 3, 3)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'apple', 20120101, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'bannana', 20120102, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'apple', 20120103, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'bannana', 20120104, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120105, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120106, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120107, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120108, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('B', 'pear', 20120109, 1, 4)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'grape', 20120101, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'grape', 20120102, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'apple', 20120103, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'bannana', 20120104, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'grape', 20120105, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'pear', 20120106, 1, 5)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'apple', 20120107, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'apple', 20120108, NULL, NULL)
        INTO Tab ("Category", "SubCategory", "Date", "GID1", "GID2")
             VALUES ('C', 'apple', 20120109, NULL, NULL)
    SELECT * FROM dual
    ;

score 4 · Accepted Answer

OK, if it's just pear, then:

select "Category", "SubCategory", "Date",
       case
         when "SubCategory" = 'pear'
         then
           count(rn) over (partition by "Category" order by "Date")
         else null
       end GID1,
       case
         when "SubCategory" = 'pear'
         then
           count(rn) over (order by "Category", "Date")
         else null
       end GID2
  from (select "Category", "SubCategory", "Date",
               lag("SubCategory") over (partition by "Category" order by "Date"),
               case
                 when lag("SubCategory") over (partition by "Category" order by "Date") != "SubCategory"
                  and "SubCategory" = 'pear'
                 then 1
                 when row_number() over (partition by "Category" order by "Date") = 1
                  and "SubCategory" = 'pear'
                 then 1
                 else null
               end rn
          from tab)
 order by 1, 3;

Category   SubCate       Date       GID1       GID2
---------- ------- ---------- ---------- ----------
A          bannana   20120101
A          grape     20120102
A          pear      20120103          1          1
A          pear      20120104          1          1
A          bannana   20120105
A          pear      20120106          2          2
A          pear      20120107          2          2
A          apple     20120108
A          pear      20120109          3          3
B          apple     20120101
B          bannana   20120102
B          apple     20120103
B          bannana   20120104
B          pear      20120105          1          4
B          pear      20120106          1          4
B          pear      20120107          1          4
B          pear      20120108          1          4
B          pear      20120109          1          4
C          grape     20120101
C          grape     20120102
C          apple     20120103
C          bannana   20120104
C          grape     20120105
C          pear      20120106          1          5
C          apple     20120107
C          apple     20120108
C          apple     20120109

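Breaking this down:

We look at the previous row, ordered by "Date" (within each "Category"), and check whether it has a different "SubCategory" while the current "SubCategory" is 'pear'. If so, we mark the row with a 1 (the value itself is irrelevant, it only has to be non-NULL).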

lag("SubCategory") over (partition by "Category" order by "Date") != "SubCategory" 
 and "SubCategory" = 'pear'

If the first row of a category is a pear, we assign it the same marker. This gives us:

Category   SubCate       Date LAG("SU         RN
---------- ------- ---------- ------- ----------
A          bannana   20120101
A          grape     20120102 bannana
A          pear      20120103 grape            1
A          pear      20120104 pear
A          bannana   20120105 pear
A          pear      20120106 bannana          1
A          pear      20120107 pear
A          apple     20120108 pear
A          pear      20120109 apple            1
B          apple     20120101
B          bannana   20120102 apple
B          apple     20120103 bannana
B          bannana   20120104 apple
B          pear      20120105 bannana          1
B          pear      20120106 pear
B          pear      20120107 pear
B          pear      20120108 pear
B          pear      20120109 pear
C          grape     20120101
C          grape     20120102 grape
C          apple     20120103 grape
C          bannana   20120104 apple
C          grape     20120105 bannana
C          pear      20120106 grape            1
C          apple     20120107 pear
C          apple     20120108 apple
C          apple     20120109 apple

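Now we simply count() the non-NULL "RN" values, again ordered by "Date": within each "Category" for GID1, and across the whole result set for GID2 (where we also order by "Category"). Because count() ignores NULLs, the running count only goes up on the first row of each run of pears, so every row in that run gets the same number. These are the lines:

count(rn) over (partition by "Category" order by "Date") for GID1

and count(rn) over ( order by "Category", "Date") for GID2.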
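
The mark-then-count idea above can also be written a little more compactly. The following is only a sketch, not the query from this answer: it relies on the optional third argument of lag() to supply a default for the first row of each category, so a single WHEN branch is enough. The alias grp_start and the placeholder '~' are assumptions made for illustration; '~' must be a value that never occurs as a real "SubCategory".

```sql
-- Sketch: flag the first row of every run of pears, then take a running count.
-- '~' is an assumed placeholder returned by lag() when there is no previous row.
select "Category", "SubCategory", "Date",
       case when "SubCategory" = 'pear'
            then count(grp_start) over (partition by "Category" order by "Date")
       end as gid1,
       case when "SubCategory" = 'pear'
            then count(grp_start) over (order by "Category", "Date")
       end as gid2
  from (select "Category", "SubCategory", "Date",
               case
                 when "SubCategory" = 'pear'
                  and lag("SubCategory", 1, '~')
                        over (partition by "Category" order by "Date") != 'pear'
                 then 1   -- non-NULL marker on the first row of a run of pears
               end as grp_start
          from tab)
 order by "Category", "Date";
```

Since the current row is already known to be 'pear', comparing the previous value with 'pear' is the same test as comparing it with the current "SubCategory".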

score 0 · Answer

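I never thought it could be done with count. Brilliant. Starting with version 11g Release 2, this can also be done with a recursive hierarchical query (a recursive WITH clause).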

with r as ( 
  select "Category"
    , "SubCategory"
    , "Date"
    , row_number() over (partition by "SubCategory" order by "Category", "Date") rn
  from tab
)
, fwd ( "Category", "SubCategory", "Date", rn, GID1, GID2) as (
  select "Category"
    , "SubCategory"
    , "Date"
    , rn
    , 1
    , 1
  from r
  where rn = 1
  union all 
  select nxt."Category"
    , nxt."SubCategory"
    , nxt."Date"
    , nxt.rn
    , decode( nxt."Category"
      , prev."Category", decode( nxt."Date"
        , prev."Date" + 1, prev.gid1
        , prev.gid1 + 1 
      )
      , 1
    ) as gid1
    , decode( nxt."Date"
      , prev."Date" + 1, prev.gid2
      , prev.gid2 + 1 
    ) as gid2
  from fwd prev
    , r nxt
  where prev.rn + 1= nxt.rn
    and prev."SubCategory" = nxt."SubCategory"
)
select "Category"
  , "SubCategory"
  , "Date"
  , decode( "SubCategory", 'pear', GID1, null ) as gid1
  , decode( "SubCategory", 'pear', GID2, null ) as gid2
from fwd
order by "Category", "Date";

It produces the same result:

Category SubCategory       Date       GID1       GID2
-------- ----------- ---------- ---------- ----------
A        bannana       20120101                       
A        grape         20120102                       
A        pear          20120103          1          1 
A        pear          20120104          1          1 
A        bannana       20120105                       
A        pear          20120106          2          2 
A        pear          20120107          2          2 
A        apple         20120108                       
A        pear          20120109          3          3 
B        apple         20120101                       
B        bannana       20120102                       
B        apple         20120103                       
B        bannana       20120104                       
B        pear          20120105          1          4 
B        pear          20120106          1          4 
B        pear          20120107          1          4 
B        pear          20120108          1          4 
B        pear          20120109          1          4 
C        grape         20120101                       
C        grape         20120102                       
C        apple         20120103                       
C        bannana       20120104                       
C        grape         20120105                       
C        pear          20120106          1          5 
C        apple         20120107                       
C        apple         20120108                       
C        apple         20120109

It may also be more self-explanatory.

If you remove the decode from the final select, it also produces correct GID1 and GID2 numbers for all the other subcategories, not just 'pear'.
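
For comparison, the count-based approach can be generalized to every subcategory in a similar way. This is a rough sketch under two assumptions that are mine rather than the answer's: a group is a run of adjacent rows with the same "SubCategory" within a "Category" (instead of consecutive "Date" values), and "SubCategory" is never NULL. The alias grp_start and the placeholder '~' are hypothetical names chosen for illustration.

```sql
-- Sketch: number runs of identical "SubCategory" values for all subcategories.
-- '~' is an assumed placeholder that never occurs as a real "SubCategory".
select "Category", "SubCategory", "Date",
       -- restart the numbering per category and subcategory
       count(grp_start) over (partition by "Category", "SubCategory"
                              order by "Date") as gid1,
       -- number the runs of each subcategory across all categories
       count(grp_start) over (partition by "SubCategory"
                              order by "Category", "Date") as gid2
  from (select "Category", "SubCategory", "Date",
               case
                 when lag("SubCategory", 1, '~')
                        over (partition by "Category" order by "Date") != "SubCategory"
                 then 1   -- non-NULL marker on the first row of each run
               end as grp_start
          from tab)
 order by "Category", "Date";
```

On the sample data, the 'pear' rows get the same GID1 and GID2 values as shown in the result above.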

Choosing between this variant and the one provided by @DazzaL comes down to a cost comparison.
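
One way to make that comparison is to look at the optimizer's plan for each query. The sketch below explains a simplified stand-in statement (the running count is not either answer's full query, and the alias pears_so_far is hypothetical); substitute each complete query after "for" and compare the Cost column of the two plans.

```sql
-- Sketch: store the optimizer's plan for a statement without executing it.
explain plan for
select "Category", "SubCategory", "Date",
       count(case when "SubCategory" = 'pear' then 1 end)
         over (partition by "Category" order by "Date") as pears_so_far
  from tab;

-- Show the plan that was just explained, including estimated cost and rows.
select * from table(dbms_xplan.display);
```

The figures shown are optimizer estimates, so on realistic data volumes it is worth timing both queries as well.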
