sql - 用于连续分组的 Oracle SQL

Question

我需要从具有如下结构和数据的表中生成报告。

Table Ticket 的数据如下所示。

ID         Assigned_To
100       raju
101       raju
102       raju
103       anil
104       anil
105       sam
106       raju
107       raju
108       anil

OracleSELECT应生成以下报告

From_Id            To_Id    Assigned_To
100                  102      raju
103                  104      anil
105                  105      sam
106                  107      raju
108                  108      anil

有人可以帮我建立一个查询..吗？

在此先感谢，马修。

score 7 · Accepted Answer

SQL> create table ticket (id,assigned_to)
  2  as
  3  select 100, 'raju' from dual union all
  4  select 101, 'raju' from dual union all
  5  select 102, 'raju' from dual union all
  6  select 103, 'anil' from dual union all
  7  select 104, 'anil' from dual union all
  8  select 105, 'sam'  from dual union all
  9  select 106, 'raju' from dual union all
 10  select 107, 'raju' from dual union all
 11  select 108, 'anil' from dual
 12  /

Tabel is aangemaakt.

SQL> select min(id) from_id
  2       , max(id) to_id
  3       , assigned_to
  4    from ( select id
  5                , assigned_to
  6                , id - row_number() over (partition by assigned_to order by id) grp
  7             from ticket
  8         )
  9   group by assigned_to
 10       , grp
 11   order by from_id
 12  /

   FROM_ID      TO_ID ASSIGNED_TO
---------- ---------- -----------
       100        102 raju
       103        104 anil
       105        105 sam
       106        107 raju
       108        108 anil

5 rijen zijn geselecteerd.

**更新与 tuinstoel 解决方案的性能比较结果：

在 11.1.0.7 上：

SQL> exec runstats_pkg.rs_start

PL/SQL procedure successfully completed.

SQL> set termout off
SQL> select min(id) from_id
  2       , max(id) to_id
  3       , assigned_to
  4    from ( select id
  5                , assigned_to
  6                , id - row_number() over (partition by assigned_to order by id) grp
  7             from ticket
  8         )
  9   group by assigned_to
 10       , grp
 11   order by from_id
 12  /

   FROM_ID      TO_ID ASSI
---------- ---------- ----
       100        102 raju
       103        104 anil
       105        105 sam
       106        107 raju
       108        108 anil
       109        111 raju
<snip>
    589921     589922 raju
    589923     589923 anil

327680 rows selected.

SQL> set termout on
SQL> exec runstats_pkg.rs_middle

PL/SQL procedure successfully completed.

SQL> set termout off
SQL> select * from table(testpl.pltest)
  2  /

   FROM_ID      TO_ID ASSI
---------- ---------- ----
       100        102 raju
       103        104 anil
       105        105 sam
       106        107 raju
       108        108 anil
       109        111 raju
<snip>
    589921     589922 raju
    589923     589923 anil

327680 rows selected.

SQL> set termout on

结果：

SQL> exec runstats_pkg.rs_stop(100)
Run1 draaide in 547 hsecs
Run2 draaide in 549 hsecs
Run1 draaide in 99.64% van de tijd

Naam                                                      Run1        Run2    Verschil
STAT.recursive cpu usage                                     2         106         104
LATCH.row cache objects                                     91         217         126
STAT.bytes received via SQL*Net from client             37,496      37,256        -240
STAT.recursive calls                                         7       5,914       5,907
STAT.table scan rows gotten                            615,235     589,824     -25,411
STAT.sorts (rows)                                      917,504     589,824    -327,680

Run1 latches totaal versus run2 -- verschil en percentage
Run1      Run2  Verschil     Pct
10,255    10,471       216  97.94%

PL/SQL procedure successfully completed.

问候，罗布。

score 3 · Accepted Answer

假设来自 NZSG 的 Andrew 启发了我。我也做了一个管道内衬功能。

create or replace package testpl is

 type outrec_type is record
 ( from_id ticket.id%type
 , to_id   ticket.id%type
 , assigned_to ticket.assigned_to%type);

 type outrec_table is table of outrec_type;

 function pltest return outrec_table pipelined;

end;
/

create or replace package body testpl is

  function pltest return outrec_table pipelined
  is
    l_outrec outrec_type;
    l_first_time boolean := true;
  begin

     for r_tick in (select id, assigned_to from ticket order by id) loop

       if (r_tick.assigned_to != l_outrec.assigned_to or l_first_time) then
          if not l_first_time then
            pipe row (l_outrec);
          else
            l_first_time := false;
          end if;
          l_outrec.assigned_to := r_tick.assigned_to;
          l_outrec.from_id := r_tick.id;
       end if;
       l_outrec.to_id := r_tick.id;
    end loop;

    pipe row (l_outrec);

    return;
  end;

end;
/

您可以使用以下方法对其进行测试：

select * from table(testpl.pltest);

在我的 Windows XP Oracle 11.1.0.6.0 系统上，它的速度大约是 Rob van Wijk 解决方案的两倍。

这

for r_tick in (select ....) loop
  ....
end loop;

构造在 Oracle 10 和 11 中具有非常不错的性能。大多数情况下，仅 SQL 的解决方案更快，但我认为这里的 PL/SQL 更快。

score 2 · Accepted Answer

这应该有效。该解决方案由几个内联视图组成 - 每个视图都计算一些东西。我评论了我打算做的事情。您当然必须在执行时从最里面阅读评论。

--get results by grouping by interval_begin
SELECT MIN(id) from_id,
       MAX(id) to_id,
       MAX(assigned_to) assigned_to
  FROM ( --copy ids of a first row of each interval of ids to the all following rows of that interval
         SELECT id,
                 assigned_to,
                 MAX(change_at) over(ORDER BY id) interval_begin
           FROM ( --find each id where a change of an assignee occurs and "mark" it. Dont forget the first row
                 SELECT id,
                         assigned_to,
                         CASE
                           WHEN (lag(assigned_to) over(ORDER BY id) <> assigned_to OR lag(assigned_to)
                                 over(ORDER BY id) IS NULL) THEN
                            id
                         END change_at
                   FROM ticket))
 GROUP BY interval_begin
 ORDER BY from_id;
    ;

score 2 · Accepted Answer

您可以向后弯腰尝试在纯 SQL 中实现这一点，或者您可以创建一些更长但更容易理解和更高效的东西 - 使用流水线函数

本质上，该函数将接受一个必须按 ID 预先排序的 ref 游标，然后仅在连续的记录块结束时才对行进行管道传输。

CREATE TABLE assignment
(
  a_id         NUMBER,
  assigned_to  VARCHAR2(4000)
);

CREATE OR REPLACE package PCK_CONTIGUOUS_GROUPBY as

 TYPE refcur_t IS REF CURSOR RETURN assignment%ROWTYPE;

 TYPE outrec_typ IS RECORD ( 
    from_id    NUMBER,
    to_id      NUMBER,
    assigned_to  VARCHAR2(4000));

  TYPE outrecset IS TABLE OF outrec_typ;

 FUNCTION f_cont_groupby(p refcur_t) 
      RETURN outrecset PIPELINED;

end;
/

CREATE OR REPLACE package body pck_contiguous_groupby as

 FUNCTION f_cont_groupby(p refcur_t) RETURN outrecset PIPELINED IS

  out_rec             outrec_typ;
  in_rec              p%ROWTYPE;
  first_id            assignment.a_id%type;
  last_id             assignment.a_id%type;
  last_assigned_to    assignment.assigned_to%type;

  BEGIN

   LOOP
     FETCH p INTO in_rec;
     EXIT WHEN p%NOTFOUND;


       IF last_id IS NULL THEN
       -- First record: don't pipe
         first_id := in_rec.a_id;

       ELSIF last_id = in_rec.a_id - 1 AND last_assigned_to = in_rec.assigned_to THEN
       -- Contiguous block: don't pipe
         NULL;

       ELSE
       -- Block not contiguous: pipe 
         out_rec.from_id := first_id;
         out_rec.to_id := last_id;
         out_rec.assigned_to := last_assigned_to;

         PIPE ROW(out_rec);

         first_id := in_rec.a_id;
       END IF;

     last_id := in_rec.a_id;
     last_assigned_to := in_rec.assigned_to;

   END LOOP;
   CLOSE p;

   -- Pipe remaining row 
   out_rec.from_id := first_id;
   out_rec.to_id := last_id;
   out_rec.assigned_to := last_assigned_to;

   PIPE ROW(out_rec);

   RETURN;
 END;

END pck_contiguous_groupby;
/

然后尝试一下，填充表格并运行：

SELECT * FROM TABLE(pck_contiguous_groupby.f_cont_groupby (CURSOR (SELECT a_id, assigned_to FROM assignment ORDER BY a_id)));

score 2 · Accepted Answer

好的，这不漂亮，但它有效。没有其他人贡献过任何更漂亮的东西，所以也许这就是这样做的方式。

select min(from_id), to_id, assigned_to from
(
select from_id, max(to_id) as to_id, assigned_to from
(
select t1.id as from_id, t2.id as to_id, t1.assigned_to
from       ticket t1
inner join ticket t2 on t1.assigned_to = t2.assigned_to and t2.id >= t1.id
where not exists
    (
    select * from ticket t3
    where  t3.ID > t1.ID
    and t3.ID < t2.ID
    and t3.assigned_to != t1.assigned_to
    )
) x
group by from_id, assigned_to
) y
group by to_id, assigned_to
;

我正在使用mysql；很可能有一些 oracle 的优点使它变得更好——因为很可能有一些更优雅的普通 sql。但至少这是一个开始。

score 2 · Accepted Answer

我用 Oracle express edition 10.2.0.1.0 做了一些基准测试

我用这个脚本用 1179648 行填充表票：

create table ticket (id,assigned_to)
as
select 100, 'raju' from dual union all
select 101, 'raju' from dual union all
select 102, 'raju' from dual union all
select 103, 'anil' from dual union all
select 104, 'anil' from dual union all
select 105, 'sam' from dual union all
select 106, 'raju' from dual union all
select 107, 'raju' from dual union all
select 108, 'anil' from dual
/


begin
  for i in 1..17 loop
    insert into ticket 
    select id + (select count(*) from ticket), assigned_to
    from ticket;
  end loop;
end;
/

commit;

SQL> select count(*) from ticket;

  COUNT(*)                                                                      
----------                                                                      
   1179648

Rob van Wijk 的 select 语句平均耗时 1.6 秒，Mesays 的 select 语句平均耗时 2.8 秒，Micheal Pravda 的 select 语句平均耗时 4.2 秒，来自 NZSG 的 Andrew 的语句平均耗时 9.6 秒。

因此，Oracle XE 中的流水线功能较慢。或者也许有人必须改进流水线功能......？

score 1 · Accepted Answer

这是我的建议，没有经过很好的测试，但在我看来，这听起来是对的，只要 id 是唯一的并且是连续的序列。查看底部的 sql 查询。

SQL> create table ticket (id number, assigned_to varchar2(30));

Table created.

SQL> insert into ticket values (100, 'raju');

1 row created.

SQL> insert into ticket values (101, 'raju');

1 row created.

SQL> insert into ticket values (102, 'raju');

1 row created.

SQL> insert into ticket values (103, 'anil');

1 row created.

SQL> insert into ticket values (104, 'anil');

1 row created.

SQL> insert into ticket values (105, 'sam');

1 row created.

SQL> insert into ticket values (106, 'raju');

1 row created.

SQL> insert into ticket values (107, 'raju');

1 row created.

SQL> insert into ticket values (108, 'anil');

1 row created.

SQL> select a.id from_id
  2  ,lead(a.id -1, 1, a.id) over (order by a.id) to_id
  3  ,a.assigned_to
  4  from (
  5  select
  6  id, assigned_to
  7  ,lag(assigned_to, 1) over (order by id) prev_assigned_to
  8  from ticket
  9  ) a
 10  where a.assigned_to != nvl(a.prev_assigned_to, a.assigned_to||'unique')
 11  order by id
 12  ;

   FROM_ID      TO_ID ASSIGNED_TO
---------- ---------- ------------------------------
       100        102 raju
       103        104 anil
       105        105 sam
       106        107 raju
       108        108 anil

score 0 · Accepted Answer

我相信您想要Oracle 分析功能之一。如果您使用 Oracle，那么您很幸运，因为其他 RDBMS 没有此功能。它们允许您编写 SQL 来查询与相邻行相关的数据，例如计算移动平均值。我这里没有可供使用的 Oracle DB，但我认为它会是这样的：

SELECT MIN(ID) AS From_Id, MAX(ID) AS To_Id, Assigned_To
FROM Ticket
PARTITION BY Assigned_To
ORDER BY From_Id

sql - 用于连续分组的 Oracle SQL

8 回答 8

Related

Reference