sql - SQL regular expression to parse text and add new rows

Question

I'm trying to work with a notes field that is just one big block of text. Sample data is below, written as if I were inserting it into a table.

create table test_table
(
job_number number,
notes varchar2(4000)
);

insert into test_table (job_number,notes)
values (12345,'1022089483 notes notes notes notes 1022094450 notes notes notes notes 1022095218 notes notes notes notes');

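I need to parse this out so that each note entry gets its own record (the 10-digit number before each note is a Unix timestamp). So if I exported it as pipe-delimited, it would look like this: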

job_number|notes

12345|1022089483 notes notes notes notes

12345|1022094450 notes notes notes notes

12345|1022095218 notes notes notes notes

I really hope this makes sense. I'd appreciate any insight.

score 0 · Accepted Answer

There are a few ways to do this:

SQL> insert into test_table (job_number,notes)
  2  values (12345,'1022089483 notes notes notes notes 1022094450 notes notes notes notes 1022095218 notes notes notes notes');

1 row created.

SQL> insert into test_table (job_number,notes)
  2  values (12346,'1022089483 notes notes notes notes 1022094450 foo 1022095218 test notes 1022493228 the answer is 42');

1 row created.

SQL> commit;

Commit complete.

Note: I'm using the regular expression [0-9]{10} to identify where a note begins (i.e. any 10-digit number is treated as the start of a note).
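To make the role of the pattern concrete, here is a minimal sketch (an illustration, not part of the original answer) that uses the occurrence argument of REGEXP_INSTR to find where each note starts in the first sample row. It assumes the test_table rows inserted above; the column aliases are illustrative names only.

```sql
-- Start position of the 1st, 2nd and 3rd 10-digit run in job 12345's notes.
-- The 4th argument of REGEXP_INSTR is the occurrence to look for.
select job_number,
       regexp_instr(notes, '[0-9]{10}', 1, 1) as first_start,   -- 1
       regexp_instr(notes, '[0-9]{10}', 1, 2) as second_start,  -- 36
       regexp_instr(notes, '[0-9]{10}', 1, 3) as third_start    -- 71
  from test_table
 where job_number = 12345;
```

One consequence worth keeping in mind: the pattern is not anchored, so a 10-digit number inside the note text itself (a phone number, say) would also be treated as the start of a new note.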

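First, we can work out the maximum number of notes in any given row and do a Cartesian join against a row generator with that many rows. We then filter out each note: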

SQL> with data
  2  as (select job_number, notes,
  3            (length(notes)-length(regexp_replace(notes, '[0-9]{10}', null)))/10 num_of_notes
  4        from test_table t)
  5  select job_number,
  6         substr(d.notes, regexp_instr(d.notes, '[0-9]{10}', 1, rn.l),
  7                       regexp_instr(d.notes||' 0000000000', '[0-9]{10}', 1, rn.l+1)
  8                       -regexp_instr(d.notes, '[0-9]{10}', 1, rn.l) -1
  9               ) note
 10    from data d
 11         cross join (select rownum l
 12                      from dual
 13                    connect by level <= (select max(num_of_notes)
 14                                           from data)) rn
 15   where rn.l <= d.num_of_notes
 16   order by job_number, rn.l;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42

7 rows selected.

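That's fine as long as the number of notes per row is generally about the same (the more it varies, the worse this scales, because every row is joined to as many generated rows as the largest note count and the surplus is then filtered out).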
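The num_of_notes expression works by deleting every 10-digit match and dividing the drop in length by 10. For the first sample row that is (104 - 74) / 10 = 3. As a hedged alternative, assuming 11g or later where REGEXP_COUNT is available, the same count can be requested directly. The sketch below puts the two side by side so they can be compared on your own data; the aliases are illustrative.

```sql
-- Two ways to count notes per row: length arithmetic (as in the answer)
-- versus REGEXP_COUNT (assumption: the database is 11g or later).
select job_number,
       (length(notes) - length(regexp_replace(notes, '[0-9]{10}', null))) / 10 as count_by_length,
       regexp_count(notes, '[0-9]{10}') as count_by_regexp
  from test_table
 order by job_number;
```

The two differ in one edge case: if a notes value consisted of nothing but a timestamp, REGEXP_REPLACE would leave an empty string, which Oracle treats as NULL, so the length-based count would be NULL while REGEXP_COUNT would return 1.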

In 11g (Release 2 onwards), we can use recursive subquery factoring to do the same thing as above, but without the extra looping:

SQL> with data (job_number, notes, note, num_of_notes, iter)
  2  as (select job_number, notes,
  3             substr(notes, regexp_instr(notes, '[0-9]{10}', 1, 1),
  4                    regexp_instr(notes||' 0000000000', '[0-9]{10}', 1, 2)
  5                    -regexp_instr(notes, '[0-9]{10}', 1, 1) -1
  6                  ),
  7             (length(notes)-length(regexp_replace(notes, '[0-9]{10}', null)))/10 num_of_notes,
  8             1
  9        from test_table
 10      union all
 11     select job_number, notes,
 12             substr(notes, regexp_instr(notes, '[0-9]{10}', 1, iter+1),
 13                    regexp_instr(notes||' 0000000000', '[0-9]{10}', 1, iter+2)
 14                    -regexp_instr(notes, '[0-9]{10}', 1, iter+1) -1
 15                  ),
 16             num_of_notes, iter + 1
 17       from data
 18      where substr(notes, regexp_instr(notes, '[0-9]{10}', 1, iter+1),
 19                    regexp_instr(notes||' 0000000000', '[0-9]{10}', 1, iter+2)
 20                    -regexp_instr(notes, '[0-9]{10}', 1, iter+1) -1
 21                  ) is not null
 22    )
 23  select job_number, note
 24    from data
 25  order by job_number, iter;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42

7 rows selected.

Or, from 10g onwards, we can use the MODEL clause to generate the rows:

SQL> with data as (select job_number, notes,
  2                       (length(notes)-length(regexp_replace(notes, '[0-9]{10}', null)))/10 num_of_notes
  3                  from test_table)
  4  select job_number, note
  5    from data
  6  model
  7  partition by (job_number)
  8  dimension by (1 as i)
  9  measures (notes, num_of_notes, cast(null as varchar2(4000)) note)
 10  rules
 11  (
 12    note[for i from 1 to num_of_notes[1] increment 1]
 13      = substr(notes[1],
 14               regexp_instr(notes[1], '[0-9]{10}', 1, cv(i)),
 15               regexp_instr(notes[1]||' 0000000000', '[0-9]{10}', 1, cv(i)+1)
 16               -regexp_instr(notes[1], '[0-9]{10}', 1, cv(i)) -1
 17              )
 18  )
 19  order by job_number, i;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42
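
All three queries append ' 0000000000' to notes before searching for the next timestamp. The reason is the last note in each row: there is no following timestamp, so without the sentinel REGEXP_INSTR returns 0 and the length passed to SUBSTR would be negative, giving NULL. The sketch below (an illustration added here, using the first sample row and illustrative aliases) shows the arithmetic for the third and final note.

```sql
-- Positions involved in extracting the last note of job 12345.
select regexp_instr(notes, '[0-9]{10}', 1, 3)                  as last_start,            -- 71
       regexp_instr(notes, '[0-9]{10}', 1, 4)                  as next_without_sentinel, -- 0
       regexp_instr(notes || ' 0000000000', '[0-9]{10}', 1, 4) as next_with_sentinel,    -- 106
       length(notes)                                           as notes_length           -- 104
  from test_table
 where job_number = 12345;
```

With the sentinel the length is 106 - 71 - 1 = 34, which runs exactly to the end of the 104-character string. For notes in the middle of the string, the same - 1 drops the single space before the next timestamp.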
