I have two columns that contain lists of rules and dates, as shown below:

a                         b                                             c
------------------------  --------------------------------------------  -----
init, rule#062,rule#066   20210417124104,20210417132843,20210419132843  user1
init, rule#062            20210417124104,20210417132843                 user2
init                      20210417124104                                user3

Expected output:

a         b               c
init      20210417124104  user1
rule#062  20210417124104  user1
rule#066  20210419132843  user1
init      20210417124104  user2
rule#062  20210417124104  user2
init      20210417124104  user3

I need to turn each row into as many rows as there are items in its list columns.

The number of items in a list can vary; it is not necessarily two. I have to run this on an Exasol database, so not every function is available there.

Thanks. Any support is appreciated.

Edit:

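Using the answer below, I was able to get the correct result for a single user, but as soon as I add a second user the result set multiplies. I think there is some logic behind CONNECT BY and LEVEL that I don't fully understand.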

Rows are duplicated. How can I make the level restart for each new user?

SELECT
    u.master_user_id,
    u.user_id,
    SUBSTR(regexp_substr(u.CONS_DATE_HIST, '[^,]+', 1, level), 1, 8) AS date_id,
    CASE
        WHEN LOWER(trim(regexp_substr(u.CONS_RULES_HIST, '[^,]+', 1, level))) = 'init'
        THEN 'init'
        ELSE SUBSTR(trim(regexp_substr(u.CONS_RULES_HIST, '[^,]+', 1, level)), 6)
    END AS rule_nbr,
    level lvl
    -- row_number() over(partition by master_user_id, level order by user_id) as rn
FROM
    (
        SELECT client_id,
               master_user_id,
               user_id,
               CONS_DATE_HIST,
               CONS_RULES_HIST
        FROM ECOMBI_CL_0001100.users
        WHERE cast(load_date as date) > current_date - 4
          AND user_id in (38043958 )
    ) u
CONNECT BY regexp_substr(u.CONS_DATE_HIST,  '[^,]+', 1, level) <> 'null'
       AND regexp_substr(u.CONS_RULES_HIST, '[^,]+', 1, level) <> 'null'
    -- and user_id<> user_id
    -- and user_id <> user_id
    -- and row_number() over(partition by master_user_id, level order by user_id) <> 2
ORDER BY 2, 4

MASTER_USER_ID  USER_ID   DATE_ID   RULE_NBR  LVL
37175           38043958  20211024  init      1
37175           38043958  20211024  035       2
37175           38043958  20211024  064       3
37175           38043958  20211025  035       4
37175           38043958  20211025  060       5

With two users:

MASTER_USER_ID  USER_ID   DATE_ID   RULE_NBR  LVL
37175           38043958  20211024  035       2
37175           38043958  20211024  035       2
37175           38043958  20211025  035       4
37175           38043958  20211025  035       4
37175           38043958  20211025  035       4
37175           38043958  20211025  035       4
37175           38043958  20211025  035       4
37175           38043958  20211025  035       4
37175           38043958  20211025  035       4
37175           38043958  20211025  035       4
37175           38043958  20211025  060       5
37175           38043958  20211025  060       5
37175           38043958  20211025  060       5
37175           38043958  20211025  060       5
37175           38043958  20211025  060       5
37175           38043958  20211025  060       5
37175           38043958  20211025  060       5
37175           38043958  20211025  060       5
37175           38043958  20211024  064       3
37175           38043958  20211024  064       3
37175           38043958  20211024  064       3
37175           38043958  20211024  064       3
37175           38043958  20211024  init      1
968389          38052591  20211024  012       2
968389          38052591  20211024  012       2
968389          38052591  20211024  060       3
968389          38052591  20211024  060       3
968389          38052591  20211024  060       3
968389          38052591  20211024  060       3
968389          38052591  20211024  init      1

Any help would be appreciated.

2 Answers

OK, here is my attempt; it can probably be improved:

-- Sample data from the question
with data(a, b, c) as (
  select 'init, rule#062,rule#066', '20210417124104,20210417132843,20210419132843', 'user1' from dual union all
  select 'init, rule#062', '20210417124104,20210417132843', 'user2' from dual union all
  select 'init', '20210417124104', 'user3' from dual
),
-- One row per rule and user; lvl is the position of the item in its list.
-- DISTINCT removes the duplicates that CONNECT BY generates across input rows.
at as (
  select distinct trim(regexp_substr(data.a, '[^,]+', 1, level)) a, c, level lvl
  from data
  connect by regexp_substr(data.a, '[^,]+', 1, level) is not null
),
-- One row per date and user, numbered by position in the same way
bt as (
  select distinct regexp_substr(data.b, '[^,]+', 1, level) b, c, level lvl
  from data
  connect by regexp_substr(data.b, '[^,]+', 1, level) is not null
)
-- Pair each rule with the date at the same position for the same user
select at.a, bt.b, bt.c
from at
join bt on at.lvl = bt.lvl
       and at.c = bt.c
order by bt.c, bt.b
;
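
To see why the DISTINCT is needed, and why the row count in the question explodes with a second user, here is a small Python model of a CONNECT BY that has no PRIOR condition. It is a simplified sketch of the hierarchical-query semantics, not Exasol or Oracle code; the function name `connect_by_without_prior` and the sample lists are made up for illustration (the rule numbers are copied from the question's output). The idea: every row is a root at level 1, and every row found at level n is connected to every table row that still has an item at level n+1, regardless of which user it belongs to.

```python
from collections import Counter


def connect_by_without_prior(rows):
    """Model CONNECT BY <condition on LEVEL only> for rows of (user_id, items).

    Returns a list of (user_id, item, level) tuples, duplicates included.
    """
    result = []
    level = 1
    # Level 1: every input row (with at least one item) is a root.
    current = [(user, items) for user, items in rows if items]
    while current:
        result.extend((user, items[level - 1], level) for user, items in current)
        level += 1
        # Each row of the previous level is joined to EVERY table row that
        # still has an item at the new level, not only to its own user.
        current = [
            (user, items)
            for _parent in current
            for user, items in rows
            if len(items) >= level
        ]
    return result


# Hypothetical sample mirroring the two users in the question.
sample = [
    ("38043958", ["init", "035", "064", "035", "060"]),
    ("38052591", ["init", "012", "060"]),
]

generated = connect_by_without_prior(sample)
per_user_level = Counter((user, level) for user, _item, level in generated)

print(len(generated))       # total rows, duplicates included
print(len(set(generated)))  # rows left after DISTINCT
print(per_user_level)
```

With two users the model yields 1, 2, 4, 8 and 8 copies for levels 1 to 5 of the first user, which matches the duplicate counts in the question's output; after de-duplication one row per user and position remains, which is what the DISTINCT in the query above relies on.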

I only noticed after finishing that you wanted a solution for MS SQL; this one is for Oracle SQL, but I hope it is of some help...

answered 2021-10-25T11:14:11.393

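This is a good use case for a UDF (user-defined function). We want to create multiple rows from a single row, so we need an EMITS UDF. Each call only has to transform a single input row, so a SCALAR UDF is sufficient. I chose Lua because it has the least overhead, but you could also write the logic in Python, Java, or R. For more information on the supported languages, see the Exasol documentation on UDF scripts.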

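In the example below I split the lists on commas as well as on single spaces; if you really only want to split on commas, use '([^,]+)' as the pattern instead. The example assumes that both lists have the same number of elements.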

create schema s;

create or replace table T as values ('init, rule#062,rule#066', '20210417124104,20210417132843,20210419132843' , 'user1'),
('init, rule#062' , '20210417124104,20210417132843', 'user2'),
('init' , '20210417124104', 'user3') as T(a, b, c) ;


--/
CREATE OR REPLACE LUA SCALAR SCRIPT split_lists(first_list varchar(2000000), second_list VARCHAR(2000000)) 
EMITS(first_list varchar(2000000), second_list varchar(2000000)) AS 
function run(ctx)
 -- Collect the items of each list; the pattern splits on commas and spaces
 local first_split = {}
 local second_split = {}

 for word in string.gmatch(ctx.first_list, '([^, ]+)') do
    first_split[#first_split + 1] = word
 end
 for word in string.gmatch(ctx.second_list, '([^, ]+)') do
    second_split[#second_split + 1] = word
 end

 -- Emit one output row per position; assumes both lists have the same length
 for i = 1,#first_split do
   ctx.emit(first_split[i], second_split[i])
 end
end
/

select split_lists(a,b),c from t;

-- Result:
/*
FIRST_LIST SECOND_LIST    C     
---------- -------------- ----- 
init       20210417124104 user1 
rule#062   20210417132843 user1 
rule#066   20210419132843 user1 
init       20210417124104 user2 
rule#062   20210417132843 user2 
init       20210417124104 user3 
*/
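
Since the logic could also be written in Python, here is a rough sketch of the same pairing step as a plain Python function. This is ordinary Python, not an Exasol UDF script, so the `ctx` wiring is left out; the function name `split_lists` is borrowed from the Lua script above, and the `None` padding for a shorter second list is my assumption about how to mirror Lua's `nil`.

```python
import re


def split_lists(first_list, second_list):
    """Pair the items of two delimited lists by position.

    Splits on commas and spaces, like the Lua pattern '([^, ]+)'.
    Returns a list of (first_item, second_item) tuples.
    """
    first_split = re.findall(r"[^, ]+", first_list)
    second_split = re.findall(r"[^, ]+", second_list)

    rows = []
    for i, item in enumerate(first_split):
        # Assumption: if the second list is shorter, pad with None.
        partner = second_split[i] if i < len(second_split) else None
        rows.append((item, partner))
    return rows


# Sample rows from the question: (a, b, c)
table = [
    ("init, rule#062,rule#066", "20210417124104,20210417132843,20210419132843", "user1"),
    ("init, rule#062", "20210417124104,20210417132843", "user2"),
    ("init", "20210417124104", "user3"),
]

for a, b, c in table:
    for rule, timestamp in split_lists(a, b):
        print(rule, timestamp, c)
```

Each input row is handled on its own, so the row count is simply the number of items per user and no de-duplication is needed.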
answered 2021-10-27T09:34:19.797