select distinct
 promo_name 
  ,case 
    when substring(promo_name,instr(promo_name, "P0"),2)  = "P0"   then 0
    when substring(promo_name,instr(promo_name, "P1"),2)  = "P1"   then 1
    When substring(promo_name,instr(promo_name, "P01"),3) = "P01"  then 1
    when substring(promo_name,instr(promo_name, "P2"),2)  = "P2"   then 2 
    When substring(promo_name,instr(promo_name, "P02"),3) = "P02"  then 2  
    when substring(promo_name,instr(promo_name, "P3"),2)  = "P3"   then 3 
    when substring(promo_name,instr(promo_name, "P03"),3) = "P03"  then 3
    when substring(promo_name,instr(promo_name, "P4"),2)  = "P4"   then 4  
    when substring(promo_name,instr(promo_name, "P04"),3) = "P04"  then 4 
    when substring(promo_name,instr(promo_name, "P5"),2)  = "P5"   then 5
    when substring(promo_name,instr(promo_name, "P05"),3) = "P05"  then 5
    when substring(promo_name,instr(promo_name, "P6"),2)  = "P6"   then 6
    when substring(promo_name,instr(promo_name, "P06"),3) = "P06"  then 6
    when substring(promo_name,instr(promo_name, "P7"),2)  = "P7"   then 7
    when substring(promo_name,instr(promo_name, "P07"),3) = "P07"  then 7 
    when trim(substring(promo_name,instr(promo_name, "P8"),2))  ="P8"  then 8 
    when trim(substring(promo_name,instr(promo_name, "P08"),3)) ="P08" then 8
    when trim(substring(promo_name,instr(promo_name, "P9"),2))  ="P9"  then 9
    when trim(substring(promo_name,instr(promo_name, "P09"),3)) ="P09" then 9
    when trim(substring(promo_name,instr(promo_name, "P10"),3)) ="P10" then 10 
    when trim(substring(promo_name,instr(promo_name, "P11"),3)) ="P11" then 11
    when trim(substring(promo_name,instr(promo_name, "P12"),3)) ="P12" then 12 

    else 0 end as promo_id ,
  case
    when trim(substring(promo_name,instr(promo_name, "P10"),3)) = "P10" then 10
    when trim(substring(promo_name,instr(promo_name, "P11"),3)) = "P11" then 11
    when trim(substring(promo_name,instr(promo_name, "P12"),3)) = "P12" then 12
    when trim(substring(promo_name,instr(promo_name, "P13"),3)) = "P13" then 13
    when trim(substring(promo_name,instr(promo_name, "P14"),3)) = "P14" then 14
    else 0 end as id
from hbi_dns_protected.store_zones_stock_v7_1_4
where promo_name is not null

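I am trying to extract an ID from a string. When I use a separate column, it works fine for P10 through P14, but when I do it in the same column, it picks 1 instead of 11, 1 instead of 12, and so on.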

Am I making a mistake here? Sample data: [image]


2 Answers


Why not use regexp_extract to pull the number out of your string with a regular expression, instead of writing a branch for every case? For example:

%sql
SELECT *,
  regexp_extract( promo_name, ' P(\\d+)', 1 ) AS promoNumber
FROM tmp

My results: [image]

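Note: the regular expression is case-sensitive. If you need to capture both lowercase and uppercase Ps, you can use a character class instead, i.e. [pP].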

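Full explanation of the RegEx pattern used:

  1. The RegEx starts with a space character and an uppercase P, so it matches a space followed by an uppercase P. Matching is case-sensitive. To accept either case, use a character class such as [pP], which matches any one of the characters inside the brackets.
  2. The next component of the RegEx is (\\d+). It consists of \d, the RegEx pattern that matches a digit, and the + sign, which means "match one or more". The parentheses make it a group, namely group 1. The \d carries an extra backslash because the Spark SQL implementation of regexp_extract requires the backslash to be escaped.
  3. The last argument to regexp_extract is 1, which means "return group 1 from the function".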
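
The three points above can be tried out with a small self-contained query. This is only a sketch: the rows in tmp are hypothetical sample values (the original sample data is not shown), the column names promo_digits and promo_id are made up for illustration, and it assumes Spark SQL's default string-literal escaping, where \\d in the SQL text reaches the regex engine as \d.

```sql
-- Hypothetical sample rows; real promo_name values may look different.
WITH tmp AS (
  SELECT * FROM VALUES
    ('Summer Sale P1'),
    ('Winter Deal P11'),
    ('Clearance p07 extra'),
    ('No promo code')
  AS t(promo_name)
)
SELECT
  promo_name,
  -- ' [pP]' accepts a space followed by either case of P;
  -- group 1 is the run of digits that follows it.
  regexp_extract(promo_name, ' [pP](\\d+)', 1) AS promo_digits,
  -- regexp_extract returns '' when nothing matches. nullif turns that
  -- into NULL before the cast, and coalesce falls back to 0, which
  -- mirrors the "else 0" branch of the original case expression.
  coalesce(
    cast(nullif(regexp_extract(promo_name, ' [pP](\\d+)', 1), '') AS INT),
    0
  ) AS promo_id
FROM tmp
```

Because \d+ consumes every consecutive digit, 'P11' yields 11 rather than 1, and a zero-padded code such as 'p07' casts to 7, so no ordering of branches is needed.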

I use regex101.com to test and practise RegEx expressions.

Answered 2020-07-21T23:03:06.800

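A case expression stops at the first branch that matches, so a name containing "P11" is caught by the earlier "P1" branch and returns 1.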

I would suggest reordering the branches and using like:

(case when promo_name like 'P14%' then 14
      when promo_name like 'P13%' then 13
      . . .
 end)
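
The first-match behaviour can be reproduced side by side with a short query. This is a sketch under assumptions: the three rows are hypothetical, the column names short_first and long_first are invented for illustration, and the test instr(...) > 0 is used as a simplified stand-in for the question's substring(..., instr(...), n) = ... comparison. It uses instr rather than like 'P14%' so that the code may appear anywhere in the name, as in the question.

```sql
-- Hypothetical sample rows for illustration only.
WITH tmp AS (
  SELECT * FROM VALUES
    ('Promo P1'),
    ('Promo P11'),
    ('Promo P12')
  AS t(promo_name)
)
SELECT
  promo_name,
  -- Shorter code first: 'P1' also occurs inside 'P11' and 'P12',
  -- so this branch fires for all three rows and the rest never run.
  CASE
    WHEN instr(promo_name, 'P1')  > 0 THEN 1
    WHEN instr(promo_name, 'P11') > 0 THEN 11
    WHEN instr(promo_name, 'P12') > 0 THEN 12
    ELSE 0
  END AS short_first,
  -- Longer codes first: each two-digit code is tested before its prefix.
  CASE
    WHEN instr(promo_name, 'P12') > 0 THEN 12
    WHEN instr(promo_name, 'P11') > 0 THEN 11
    WHEN instr(promo_name, 'P1')  > 0 THEN 1
    ELSE 0
  END AS long_first
FROM tmp
```

With these rows, short_first comes out as 1 for every name, while long_first gives 1, 11 and 12 respectively.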

Perhaps you should ask a new question with sample data and the desired results. There may be a simpler way.

Answered 2020-07-21T15:35:08.163