
Below is a sample of the dataset. Each row has one of the following values:

value
1 AG3608-sueyfbnd-sjwfk
2 TS2649-sjwjmdaqo-wkdmfl
3 乌节索普
4 sjhwu78iwjm

and so on....

I want to extract the values that start with AG or TS and capture the whole value. Below is the desired result:

value
1 AG3608-sueyfbnd-sjwfk
2 TS2649-sjwjmdaqo-wkdmfl

I wrote something like this, but it only captures the first 2 letters, AG or TS:

regexp_extract(${column},'^(AG|TS).*')

It does not capture everything after them.


2 Answers


Use a capturing group together with a non-capturing group:

regexp_extract(${column},'^((?:AG|TS).*)')

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      AG                       'AG'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      TS                       'TS'
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
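
To see why the extra outer group matters, here is a minimal sketch that runs the patterns side by side. It assumes BigQuery Standard SQL, where REGEXP_EXTRACT returns the substring matched by the capturing group if the pattern has one, and the whole match otherwise. The inline `data` table and the output column names are made up for illustration.

```sql
-- Hypothetical inline sample data mirroring the question.
WITH data AS (
  SELECT 'AG3608-sueyfbnd-sjwfk' AS value UNION ALL
  SELECT 'TS2649-sjwjmdaqo-wkdmfl' UNION ALL
  SELECT 'sjhwu78iwjm'
)
SELECT
  value,
  -- The only capturing group is (AG|TS), so only the prefix comes back.
  REGEXP_EXTRACT(value, r'^(AG|TS).*')     AS prefix_only,
  -- The outer group wraps the whole match; the inner group does not capture.
  REGEXP_EXTRACT(value, r'^((?:AG|TS).*)') AS full_value,
  -- No capturing group at all: the whole match is returned.
  REGEXP_EXTRACT(value, r'^(?:AG|TS).*')   AS full_value_no_group
FROM data;
```

Rows that do not start with AG or TS yield NULL in all three columns, so a WHERE clause on the extracted column (IS NOT NULL) is one way to drop them.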
answered 2021-06-11T20:08:38.930

Consider the following:

select *, 
  regexp_extract(value, r'^(?:AG|TS)(.*)') as everything_after
from data
where regexp_contains(value,'^(AG|TS)')    

If applied to the sample data in your question, the output is:

[image: query output]
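
The query in this answer reads from a table named data that is not defined in the post. A self-contained sketch, assuming BigQuery Standard SQL and a hypothetical inline `data` CTE built from some of the question's sample rows, might look like this:

```sql
-- Hypothetical inline sample data; replace with the real table.
WITH data AS (
  SELECT 'AG3608-sueyfbnd-sjwfk' AS value UNION ALL
  SELECT 'TS2649-sjwjmdaqo-wkdmfl' UNION ALL
  SELECT 'sjhwu78iwjm'
)
SELECT
  *,
  -- The prefix sits in a non-capturing group, so only the rest is returned.
  REGEXP_EXTRACT(value, r'^(?:AG|TS)(.*)') AS everything_after
FROM data
-- Keep only rows that start with AG or TS.
WHERE REGEXP_CONTAINS(value, r'^(?:AG|TS)')
ORDER BY value;
-- Rows returned (value | everything_after):
--   AG3608-sueyfbnd-sjwfk   | 3608-sueyfbnd-sjwfk
--   TS2649-sjwjmdaqo-wkdmfl | 2649-sjwjmdaqo-wkdmfl
```

Note that everything_after excludes the AG or TS prefix. If the full value is wanted, as in the question's desired result, the `value` column already holds it once the WHERE clause has filtered the rows.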

answered 2021-06-11T22:50:09.630