regex - REGEXP_EXTRACT: string starts with AG or TS, then capture everything

Question

Below is a sample of the dataset. Each row has the following values:

Row	Value
1	AG3608-sueyfbnd-sjwfk
2	TS2649-sjwjmdaqo-wkdmfl
3	乌节索普
4	sjhwu78iwjm

And so on....

I want to extract the values that start with AG or TS, and capture everything that follows. Below is the desired result:

Row	Value
1	AG3608-sueyfbnd-sjwfk
2	TS2649-sjwjmdaqo-wkdmfl

I wrote something like this, but it only captures the first 2 letters, AG or TS: regexp_extract(${column},'^(AG|TS).*'). It does not capture everything after them.

score 1 · Accepted Answer

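Use a capturing group together with a non-capturing group, so that the capturing group wraps the whole match rather than only the prefix: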

regexp_extract(${column},'^((?:AG|TS).*)')

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      AG                       'AG'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      TS                       'TS'
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
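
To see why moving the capturing group matters, here is a minimal sketch in Python. It assumes that REGEXP_EXTRACT returns the first capturing group when the pattern has one, which is consistent with the behavior described in the question. The helper `regexp_extract` below is a hypothetical stand-in written for this illustration, not the SQL function itself, and Python's `re` module is used in place of the database's regex engine.

```python
import re

def regexp_extract(value, pattern):
    """Hypothetical stand-in for SQL REGEXP_EXTRACT (an assumption):
    return the first capturing group if the pattern has one,
    the whole match otherwise, and None when nothing matches."""
    m = re.search(pattern, value)
    if m is None:
        return None
    return m.group(1) if m.re.groups else m.group(0)

# Sample values taken from the question.
rows = ["AG3608-sueyfbnd-sjwfk", "TS2649-sjwjmdaqo-wkdmfl", "sjhwu78iwjm"]

# Pattern from the question: only the prefix sits inside the capturing group.
original = [regexp_extract(v, r'^(AG|TS).*') for v in rows]

# Pattern from the answer: the capturing group wraps the prefix and the rest.
fixed = [regexp_extract(v, r'^((?:AG|TS).*)') for v in rows]

print(original)  # ['AG', 'TS', None]
print(fixed)     # ['AG3608-sueyfbnd-sjwfk', 'TS2649-sjwjmdaqo-wkdmfl', None]
```

In both patterns the `.*` matches the rest of the string. The difference is only whether that part falls inside the group that gets returned.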

score 1 · Accepted Answer

Consider the following:

select *, 
  regexp_extract(value, r'^(?:AG|TS)(.*)') as everything_after
from data
where regexp_contains(value,'^(AG|TS)')

If applied to the sample data in your question, the output is
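
As a rough cross-check, the query's logic can be emulated in Python. This is a sketch under assumptions: Python's `re` stands in for the SQL regex functions, the `data` list holds rows 1, 2 and 4 from the question (row 3 is left out), and the variable names are chosen for this illustration only.

```python
import re

# Rows 1, 2 and 4 of the sample data from the question.
data = [
    (1, "AG3608-sueyfbnd-sjwfk"),
    (2, "TS2649-sjwjmdaqo-wkdmfl"),
    (4, "sjhwu78iwjm"),
]

# Plays the role of regexp_contains(value, '^(AG|TS)') in the WHERE clause.
starts_with_prefix = re.compile(r'^(AG|TS)')

# Plays the role of regexp_extract(value, r'^(?:AG|TS)(.*)'):
# the prefix is matched but not captured, so only the remainder is returned.
after_prefix = re.compile(r'^(?:AG|TS)(.*)')

result = [
    (row, value, after_prefix.search(value).group(1))
    for row, value in data
    if starts_with_prefix.search(value)
]

for line in result:
    print(line)
# (1, 'AG3608-sueyfbnd-sjwfk', '3608-sueyfbnd-sjwfk')
# (2, 'TS2649-sjwjmdaqo-wkdmfl', '2649-sjwjmdaqo-wkdmfl')
```

Note that `everything_after` holds only the text following the AG or TS prefix. If the full value is wanted, as in the desired result shown in the question, the existing `value` column of the filtered rows already provides it.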
