regex - How to extract all hashtags from a string using regexp_substr

Question

I need a regex pattern that extracts all hashtags from tweets stored in a table. My data looks like this:

select regexp_substr('My twwet #HashTag1 and this is the #SecondHashtag    sample','#\S+')
from dual

It only returns #HashTag1, not #SecondHashtag.

I need output like #HashTag1 #SecondHashtag

Thanks

score 2 · Accepted Answer

You can use regexp_replace to remove everything that does not match your pattern.

with t (col) as (
  select 'My twwet #HashTag1 and this is the #SecondHashtag    sample, #onemorehashtag'
  from dual
)
select 
  regexp_replace(col, '(#\S+\s?)|.', '\1')
from t;

This produces:

#HashTag1 #SecondHashtag #onemorehashtag
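One thing to watch with this approach: the pattern keeps at most one whitespace character after each hashtag, so when the string ends with ordinary text the result ends with a stray space. A minimal sketch of trimming it, assuming a made-up sample string (not from the question):

```sql
-- Sketch only: the sample text is invented for illustration.
with t (col) as (
  select 'Intro #one middle #two trailing text'
  from dual
)
select
  -- '(#\S+\s?)|.' keeps each hashtag (plus one following whitespace
  -- character) and replaces every other character with nothing;
  -- rtrim then drops the space left after the last hashtag.
  rtrim(regexp_replace(col, '(#\S+\s?)|.', '\1')) as hashtags
from t;
```

Note that `.` does not match a newline by default in Oracle; if the tweets can contain line breaks, passing the match parameter 'n' as the sixth argument of regexp_replace makes `.` match them as well.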

regexp_substr returns only a single match. What you can do is turn your string into a table of matches using connect by:

with t (col) as (
  select 'My twwet #HashTag1 and this is the #SecondHashtag    sample, #onemorehashtag'
  from dual
)
select 
  regexp_substr(col, '#\S+', 1, level)
from t
connect by regexp_substr(col, '#\S+', 1, level) is not null;

Returns:

#HashTag1
#SecondHashtag
#onemorehashtag
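The query above works on a single input row. When the source has several rows, a bare connect by also links rows to each other and multiplies the output. A common way around this is to restrict the hierarchy to the same row. The sketch below assumes a hypothetical table named tweets with a unique id column (both names are invented here and inlined as a CTE):

```sql
-- Sketch only: "tweets" and "id" are hypothetical names.
with tweets (id, col) as (
  select 1, 'My twwet #HashTag1 and this is the #SecondHashtag    sample' from dual
  union all
  select 2, 'No tags at the start, then #onemorehashtag' from dual
)
select
  id,
  regexp_substr(col, '#\S+', 1, level) as hashtag
from tweets
connect by regexp_substr(col, '#\S+', 1, level) is not null
       and prior id = id                  -- stay within the same source row
       and prior sys_guid() is not null   -- keeps Oracle from reporting a connect by loop
order by id, level;
```

Each hashtag then comes back once, next to the id of the row it came from.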

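Edit:

\S matches any non-whitespace character, so it also picks up trailing punctuation. It is better to use \w, which matches a-z, A-Z, 0-9 and _.

As @mathguy commented, and as this site explains: a hashtag starts with a letter, followed by any number of alphanumeric characters or underscores.

So the pattern #[[:alpha:]]\w* would be better.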
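To make the difference between the three patterns concrete, here is a small comparison on one input. The sample string is made up for illustration:

```sql
-- Sketch only: the sample text is invented for illustration.
with t (col) as (
  select 'See #HashTag1, then #2020 and #Second_Tag.'
  from dual
)
select
  regexp_substr(col, '#\S+', 1, 1)            as first_with_s,      -- '#HashTag1,' (comma included)
  regexp_substr(col, '#\w+', 1, 2)            as second_with_w,     -- '#2020' (starts with a digit)
  regexp_substr(col, '#[[:alpha:]]\w*', 1, 2) as second_with_alpha  -- '#Second_Tag'
from t;
```

The same stricter pattern combined with connect by: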

with t (col) as (
  select 'My twwet #HashTag1, this is the #SecondHashtag. #onemorehashtag'
  from dual
)
select 
  regexp_substr(col, '#[[:alpha:]]\w*', 1, level)
from t
connect by regexp_substr(col, '#[[:alpha:]]\w*', 1, level) is not null;

Produces:

#HashTag1
#SecondHashtag
#onemorehashtag
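If the hashtags are wanted back on one line, as in the question, the rows can be aggregated again. A sketch under the assumption that listagg and regexp_count are available (Oracle 11g Release 2 or later):

```sql
-- Sketch only: assumes Oracle 11gR2+ for listagg and regexp_count.
with t (col) as (
  select 'My twwet #HashTag1, this is the #SecondHashtag. #onemorehashtag'
  from dual
)
select
  -- one generated row per hashtag, joined back with single spaces
  listagg(regexp_substr(col, '#[[:alpha:]]\w*', 1, level), ' ')
    within group (order by level) as hashtags
from t
connect by level <= regexp_count(col, '#[[:alpha:]]\w*');
```

Bounding level with regexp_count is an alternative to the "is not null" condition; both stop after the last match.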
