sql - Return the matching substring using a Postgres regular expression search

Question

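I'm trying to extract some values from a varchar field in Postgres. The *product_name* field may contain something like "Big Bag 24-0-3 20 Gallons" or "Small Bag 0-14-40", where the product code has the form #-#-#. Each number in the product code can be 0 or any one- or two-digit number, but there will always be three numbers separated by two dashes.

I'm already returning the matching products correctly, but now I need to put each number into a separate field, so I'd really appreciate help with the substring return from someone with a bigger brain than mine!

This pattern match returns the correct products: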

select * from products where product_name LIKE '%_-_-_%'

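I tried using substring to return the product code, but it cuts off the third number for products where that number has two digits (i.e. "Big Bag 24-0-32 Foo" returns "24-0-3"):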

select trim(substring(name from '%#"__-_-_#"%' for '#')),* 
from products where name LIKE '%_-_-_%'

In fact, the whole code doesn't do me much good anyway; what I really need is each of the three numbers extracted into a separate substring.

score 7 · Accepted Answer

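One option is to use regexp_matches to extract the code:

regexp_matches(string text, pattern text [, flags text])
Return all captured substrings resulting from matching a POSIX regular expression against the string.

and then regexp_split_to_array:

regexp_split_to_array(string text, pattern text [, flags text ])
Split string using a POSIX regular expression as the delimiter.

to break the code apart into its numbers. For example: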

=> select regexp_split_to_array((regexp_matches('Big Bag 24-0-3 Twenty Gallons', E'(\\d+-\\d+-\\d+)'))[1], '-');
 regexp_split_to_array 
-----------------------
 {24,0,3}
(1 row)

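That {24,0,3} is a three-element array containing the three numbers you're interested in (as strings). There is also regexp_split_to_table, if a three-row table would be easier to work with than an array: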

=> select regexp_split_to_table((regexp_matches('Big Bag 24-0-3 Twenty Gallons', E'(\\d+-\\d+-\\d+)'))[1], '-');
 regexp_split_to_table 
-----------------------
 24
 0
 3
(3 rows)
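
If each number should end up in its own column, one possible variation is to give regexp_matches three capture groups and index into the array it returns. This is only a minimal sketch, assuming PostgreSQL 9.3 or later (for LATERAL) and the default standard_conforming_strings = on. The inline VALUES list, the aliases p, m, parts, n1, n2 and n3, and the \d{1,2} width limit are illustrative assumptions, not part of the answer above.

```sql
-- Sample rows standing in for the products table (hypothetical data).
SELECT p.id,
       m.parts[1]::int AS n1,
       m.parts[2]::int AS n2,
       m.parts[3]::int AS n3
FROM (VALUES
        (1, 'Big Bag 24-0-3 Twenty Gallons'),
        (2, 'Small Bag 0-14-40'),
        (3, 'Big Bag 24-0-32 Foo')
     ) AS p(id, product_name)
-- Without the 'g' flag, regexp_matches yields at most one text[] row per
-- input string, holding the three captured groups in order.
CROSS JOIN LATERAL regexp_matches(
        p.product_name,
        '(\d{1,2})-(\d{1,2})-(\d{1,2})'
     ) AS m(parts)
ORDER BY p.id;
```

For the three sample rows this should give (24, 0, 3), (0, 14, 40) and (24, 0, 32). Rows whose name contains no code drop out of the result, because regexp_matches returns no row for them; LEFT JOIN LATERAL ... ON true would keep them with NULLs instead.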

score 1 · Answer

This isn't as regex-y as what you were looking for, but maybe it will get you closer:

Select substring( arr[ 1 ] from '[0-9][0-9]*' ) as first,
    arr[ 2 ] as second,
    substring( arr[ 3 ] from '[0-9][0-9]*' ) as third
FROM
(
Select string_to_array( d1, '-' ) as arr
from
(
SELECT * FROM ( VALUES
( 1, 'Big Bag 24-0-3 Twenty Gallons' ),
( 2, 'Small Bag 0-14-40' ),
( 3, 'Big Bag 24-0-32 Foo' ),
( 4, 'Other Bag 4-4-24' )
) AS products( id, d1 )
) AS values_table
) AS get_array

There is probably a better way to do this in one pass, without all the nested AS aliases, but here is the breakdown:

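The VALUES table supplies the test data; d1 is the data to extract from.
The string is split on - in string_to_array() to get an array such as 'Big Bag 24', '0' and '3 Twenty Gallons' (the element types are inferred automatically).
The outer select just transforms the array values by picking the number out of the first and last array elements.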
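
To make the intermediate step concrete, the query below shows what string_to_array produces for one of the sample strings before the outer select picks out the digits. It is only an illustration, and it relies on the assumption that the product name contains no other hyphens, since any extra - would shift the array positions.

```sql
-- Split on '-' only; the surrounding words stay attached to the
-- first and last elements, which is why the outer select still
-- needs substring() to isolate the digits.
SELECT string_to_array('Big Bag 24-0-3 Twenty Gallons', '-') AS arr;
-- arr: {"Big Bag 24",0,"3 Twenty Gallons"}
```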

This kind of work could be wrapped in a function that returns each number for you, but it should also get NULL checks and the like.
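
As a rough sketch of what such a function might look like: the name product_code_part and its parameters are hypothetical, and it swaps the string_to_array approach for a POSIX-regex substring() followed by split_part(). It assumes the default standard_conforming_strings = on and a PostgreSQL version that allows named parameters in SQL functions (9.2 or later).

```sql
-- Hypothetical helper: returns the 1st, 2nd or 3rd number of the
-- #-#-# code found in product_name, or NULL when there is no code,
-- the input is NULL, or part is outside 1..3.
CREATE OR REPLACE FUNCTION product_code_part(product_name text, part int)
RETURNS int
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT CASE
        WHEN part BETWEEN 1 AND 3 THEN
            -- substring(... from pattern) returns the first regex match,
            -- or NULL if nothing matches; split_part then passes NULL through.
            split_part(
                substring(product_name from '\d{1,2}-\d{1,2}-\d{1,2}'),
                '-',
                part
            )::int
    END
$$;

-- Example calls:
SELECT product_code_part('Big Bag 24-0-32 Foo', 3);  -- 32
SELECT product_code_part('No code here', 1);         -- NULL
```

The CASE guard is there so that an out-of-range part yields NULL rather than an error from split_part or from casting an empty string to int.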
