sql - How to identify and redact all instances of a matching pattern in T-SQL

Question

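I need to run a function over some fields to identify and redact any number that is 5 digits or longer, so that every digit except the last 4 is replaced with *.

For example: "Some text with 12345 and 1234 and 12345678" would become "Some text with *2345 and 1234 and ****5678"

I am using PATINDEX to find the starting character of the pattern:

PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', TEST_TEXT)

I can call it repeatedly to get the starting character of every occurrence, but I am struggling with the actual redaction.

Does anyone have pointers on how to do this? I know I can use REPLACE to put the *s where they need to go; it is identifying what I should actually be replacing that I am struggling with.

I could do this in application code, but I need it to be T-SQL (it can be a function if needed).

Any tips are much appreciated!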

score 2 · Accepted Answer

You can do this with SQL Server's built-in functions. Everything used in this example is available in SQL Server 2008 and later.

DECLARE @String VARCHAR(500) = 'Example Input: 1234567890, 1234, 12345, 123456, 1234567, 123asd456'
DECLARE @StartPos INT = 1, @EndPos INT = 1;
DECLARE @Input VARCHAR(500) = ISNULL(@String, '') + ' '; --Sets the input and appends a trailing space as a sentinel to make the loop easier.
DECLARE @OutputString VARCHAR(500) = ''; --Initialize to an empty string so that += never concatenates onto NULL

WHILE (@StartPos <> 0)
BEGIN
    SET @StartPos = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
    IF @StartPos <> 0
    BEGIN
        SET @OutputString += SUBSTRING(@Input, 1, @StartPos - 1); --Move everything before the first occurrence of our pattern to the output
        SET @Input = SUBSTRING(@Input, @StartPos, 500); --Keep the rest of the string. The length must be at least the original string length so that nothing is cut off.

        SET @EndPos = (PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Input)); --First occurrence of 4 digits followed by a non-digit, i.e. the last 4 digits of this run.
        SET @Input = STUFF(@Input, 1, (@EndPos - 1), REPLICATE('*', (@EndPos - 1))); --@EndPos - 1 is the number of characters we want to mask.
    END
END
SET @OutputString += @Input; --Append the remainder

SET @OutputString = LEFT(@OutputString, LEN(@OutputString)) --LEN ignores trailing spaces, so this drops the sentinel space
SELECT @OutputString;

This outputs the following:

Example Input: ******7890, 1234, *2345, **3456, ***4567, 123asd456

The whole thing could also be turned into a function, since it only needs one input text.
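
As a rough illustration of that last point, the loop above could be wrapped in a scalar function along these lines. This is only a sketch: the name dbo.MaskLongNumbers and the 501-character working buffers are assumptions of mine, not part of the answer, and it keeps the answer's 500-character input limit.

```sql
-- Hypothetical wrapper around the loop above; the function name is illustrative only.
CREATE FUNCTION dbo.MaskLongNumbers (@String VARCHAR(500))
RETURNS VARCHAR(501)
AS
BEGIN
    DECLARE @StartPos INT = 1, @EndPos INT;
    -- One extra character of room for the trailing sentinel space.
    DECLARE @Input VARCHAR(501) = ISNULL(@String, '') + ' ';
    DECLARE @Output VARCHAR(501) = '';

    WHILE @StartPos <> 0
    BEGIN
        -- Start of the next run of 5 or more digits, or 0 if none is left.
        SET @StartPos = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
        IF @StartPos <> 0
        BEGIN
            -- Move the text before the run to the output, keep the rest as input.
            SET @Output += SUBSTRING(@Input, 1, @StartPos - 1);
            SET @Input = SUBSTRING(@Input, @StartPos, 501);
            -- Position of the last 4 digits of the run (4 digits, then a non-digit).
            SET @EndPos = PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Input);
            -- Mask everything in the run before those last 4 digits.
            SET @Input = STUFF(@Input, 1, @EndPos - 1, REPLICATE('*', @EndPos - 1));
        END
    END

    SET @Output += @Input;
    -- LEN ignores trailing spaces, so this removes the sentinel.
    RETURN LEFT(@Output, LEN(@Output));
END;
GO  -- batch separator for SSMS/sqlcmd; CREATE FUNCTION must be the first statement in its batch

SELECT dbo.MaskLongNumbers('Some text with 12345 and 1234 and 12345678');
```

With the sample string from the question this should return "Some text with *2345 and 1234 and ****5678". It could then be applied to a column, for example SELECT dbo.MaskLongNumbers(TEST_TEXT) against whichever table holds that field. Because LEN ignores trailing spaces, any trailing spaces in the original value are trimmed along with the sentinel.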

score 1 · Accepted Answer

A quick-and-dirty solution with a recursive CTE

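DECLARE
  @tags nvarchar(max) = N'Some text with 12345 and 1234 and 12345678';

WITH Process (s, i)
AS
(
    -- Anchor: the original string and the position of the first run of 5 digits
    SELECT @tags, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @tags)
    UNION ALL
    -- Each step masks the first digit of the first remaining run of 5 digits,
    -- so a run stops matching once only its last 4 digits are left
    SELECT value, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', value)
    FROM
    (SELECT SUBSTRING(s, 1, i - 1) + N'*' + SUBSTRING(s, i + 1, LEN(s)) value
     FROM Process
     WHERE i > 0) calc
)
-- The row with i = 0 is the fully masked string.
-- The default recursion limit is 100 steps (one per masked digit);
-- append OPTION (MAXRECURSION n) for longer inputs.
SELECT * FROM Process
WHERE i = 0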

I think a better solution would be to add a CLR function to MS SQL Server to handle regular expressions. sql-clr/RegEx

score 1 · Accepted Answer

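Here is an option using DelimitedSplit8K_LEAD, which can be found here: https://www.sqlservercentral.com/articles/reaping-the-benefits-of-the-window-functions-in-t-sql-2 It is an extension of Jeff Moden's splitter that is even faster than the original. The big advantage this splitter has over most others is that it returns the ordinal position of each element. One caveat: I am splitting on spaces, based on your sample data. If you have numbers crammed in the middle of other characters, this will ignore them. Depending on your exact requirements, that may be good or bad.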

declare @Something varchar(100) = 'Some text with 12345 and 1234 and 12345678';

with MyCTE as
(
    select x.ItemNumber 
        , Result = isnull(case when TRY_CONVERT(bigint, x.Item) is not null then isnull(replicate('*', len(convert(varchar(20), TRY_CONVERT(bigint, x.Item))) - 4), '') + right(convert(varchar(20), TRY_CONVERT(bigint, x.Item)), 4) end, x.Item)
    from dbo.DelimitedSplit8K_LEAD(@Something, ' ') x
)
select Output = stuff((select ' ' + Result 
                        from MyCTE 
                        order by ItemNumber
                        FOR XML PATH('')), 1, 1, '')

This produces: Some text with *2345 and 1234 and ****5678
