0

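I need to use T-SQL to parse out the strings enclosed in {} in the following paragraph and then display them.

Here is a test sentence with a {Term1}. Sometime, a {Term2} could be a word or phrase like {Phrase Term3}. {Term2} is repeated. Some Terms could be a plural form of a another Term like {Term2}s. Here is a real {Simple} Term.

Desired result: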

Term1
Term2
Phrase Term3
Term2
Term2
Simple

2 Answers

3

You can convert the string to XML by replacing every { with a start element and every } with an end element, and then query the XML for the tokens.

declare @S nvarchar(max)
set @S = N'Here is a test sentence with a {Term1}. Sometime, a {Term2} could be a word or phrase like {Phrase Term3}. {Term2} is repeated. Some Terms could be a plural form of a another Term like {Term2}s. Here is a real {Simple} Term.'

select T.N.value('text()[1]', 'nvarchar(max)') as Token
from (select cast(replace(replace(@S, N'{', N'<token>'), N'}', N'</token>') as xml)) as S(X)
  cross apply S.X.nodes('token') as T(N)

SQL Fiddle
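
One caveat with the XML approach: the cast to xml fails if the input contains characters that are special in XML, such as & or <. The following is a minimal sketch of one way around that, assuming it is acceptable to escape those characters before inserting the token elements. The sample string and the @Escaped variable are made up for illustration and are not part of the original answer.

```sql
DECLARE @S nvarchar(max) = N'Profit & loss for {Q1 <draft>} and {Q2}';

-- Escape the XML special characters first (& must be handled before < and >),
-- so that the later CAST to xml does not fail on them.
DECLARE @Escaped nvarchar(max) =
    REPLACE(REPLACE(REPLACE(@S, N'&', N'&amp;'), N'<', N'&lt;'), N'>', N'&gt;');

-- Same technique as above: braces become <token> elements, then query them.
SELECT T.N.value('text()[1]', 'nvarchar(max)') AS Token
FROM (SELECT CAST(REPLACE(REPLACE(@Escaped, N'{', N'<token>'), N'}', N'</token>') AS xml)) AS S(X)
  CROSS APPLY S.X.nodes('token') AS T(N);
```

The value() method returns the text with the entities decoded again, so the tokens should come back in their original form.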

answered 2013-10-12T08:48:23.790
3

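You can do this with a multi-statement table-valued function, but I really think this type of parsing is best left to more capable languages. This will handle tokens {up to 255 characters} and input strings up to roughly 8,000 characters, depending on the version of SQL Server (the limit is the number of rows in sys.all_columns, which the function uses as its source of numbers). If you need more, replace sys.all_columns with your own numbers table. Note that I haven't done anything to protect you from invalid token sequences...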

CREATE FUNCTION dbo.ParseTokens
(
    @string NVARCHAR(MAX),
    @token1 NVARCHAR(255),
    @token2 NVARCHAR(255)
)
RETURNS @t TABLE([Index] INT IDENTITY(1,1), Item NVARCHAR(255))
AS
BEGIN
    INSERT @t(Item) 
    SELECT SUBSTRING(x, 1, COALESCE(NULLIF(CHARINDEX(@token2, x)-1,-1),255)) 
    FROM 
    (
      SELECT Number, x = SUBSTRING(@string, Number, 
        CHARINDEX(@token1, @string + @token1, Number) - Number)
      FROM
      (
        SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
          FROM sys.all_columns
      ) AS n(Number) WHERE Number <= CONVERT(INT, LEN(@string))
        AND SUBSTRING(@token1 + @string, Number, LEN(@token1)) = @token1
    ) AS y
    ORDER BY Number OPTION (MAXDOP 1);

    DELETE @t WHERE [Index] = 1;

    RETURN;
END
GO
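
For inputs longer than sys.all_columns can cover, the answer suggests swapping in your own numbers table. A minimal sketch of what that could look like follows. The table name dbo.Numbers and the row count of 100,000 are assumptions chosen for illustration, not something the original answer specifies.

```sql
-- Hypothetical numbers table; size it to the longest input string you expect.
CREATE TABLE dbo.Numbers (Number INT NOT NULL PRIMARY KEY);

-- Cross join a catalog view with itself to get enough rows,
-- then keep the first 100,000 consecutive integers.
INSERT dbo.Numbers (Number)
SELECT TOP (100000) rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_columns AS s1
CROSS JOIN sys.all_columns AS s2
ORDER BY rn;
```

Inside dbo.ParseTokens, the derived table that selects ROW_NUMBER() from sys.all_columns could then be replaced with a plain reference to dbo.Numbers AS n.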

Sample usage - on a standalone string:

DECLARE @x NVARCHAR(MAX);

SET @x = N'foo{bar} and think {splunge}';

SELECT Item FROM dbo.ParseTokens(@x, '{', '}') ORDER BY [Index];

Results:

Item
-------
bar
splunge

Sample usage - against a table:

DECLARE @x TABLE(ID INT IDENTITY(1,1), n NVARCHAR(MAX));

INSERT @x SELECT N'Here is a test sentence with a {Term1}. Sometime, a {Term2}
  could be a word or phrase like {Phrase Term3}. {Term2} is repeated. Some Terms
  could be a plural form of a another Term like {Term2}s. Here is a real
  {Simple} Term.';

INSERT @x SELECT N'Hello {foo} there {bar} ...';

SELECT t.ID, p.Item
 FROM @x AS t
 CROSS APPLY dbo.ParseTokens(t.n, '{', '}') AS p;

Results:

ID     Item
----   ------------
1      Term1
1      Term2
1      Phrase Term3
1      Term2
1      Term2
1      Simple
2      foo
2      bar
answered 2013-10-12T00:02:35.537