sql - SQL Server 2008 R2: getting the common text between a group of text strings

Question

I sure hope someone can help me with this one. I have been searching for hours and have come up empty.

For this example, my table has two columns:

GRP_ID    Desc

GRP_ID is how I identify which products are of the same type, and Desc is the column in which I want to find all the common words.

So here is my table:

GRP_ID          Desc
-------------------------------   
2               Red Hat
2               Green Hat
2               Yellow Hat
3               Boots Large Brown
3               Boots Medium Red
3               Boots Medium Brown

What I want as the result of the query is the following:

GRP_ID           Desc
-----------------------    
2                Hat
3                Boots

So what I want is every word that appears in each string of the group, i.e. the words common to the whole group.

score 0 · Accepted Answer

I think you'd need to create a mapping table for GRP_ID and products - e.g. Hat and Boots.

CREATE TABLE GroupProductMapping (
    GRP_ID INT NOT NULL, -- assuming it's an INT
    ProductDesc VARCHAR(50) NOT NULL
)

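SELECT DISTINCT a.GRP_ID, -- DISTINCT gives one row per group rather than one per product
    b.ProductDesc AS [Desc] -- DESC is a reserved word, so the alias needs brackets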
FROM {Table_Name} a
INNER JOIN GroupProductMapping b ON a.GRP_ID = b.GRP_ID

Alternatively, if you don't have too many products, you could use CASE in your SELECT clause, e.g.:

SELECT 
    GRP_ID,
    CASE GRP_ID 
        WHEN 2 THEN 'Hat' 
        WHEN 3 THEN 'Boots'
    END AS [Desc]
FROM {Table_Name}

{Table_Name} is the name of your original table.
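
For the sample data in the question, the mapping table could be filled as sketched below. The two rows are simply the expected output from the question, entered by hand. This assumes the GroupProductMapping table defined above already exists.

```sql
-- Hand-maintained mapping: one row per group, taken from the question's expected output.
-- The multi-row VALUES syntax is available from SQL Server 2008 onwards.
INSERT INTO GroupProductMapping (GRP_ID, ProductDesc)
VALUES (2, 'Hat'),
       (3, 'Boots');
```

Note that this approach does not compute the common words. Someone has to keep the mapping up to date whenever a group is added or changed.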

score 0 · Accepted Answer

Ideally, you would normalize your data and store the words in a separate table.
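
As a rough sketch of what such a normalized layout might look like: the names Product, ProductWord, ProductID and Word are hypothetical and do not come from the question, and the column sizes are guesses.

```sql
-- Hypothetical normalized layout: one row per product, one row per word of a product.
CREATE TABLE Product (
    ProductID INT NOT NULL PRIMARY KEY,
    GRP_ID    INT NOT NULL
);

CREATE TABLE ProductWord (
    ProductID INT NOT NULL REFERENCES Product (ProductID),
    Word      NVARCHAR(50) NOT NULL,
    PRIMARY KEY (ProductID, Word) -- each word is stored at most once per product
);

-- A word is common to a group when it occurs in as many products
-- as the group contains.
SELECT p.GRP_ID, w.Word
FROM Product p
INNER JOIN ProductWord w ON w.ProductID = p.ProductID
GROUP BY p.GRP_ID, w.Word
HAVING COUNT(*) = (SELECT COUNT(*) FROM Product p2 WHERE p2.GRP_ID = p.GRP_ID);
```

Because the primary key prevents a word from being stored twice for the same product, the count per word cannot be inflated by repeated words in one description.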

For your immediate needs, however, you first need a UDF that splits [Desc] into words. I dug up this function:

-- this function splits the provided strings on a delimiter
-- similar to .Net string.Split.
-- I'm sure there are alternatives (such as calling string.Split through
-- a CLR function).  
CREATE FUNCTION [dbo].[Split]
(    
    @RowData NVARCHAR(MAX),
    @Delimeter NVARCHAR(MAX)
)
RETURNS @RtnValue TABLE 
(
    ID INT IDENTITY(1,1),
    Data NVARCHAR(MAX)
) 
AS
BEGIN 
    DECLARE @Iterator INT
    SET @Iterator = 1

    DECLARE @FoundIndex INT
    SET @FoundIndex = CHARINDEX(@Delimeter,@RowData)

    WHILE (@FoundIndex>0)
    BEGIN
        INSERT INTO @RtnValue (data)
        SELECT 
            Data = LTRIM(RTRIM(SUBSTRING(@RowData, 1, @FoundIndex - 1)))

        SET @RowData = SUBSTRING(@RowData,
                @FoundIndex + DATALENGTH(@Delimeter) / 2,
                LEN(@RowData))

        SET @Iterator = @Iterator + 1
        SET @FoundIndex = CHARINDEX(@Delimeter, @RowData)
    END

    INSERT INTO @RtnValue (Data)
    SELECT Data = LTRIM(RTRIM(@RowData))

    RETURN
END
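
A quick way to check the function is to call it on one of the sample descriptions. This assumes dbo.Split has been created exactly as above.

```sql
-- Split one description on single spaces; ID reflects the word position.
SELECT ID, Data
FROM dbo.Split(N'Boots Large Brown', N' ')
ORDER BY ID;
-- Returns three rows: (1, Boots), (2, Large), (3, Brown)
```

One thing to be aware of: two consecutive spaces in the input produce an empty string as one of the returned rows, so descriptions with irregular spacing may need cleaning first.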

Then you need to split the descriptions and do some grouping (which you would also do if the data were normalized):

-- get the count of each grp_id
with group_count as
(
    select grp_id, count(*) cnt from [Group]
    group by grp_id
),
-- get the count of each word in each grp_id
group_word_count as
(
    select count(*) cnt, grp_id, data from 
    (
        select * from [group] g
        cross apply dbo.Split(g.[Desc], ' ') 
    )
    t
    group by grp_id, data
)
-- return the words whose count within a group equals the number of rows in that group
select gwc.GRP_ID, gwc.Data [Desc] from group_word_count gwc 
inner join group_count gc on gwc.GRP_ID = gc.GRP_ID and gwc.cnt = gc.cnt

[Group] is your table.
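
To try the query against the data from the question, a table along the following lines could be used. The column types are assumptions, since the question does not state them.

```sql
-- Sample table matching the question; the column types are assumed.
CREATE TABLE [Group] (
    GRP_ID INT NOT NULL,
    [Desc] VARCHAR(50) NOT NULL
);

INSERT INTO [Group] (GRP_ID, [Desc])
VALUES (2, 'Red Hat'),
       (2, 'Green Hat'),
       (2, 'Yellow Hat'),
       (3, 'Boots Large Brown'),
       (3, 'Boots Medium Red'),
       (3, 'Boots Medium Brown');
```

With these rows, the query above should return Hat for group 2 and Boots for group 3. If a group shares several words, each one comes back as its own row. The comparison also assumes that no word is repeated within a single description, because a repeated word would be counted more than once for that row.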
