
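I have a bunch of product description data in an old legacy sales system, and we're trying to do some sales analysis on it by making a best guess at the model number contained in the free-text description field.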

So my sales lines look like this:

LineitemID | Description
----
1 | Sony Headphones for a Sony DHJ232
2 | Sony DHJ232 in blue
3 | SANYO KI8767 with carry case

I then have a separate table that contains all the potential product ranges:

ProductRange
----
Sony DHJ232
SANYO KI8767
Sony Headphones

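I want to write a query that returns all of my line items together with a best guess at which ProductRange each one belongs to. That part is simple enough with a plain JOIN and a LIKE. The complication arises with line item #1, which mentions two different product ranges, so it produces multiple matches, one of which is wrong.

Where multiple matches are found, I want to assume that the first match in the string is the correct one, i.e. Sony Headphones rather than Sony DHJ232.

Can anyone advise on the best way to do this?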
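As a minimal sketch of the rule described above, outside SQL: collect every range that appears in the description and keep the one whose first occurrence has the lowest index. The function name `best_guess_range` is hypothetical, and Python's `in` is case-sensitive, whereas SQL Server's LIKE usually is not under the default collation.

```python
def best_guess_range(description, ranges):
    """Return the range whose first occurrence in description is earliest, or None."""
    # (position, range) pairs for every range that appears in the description
    hits = [(description.find(r), r) for r in ranges if r in description]
    if not hits:
        return None
    # min() compares the position first; ties fall back to the range text
    return min(hits)[1]


RANGES = ["Sony DHJ232", "SANYO KI8767", "Sony Headphones"]
LINES = {
    1: "Sony Headphones for a Sony DHJ232",
    2: "Sony DHJ232 in blue",
    3: "SANYO KI8767 with carry case",
}

for line_id, desc in LINES.items():
    print(line_id, best_guess_range(desc, RANGES))
```

For line item 1 both ranges match, but "Sony Headphones" starts at index 0 and "Sony DHJ232" much later, so the headphones win.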


3 Answers


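Something like this. Order the matches by the position of the substring within the Description field (using CHARINDEX()) and pick the first (lowest) one. Here T is the line-item table and PR the product-range table.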

SELECT LineitemId, Description, ProductRange
FROM
(
    SELECT T.LineitemId, T.Description, PR.ProductRange AS ProductRange,
           -- rank each matching range by where it first appears in the description
           ROW_NUMBER() OVER (PARTITION BY T.LineitemId
                              ORDER BY CHARINDEX(PR.ProductRange, T.Description)
                             ) AS RowN
    FROM T
    JOIN PR ON T.Description LIKE '%' + PR.ProductRange + '%'
) AS T1
WHERE RowN = 1  -- keep only the earliest match for each line item
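To try the ranking query without a SQL Server instance, here is a rough SQLite port driven from Python's `sqlite3`. Assumptions: SQLite has no CHARINDEX, so `instr(haystack, needle)` stands in for `CHARINDEX(needle, haystack)` (note the swapped argument order), `||` replaces `+` for concatenation, and ROW_NUMBER() requires SQLite 3.25 or later.

```python
import sqlite3

# In-memory stand-ins for the answer's T (line items) and PR (product ranges)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (LineitemId INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE PR (ProductRange TEXT);
    INSERT INTO T VALUES
        (1, 'Sony Headphones for a Sony DHJ232'),
        (2, 'Sony DHJ232 in blue'),
        (3, 'SANYO KI8767 with carry case');
    INSERT INTO PR VALUES ('Sony DHJ232'), ('SANYO KI8767'), ('Sony Headphones');
""")

# Same shape as the T-SQL answer; instr() gives the 1-based match position
QUERY = """
SELECT LineitemId, Description, ProductRange
FROM (
    SELECT T.LineitemId, T.Description, PR.ProductRange,
           ROW_NUMBER() OVER (PARTITION BY T.LineitemId
                              ORDER BY instr(T.Description, PR.ProductRange)) AS RowN
    FROM T
    JOIN PR ON T.Description LIKE '%' || PR.ProductRange || '%'
) AS T1
WHERE RowN = 1
ORDER BY LineitemId
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```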
answered 2013-10-21T12:15:02.460
;WITH MATCH_START AS
(
    SELECT LI.POS, LI.LINEITEMID, PRODUCT.PRODUCTRANGE, LI.DESCRIPTION 
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY LINEITEMID) POS, LINEITEMID, DESCRIPTION FROM LINEITEM) LI 
        JOIN PRODUCT ON LI.DESCRIPTION LIKE PRODUCT.PRODUCTRANGE+'%'
),
MATCH_CONTAINS AS 
(
    SELECT LI.POS, LI.LINEITEMID, PRODUCT.PRODUCTRANGE, LI.DESCRIPTION 
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY LINEITEMID) POS, LINEITEMID, DESCRIPTION FROM LINEITEM) LI 
        JOIN PRODUCT ON LI.DESCRIPTION LIKE '%'+PRODUCT.PRODUCTRANGE+'%'
),
MIN_START_POS AS (
    SELECT MIN(POS) AS MIN_POS, PRODUCTRANGE FROM MATCH_START
    GROUP BY PRODUCTRANGE
),
MIN_CONTAIN_POS AS (
    SELECT MIN(POS) AS MIN_POS, PRODUCTRANGE FROM MATCH_CONTAINS
    GROUP BY PRODUCTRANGE
)

SELECT MS.PRODUCTRANGE,MS.DESCRIPTION, MS.LINEITEMID FROM MATCH_START MS
JOIN MIN_START_POS MSP ON MS.POS = MSP.MIN_POS AND MSP.PRODUCTRANGE = MS.PRODUCTRANGE

UNION 

SELECT MC.PRODUCTRANGE, MC.DESCRIPTION, MC.LINEITEMID FROM MATCH_CONTAINS MC
JOIN MIN_CONTAIN_POS MCP ON MC.POS = MCP.MIN_POS AND MCP.PRODUCTRANGE = MC.PRODUCTRANGE
AND MC.PRODUCTRANGE NOT IN (SELECT PRODUCTRANGE FROM MATCH_START)

-- First match product ranges that the description starts with, then those it merely contains.

For example, using this data:

SELECT * FROM LINEITEM

LineItemId  Description
----------- --------------------------------------
1           Sony Headphones for a Sony DHJ232
2           Sony DHJ232 in blue
3           SANYO KI8767 with carry case
4           SANYO KI8767 with carry case 2
5           Sony Headphones for a Sony DHJ232 B

SELECT * FROM PRODUCT

ProductRange
----------------------
SANYO KI8767
Sony DHJ232
Sony Headphones

the result is:

PRODUCTRANGE      DESCRIPTION                          LINEITEMID
---------------   -------------------------------------  -----------
SANYO KI8767      SANYO KI8767 with carry case            3
Sony DHJ232       Sony DHJ232 in blue                     2
Sony Headphones   Sony Headphones for a Sony DHJ232       1
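Line items 4 and 5 do not appear in that result. A rough Python model of what the CTEs compute makes the reason visible: for each product range, the query keeps the lowest-numbered line item whose description starts with it, and only for ranges with no such prefix hit does it fall back to the lowest-numbered line item that contains it. The function name `first_line_per_range` is hypothetical; this models the query rather than offering an alternative to it.

```python
LINES = [
    (1, "Sony Headphones for a Sony DHJ232"),
    (2, "Sony DHJ232 in blue"),
    (3, "SANYO KI8767 with carry case"),
    (4, "SANYO KI8767 with carry case 2"),
    (5, "Sony Headphones for a Sony DHJ232 B"),
]
RANGES = ["SANYO KI8767", "Sony DHJ232", "Sony Headphones"]


def first_line_per_range(lines, ranges):
    result = {}
    for r in ranges:
        # MATCH_START + MIN_START_POS: earliest line whose description starts with r
        starts = [line_id for line_id, desc in lines if desc.startswith(r)]
        if starts:
            result[r] = min(starts)
            continue
        # MATCH_CONTAINS + MIN_CONTAIN_POS, only for ranges absent from MATCH_START
        contains = [line_id for line_id, desc in lines if r in desc]
        if contains:
            result[r] = min(contains)
    return result


print(first_line_per_range(LINES, RANGES))
```

Because the grouping is per product range, each range is assigned to at most one line item, so the later duplicates (4 and 5) are dropped.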
answered 2013-10-21T13:02:34.233

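Personally, I'd want to be able to prioritise which "range" gets chosen rather than rely on its ordinal position, so I'd implement something like this: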

create table dbo.Sales (
    LineitemID int identity (1,1) not null primary key,
    [Description] varchar(50)
)
insert into dbo.Sales ([Description]) values ('Sony Headphones for a Sony DHJ232')
insert into dbo.Sales ([Description]) values ('Sony DHJ232 in blue')
insert into dbo.Sales ([Description]) values ('SANYO KI8767 with carry case')
insert into dbo.Sales ([Description]) values ('Sony Headphones for a Sony PS3')

create table dbo.ProductRange (
    ProductRangeId int identity (1,1) not null primary key,
    RangeName varchar(50),
    Significance int
)
insert into dbo.ProductRange (RangeName, Significance) values ('Sony DHJ232', 1)
insert into dbo.ProductRange (RangeName, Significance) values ('SANYO KI8767', 1)
insert into dbo.ProductRange (RangeName, Significance) values ('Sony Headphones', 2)
go
CREATE FUNCTION [dbo].GetRange
(
    @description varchar(50)
)
RETURNS INT
AS
BEGIN

    declare @ProductRangeId int

    select top 1 @ProductRangeId=pr.ProductRangeId
    from dbo.ProductRange pr
    where @description like '%'+pr.RangeName+'%'
    order by pr.Significance

    RETURN @ProductRangeId
END
go
select s.*, dbo.GetRange(s.Description) as RangeId
from dbo.Sales s

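This lets the [Significance] column in dbo.[ProductRange] decide which value is returned when more than one range is a "hit": because of the ORDER BY, the range with the lowest Significance wins.

The output would be: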

LineitemID  Description                                        RangeId
----------- -------------------------------------------------- -----------
1           Sony Headphones for a Sony DHJ232                  1
2           Sony DHJ232 in blue                                1
3           SANYO KI8767 with carry case                       2
4           Sony Headphones for a Sony PS3                     3

The RangeId can then easily be joined to dbo.[ProductRange] to get the range name.
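A sketch of the same Significance idea, including the join back to the range name, in SQLite via Python's `sqlite3`. Assumptions: SQLite has no CREATE FUNCTION, so dbo.GetRange is inlined as a correlated subquery with `ORDER BY ... LIMIT 1` in place of `TOP 1`, and the extra tie-break on ProductRangeId is added here only to make equal Significance values deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (LineitemID INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE ProductRange (
        ProductRangeId INTEGER PRIMARY KEY,
        RangeName TEXT,
        Significance INTEGER
    );
    INSERT INTO Sales (Description) VALUES
        ('Sony Headphones for a Sony DHJ232'),
        ('Sony DHJ232 in blue'),
        ('SANYO KI8767 with carry case'),
        ('Sony Headphones for a Sony PS3');
    INSERT INTO ProductRange (RangeName, Significance) VALUES
        ('Sony DHJ232', 1), ('SANYO KI8767', 1), ('Sony Headphones', 2);
""")

QUERY = """
SELECT r.LineitemID, r.Description, pr.RangeName
FROM (
    SELECT s.LineitemID, s.Description,
           -- inlined GetRange: lowest Significance among the ranges that match
           (SELECT p.ProductRangeId
              FROM ProductRange p
             WHERE s.Description LIKE '%' || p.RangeName || '%'
             ORDER BY p.Significance, p.ProductRangeId
             LIMIT 1) AS RangeId
    FROM Sales s
) AS r
LEFT JOIN ProductRange pr ON pr.ProductRangeId = r.RangeId
ORDER BY r.LineitemID
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps line items that match no range (their RangeName comes back as NULL/None) instead of silently dropping them.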

answered 2013-10-21T14:03:17.857