
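I have a table of customer ID and product ID combinations. I want to identify the highest-yielding product holding combinations, i.e. which product combination, counted together with its sub-combinations and single products, covers the most customers.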

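For example, if I have a combination of 3 products, I want to rank it by also counting the customers who hold any smaller combination of those products, or any one of them on its own. I am trying to do this so that I can identify the highest-yielding combinations.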

Please share your thoughts on this. Database: SQL Server.

CREATE TABLE [dbo].[spk_bkup_cust_prod](
    [cust_id] [varchar](100) NULL,
    [prod_id] [varchar](100) NULL
) ON [PRIMARY]

GO
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust1', N'prod1')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust1', N'prod2')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust2', N'prod1')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust3', N'prod2')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust3', N'prod3')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust4', N'prod1')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust5', N'prod1')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust5', N'prod2')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust5', N'prod3')
INSERT [dbo].[spk_bkup_cust_prod] ([cust_id], [prod_id]) VALUES (N'cust6', N'prod1')

prodset             prodrnk_max
prod1               3
prod1-prod2         4
prod1-prod2-prod3   6
prod2-prod3         1

In the example above, if I target prod1 and prod2, I reach 4 customers: one customer who holds prod1-prod2 and 3 customers who hold only prod1.

If I target prod1, prod2 and prod3, I reach all the customers.
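
The rule described above can be restated as: a customer counts toward a holding when every product that customer owns is part of that holding. A minimal sketch of that rule, per customer, is shown here. It is an illustration under that reading of the requirement, not a tuned solution, and the alias names (a, b, reachable_customers) are made up for the example.

```sql
-- Sketch only: for each customer a, count the customers b (including a itself)
-- who hold no product outside a's holding.
SELECT a.cust_id,
       COUNT(*) AS reachable_customers
FROM (SELECT DISTINCT cust_id FROM dbo.spk_bkup_cust_prod) AS a
CROSS JOIN (SELECT DISTINCT cust_id FROM dbo.spk_bkup_cust_prod) AS b
WHERE NOT EXISTS (
    -- products held by b ...
    SELECT prod_id FROM dbo.spk_bkup_cust_prod WHERE cust_id = b.cust_id
    EXCEPT
    -- ... minus products held by a; an empty result means b is contained in a
    SELECT prod_id FROM dbo.spk_bkup_cust_prod WHERE cust_id = a.cust_id
)
GROUP BY a.cust_id
ORDER BY a.cust_id;
```

On the sample data this gives 4 for cust1 (prod1-prod2), 6 for cust5 (prod1-prod2-prod3), 1 for cust3 (prod2-prod3) and 3 for each prod1-only customer, which lines up with the expected prodrnk_max values. The cross join compares every pair of customers, so it is likely to be slow on a large table.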

I tried a combination of string_split and XML PATH. It rolls up the single-product sub-holdings, but not the multi-product sub-holdings.


with custprod as (
    SELECT
        distinct cust_id , prodset = STUFF((
              SELECT '-' + cp_grp.prod_id
              FROM dbo.spk_bkup_cust_prod cp_grp
              WHERE cp.cust_id = cp_grp.cust_id
              FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
    FROM dbo.spk_bkup_cust_prod cp
), prodcnt as (
    select prodset, count(*) rcnt from custprod group by prodset
), proddis as (
    select 
        prodcnt.prodset
        , prodcnt.rcnt
        , value indivprod
        --, convert(varchar,value)
    from 
    prodcnt
    cross apply string_split(prodset,'-')
), prodrnk as (
    select proddis.*
    , case when proddis.indivprod != proddis.prodset and prodcnt.prodset is not null then  proddis.rcnt + prodcnt.rcnt  else proddis.rcnt end prodrnk 
    from proddis 
    left join prodcnt on prodcnt.prodset = proddis.indivprod
)
select
    prodset, max(prodrnk) prodrnk_max
from 
prodrnk
group by prodset  

I think this is the wrong approach, which is why I did not attach it initially.


2 Answers


If I understand correctly, it would be something like this.

with custprod as (
    SELECT
        distinct cust_id , prodset = STUFF((
              SELECT '-' + cp_grp.prod_id
              FROM dbo.spk_bkup_cust_prod cp_grp
              WHERE cp.cust_id = cp_grp.cust_id
              ORDER BY cp_grp.prod_id -- fixed product order so equal holdings give the same prodset
              FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
    FROM dbo.spk_bkup_cust_prod cp
), prodcnt as (
    select
        cust_id,
        prod_id,
        count(*) over(partition by cust_id) as cnt -- number of products each customer has
    from dbo.spk_bkup_cust_prod
), custcnt as (
    select distinct
        p1.cust_id,
        count(*) over(partition by p1.cust_id) as cnt -- number of customers whose products are all contained in this customer's products
    from prodcnt p1
    inner join prodcnt p2 on p1.prod_id = p2.prod_id
    group by p1.cust_id, p2.cust_id
    having max(p2.cnt) = count(*)
)

select
    prodset,
    max(c.cnt) as prodrnk_max
from custprod p 
inner join custcnt c on c.cust_id = p.cust_id
group by prodset
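
To see why the HAVING clause in custcnt works, it can help to look at the pairs it keeps before they are counted. The sketch below reuses the prodcnt CTE from the query above. The column aliases (holder, contained_cust, shared_products, contained_total) are made up for illustration, and it assumes the table has no duplicate (cust_id, prod_id) rows.

```sql
-- Sketch only: list each pair where contained_cust's whole holding sits inside holder's.
WITH prodcnt AS (
    SELECT cust_id,
           prod_id,
           COUNT(*) OVER (PARTITION BY cust_id) AS cnt -- products per customer
    FROM dbo.spk_bkup_cust_prod
)
SELECT p1.cust_id  AS holder,
       p2.cust_id  AS contained_cust,
       COUNT(*)    AS shared_products,  -- products the two customers have in common
       MAX(p2.cnt) AS contained_total   -- total products held by contained_cust
FROM prodcnt AS p1
INNER JOIN prodcnt AS p2 ON p1.prod_id = p2.prod_id
GROUP BY p1.cust_id, p2.cust_id
-- keep the pair only when every product of contained_cust is shared with holder
HAVING MAX(p2.cnt) = COUNT(*)
ORDER BY holder, contained_cust;
```

For holder cust1 this lists cust1, cust2, cust4 and cust6, which is where the count of 4 for prod1-prod2 comes from.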
Answered 2020-08-05T08:13:11.347

This seems to work for me, at least on a small sample set.


drop table if exists dbo.#CustProdTag
SELECT distinct top 100 percent
    cust_id,
    prodset =   STUFF(
                    (
                        SELECT '%' + cp_grp.prod_id
                        FROM dbo.spk_bkup_cust_prod cp_grp
                        WHERE cp.cust_id = cp_grp.cust_id
                        ORDER BY cp_grp.prod_id  -- fixed order; the LIKE match further down depends on it
                        FOR XML PATH(''), TYPE
                        ).value('.', 'NVARCHAR(MAX)'), 1, 1, ''
                )
INTO    dbo.#CustProdTag
FROM    dbo.spk_bkup_cust_prod cp
ORDER BY prodset

------------------------------------------------------------------------

drop table if exists dbo.#ProdCombo
select distinct prodset
into dbo.#ProdCombo
from #CustProdTag

------------------------------------------------------------------------

SELECT * FROM dbo.#ProdCombo
SELECT * FROM dbo.#CustProdTag  ORDER BY prodset

------------------------------------------------------------------------

SELECT
    pc.prodset,
    COUNT(*)        [RecCnt]
FROM
    dbo.#ProdCombo                  pc
    LEFT JOIN   dbo.#CustProdTag    cpt     ON  pc.prodset like '%' + cpt.prodset + '%'
GROUP BY
    pc.prodset
ORDER BY
    pc.prodset
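
One caveat with the LIKE match above: because '%' doubles as the wildcard, a holding of prod1 would also match an ID such as prod10. A possible variation, sketched here, wraps each ID in a delimiter before joining. This is an assumption-laden sketch rather than part of the original answer: it assumes SQL Server 2017 or later for STRING_AGG, and product IDs that contain none of the characters | % _ [. The CTE names tag and combo are made up for the example.

```sql
-- Sketch only: delimiter-wrapped tags so that '|prod1|' cannot match '|prod10|'.
WITH tag AS (
    SELECT cust_id,
           -- e.g. '|prod1|%|prod2|'; the '%' separators act as wildcards in the LIKE below
           STRING_AGG('|' + prod_id + '|', '%') WITHIN GROUP (ORDER BY prod_id) AS prodset
    FROM (SELECT DISTINCT cust_id, prod_id FROM dbo.spk_bkup_cust_prod) AS d
    GROUP BY cust_id
), combo AS (
    SELECT DISTINCT prodset FROM tag
)
SELECT pc.prodset,
       COUNT(*) AS RecCnt
FROM combo AS pc
INNER JOIN tag AS t ON pc.prodset LIKE '%' + t.prodset + '%'
GROUP BY pc.prodset
ORDER BY pc.prodset;
```

The counts on the sample data are the same as in the expected output (3, 4, 6 and 1); only the displayed prodset text differs because of the added delimiters.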


Answered 2020-08-05T09:55:25.617