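I have a database of product information, specifically product packaging weights broken down by material. Not every product has actual packaging weights, so there is a system that groups products together and uses the group to determine an average weight for them.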

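For example, a new product "Can of beans" might be placed in a group called "Cans". Other products in the "Cans" group will have packaging weights, so a calculation determines the average weight for that group (per material).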

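When presenting the weight data I want to use the actual weight where it is available and fall back to the group weight where it is not. The problem is that the relationship between a product and its actual weights/group weights is one-to-many, so if a product has both actual weights and group weights, multiple rows of duplicated data can be returned.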

In the live system there are around 10 million products and over 3 million weights, so I need a solution that performs well.

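My current approach is to just select all the rows and then take the AVG of the weights, but that seems like a rather "clunky" solution. Is there a better way?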

I have a (rather long) example using made-up data:

DECLARE @Product TABLE (
    ProductId INT,
    GroupId INT,
    ProductName VARCHAR(50),
    PRIMARY KEY (ProductId));
DECLARE @Group TABLE (
    GroupId INT,
    GroupName VARCHAR(50),
    PRIMARY KEY (GroupId));
DECLARE @Material TABLE (
    MaterialId INT,
    MaterialName VARCHAR(50),
    PRIMARY KEY (MaterialId));
DECLARE @ProductWeight TABLE (
    ProductId INT,
    MaterialId INT,
    [Weight] NUMERIC(19,2),
    PRIMARY KEY (ProductId, MaterialId));
DECLARE @GroupWeight TABLE (
    GroupId INT,
    MaterialId INT,
    [Weight] NUMERIC(19,2),
    PRIMARY KEY (GroupId, MaterialId));

--Materials, only three for this example
INSERT INTO @Material VALUES (1, 'Paper');
INSERT INTO @Material VALUES (2, 'Steel');
INSERT INTO @Material VALUES (3, 'Glass');

--Two groups, one for cans and one for bottles
INSERT INTO @Group VALUES (1, 'Cans');
INSERT INTO @Group VALUES (2, 'Bottles');

--Five products, two "cans" and three "bottles"
INSERT INTO @Product VALUES (1, 1, 'Can of soup');
INSERT INTO @Product VALUES (2, 1, 'Can of beans');
INSERT INTO @Product VALUES (3, 2, 'Bottle of beer');
INSERT INTO @Product VALUES (4, 2, 'Bottle of wine');
INSERT INTO @Product VALUES (5, 2, 'Bottle of sauce');

--Three products have actual weights
INSERT INTO @ProductWeight VALUES (1, 1, 5.2);
INSERT INTO @ProductWeight VALUES (1, 2, 23.1);
INSERT INTO @ProductWeight VALUES (3, 1, 4.6);
INSERT INTO @ProductWeight VALUES (3, 2, 2.4);
INSERT INTO @ProductWeight VALUES (3, 3, 185.9);
INSERT INTO @ProductWeight VALUES (4, 1, 5.1);
INSERT INTO @ProductWeight VALUES (4, 2, 2.6);
INSERT INTO @ProductWeight VALUES (4, 3, 650.4);

--Calculate the group weights
INSERT INTO @GroupWeight 
SELECT p.GroupId, pw.MaterialId, AVG(pw.[Weight]) 
FROM @ProductWeight pw INNER JOIN @Product p ON p.ProductId = pw.ProductId
GROUP BY p.GroupId, pw.MaterialId;

--Now display the product information, use the actual weights where available and the group weights otherwise
SELECT
    p.ProductName,
    m.MaterialName,
    CASE WHEN pw.[Weight] IS NOT NULL THEN 'Product' ELSE 'Group' END AS WeightSource,
    AVG(COALESCE(pw.[Weight], gw.[Weight])) AS [Weight]
FROM
    @Product p
    LEFT JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
    LEFT JOIN @GroupWeight gw ON gw.GroupId = p.GroupId
    LEFT JOIN @Material m ON m.MaterialId = COALESCE(pw.MaterialId, gw.MaterialId)
GROUP BY
    p.ProductName,
    m.MaterialName,
    CASE WHEN pw.[Weight] IS NOT NULL THEN 'Product' ELSE 'Group' END;

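When this runs it returns the data in the format I want, including the weight source, i.e. whether it is an actual weight or a group weight: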

ProductName     MaterialName    WeightSource    Weight
Bottle of beer  Glass           Product         185.900000
Bottle of beer  Paper           Product         4.600000
Bottle of beer  Steel           Product         2.400000
Bottle of sauce Glass           Group           418.150000
Bottle of sauce Paper           Group           4.850000
Bottle of sauce Steel           Group           2.500000
Bottle of wine  Glass           Product         650.400000
Bottle of wine  Paper           Product         5.100000
Bottle of wine  Steel           Product         2.600000
Can of beans    Paper           Group           5.200000
Can of beans    Steel           Group           23.100000
Can of soup     Paper           Product         5.200000
Can of soup     Steel           Product         23.100000

But I can't help feeling there must be a more efficient way to do this?
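
To illustrate the duplication that the AVG is hiding, here is a minimal sketch that counts the joined rows per product and material. It uses inline VALUES rows for the "Cans" group only (a cut-down copy of the sample data, with CTE names standing in for the table variables), so it can be run on its own:

```sql
-- Cut-down copy of the sample data ("Cans" group only), inlined as CTEs
WITH Product AS (
    SELECT * FROM (VALUES
        (1, 1, 'Can of soup'),
        (2, 1, 'Can of beans')
    ) AS v (ProductId, GroupId, ProductName)
),
ProductWeight AS (
    SELECT * FROM (VALUES
        (1, 1, 5.2),
        (1, 2, 23.1)
    ) AS v (ProductId, MaterialId, [Weight])
),
GroupWeight AS (
    SELECT * FROM (VALUES
        (1, 1, 5.2),
        (1, 2, 23.1)
    ) AS v (GroupId, MaterialId, [Weight])
)
-- Same joins as the query above, but counting rows instead of averaging
SELECT
    p.ProductName,
    COALESCE(pw.MaterialId, gw.MaterialId) AS MaterialId,
    COUNT(*) AS JoinedRows
FROM
    Product p
    LEFT JOIN ProductWeight pw ON pw.ProductId = p.ProductId
    LEFT JOIN GroupWeight gw ON gw.GroupId = p.GroupId
GROUP BY
    p.ProductName,
    COALESCE(pw.MaterialId, gw.MaterialId);
```

With this data "Can of soup" produces two joined rows per material (each product weight is paired with every group weight row), while "Can of beans" produces one. The AVG collapses those repeats, which is why the result looks right even though more rows are read than needed.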

Edit - I tried using UNION ALL; maybe I'm missing something, because this is the best I could come up with?

WITH RawData AS (
SELECT
    p.ProductName,
    m.MaterialName,
    'Product' AS WeightSource,
    pw.[Weight]
FROM
    @Product p
    INNER JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
    INNER JOIN @Material m ON m.MaterialId = pw.MaterialId
UNION ALL
SELECT
    p.ProductName,
    m.MaterialName,
    'Group' AS WeightSource,
    gw.[Weight]
FROM
    @Product p
    INNER JOIN @GroupWeight gw ON gw.GroupId = p.GroupId
    INNER JOIN @Material m ON m.MaterialId = gw.MaterialId),
RankedWeightSource AS (
SELECT
    ProductName,
    WeightSource,
    ROW_NUMBER() OVER (PARTITION BY ProductName ORDER BY WeightSource DESC) AS RowRank
FROM
    RawData
GROUP BY 
    ProductName,
    WeightSource),
BestWeightSource AS (
SELECT
    ProductName,
    WeightSource
FROM
    RankedWeightSource
WHERE
    RowRank = 1)
SELECT 
    * 
FROM 
    RawData rd
    INNER JOIN BestWeightSource bws ON bws.ProductName = rd.ProductName AND bws.WeightSource = rd.WeightSource;

1 Answer

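What I have done before in similar situations is to write a raw query that brings in all the possible values, together with a precedence for each value, and then use ROW_NUMBER in an outer query to pick the value with the highest precedence.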

I'll use your (excellent) sample data, everything up to and including the insert into @GroupWeight.

Here is our raw data:

-- the product weights (use INNER JOIN to only find 
--   the products with their own weights)
SELECT
    p.ProductId,
    p.ProductName,
    m.MaterialId,
    m.MaterialName,
    pw.Weight,
    'Product' WeightSource,
    20 Precedence
FROM
    @Product p
    INNER JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
    INNER JOIN @Material m ON m.MaterialId = pw.MaterialId
UNION ALL
-- the group weight
SELECT
    p.ProductId,
    p.ProductName,
    m.MaterialId,
    m.MaterialName,
    gw.Weight,
    'Group' WeightSource,
    10 Precedence
FROM
    @Product p
    INNER JOIN @GroupWeight gw on gw.GroupId = p.GroupId
    INNER JOIN @Material m ON m.MaterialId = gw.MaterialId

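This returns one row for each product material that has a specific weight, plus one row for each product material that has a group weight. Each row says whether it is a product weight or a group weight.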

We can then number the rows, ordered by precedence:

-- assume the above is in a CTE named AllWeights
SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY ProductId, MaterialId 
                       ORDER BY Precedence DESC) rn
FROM 
    AllWeights

This gives us the same data, plus an indication of which row is the relevant one for a given product material, so finally we can get:

-- assume the above is in a CTE named RowNumbered
SELECT
    ProductName,
    MaterialName,
    WeightSource,
    Weight
FROM
    RowNumbered
WHERE
    rn = 1
;

And we're done.
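
To make the numbering step concrete, here is a small self-contained sketch. The inline VALUES rows are a hand-picked subset of what the AllWeights query returns for the sample data (Paper rows for the two cans), not the full result:

```sql
-- Inline rows standing in for the AllWeights CTE (illustrative subset)
WITH AllWeights AS (
    SELECT * FROM (VALUES
        (1, 'Can of soup',  1, 'Paper', 5.2, 'Product', 20),
        (1, 'Can of soup',  1, 'Paper', 5.2, 'Group',   10),
        (2, 'Can of beans', 1, 'Paper', 5.2, 'Group',   10)
    ) AS v (ProductId, ProductName, MaterialId, MaterialName,
            [Weight], WeightSource, Precedence)
)
SELECT
    ProductName,
    MaterialName,
    WeightSource,
    Precedence,
    -- Highest precedence gets rn = 1 within each product material
    ROW_NUMBER() OVER (PARTITION BY ProductId, MaterialId
                       ORDER BY Precedence DESC) AS rn
FROM
    AllWeights;
```

For "Can of soup" the 'Product' row gets rn = 1 and the 'Group' row gets rn = 2; for "Can of beans" the only row is the 'Group' row, so it gets rn = 1. Filtering on rn = 1 therefore keeps the product weight when there is one and the group weight otherwise.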


Putting it all together:

;WITH AllWeights AS (
-- the product weights (use INNER JOIN to only find 
--   the products with their own weights)
SELECT
    p.ProductId,
    p.ProductName,
    m.MaterialId,
    m.MaterialName,
    pw.Weight,
    'Product' WeightSource,
    20 Precedence
FROM
    @Product p
    INNER JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
    INNER JOIN @Material m ON m.MaterialId = pw.MaterialId
UNION ALL
-- the group weight
SELECT
    p.ProductId,
    p.ProductName,
    m.MaterialId,
    m.MaterialName,
    gw.Weight,
    'Group' WeightSource,
    10 Precedence
FROM
    @Product p
    INNER JOIN @GroupWeight gw on gw.GroupId = p.GroupId
    INNER JOIN @Material m ON m.MaterialId = gw.MaterialId
),
RowNumbered AS (
SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY ProductId, MaterialId 
                       ORDER BY Precedence DESC) rn
FROM 
    AllWeights
)
SELECT
    ProductName,
    MaterialName,
    WeightSource,
    Weight
FROM
    RowNumbered
WHERE
    rn = 1
;

Output:

ProductName          MaterialName WeightSource Weight
-------------------- ------------ ------------ ------------
Can of soup          Paper        Product      5.20
Can of soup          Steel        Product      23.10
Can of beans         Paper        Group        5.20
Can of beans         Steel        Group        23.10
Bottle of beer       Paper        Product      4.60
Bottle of beer       Steel        Product      2.40
Bottle of beer       Glass        Product      185.90
Bottle of wine       Paper        Product      5.10
Bottle of wine       Steel        Product      2.60
Bottle of wine       Glass        Product      650.40
Bottle of sauce      Paper        Group        4.85
Bottle of sauce      Steel        Group        2.50
Bottle of sauce      Glass        Group        418.15

Which, I think, is the same as yours apart from the ordering.
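
One difference to be aware of: the query above chooses between product and group weight per product material, whereas the queries in the question choose per product. The two agree on the sample data, but they would differ if a product had actual weights for only some of its group's materials. If the per-product behaviour is what you want, one possible variation (a sketch, not something I have tested at scale) is to swap ROW_NUMBER for RANK partitioned by ProductId only. The rows below are hypothetical and were picked to show the difference:

```sql
-- Hypothetical rows: "Can of soup" has an actual weight for Paper only,
-- while its group has weights for both Paper and Steel
WITH AllWeights AS (
    SELECT * FROM (VALUES
        (1, 'Can of soup',  'Paper', 5.2,  'Product', 20),
        (1, 'Can of soup',  'Paper', 5.0,  'Group',   10),
        (1, 'Can of soup',  'Steel', 23.1, 'Group',   10),
        (2, 'Can of beans', 'Paper', 5.0,  'Group',   10),
        (2, 'Can of beans', 'Steel', 23.1, 'Group',   10)
    ) AS v (ProductId, ProductName, MaterialName,
            [Weight], WeightSource, Precedence)
),
Ranked AS (
    SELECT
        *,
        -- RANK gives every row of the best available source rk = 1
        RANK() OVER (PARTITION BY ProductId
                     ORDER BY Precedence DESC) AS rk
    FROM
        AllWeights
)
SELECT
    ProductName,
    MaterialName,
    WeightSource,
    [Weight]
FROM
    Ranked
WHERE
    rk = 1;
```

Here "Can of soup" returns only its Paper product weight (the group rows rank second), and "Can of beans" returns both group rows. With ROW_NUMBER partitioned by product and material, "Can of soup" would also pick up the Steel group weight.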

Of course, you will have to check the performance yourself.
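
One way to compare candidates is to switch on the I/O and timing statistics around each query and look at the Messages output. This is only a sketch: the SELECT in the middle is a placeholder, and the real test should run against the live tables rather than table variables, since table variables generally give the optimizer little cardinality information and the plans may not be representative.

```sql
-- Report logical reads and CPU/elapsed time for the statements that follow
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Placeholder: substitute the candidate query against the real tables
SELECT COUNT(*) AS ObjectCount FROM sys.objects;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```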

answered 2013-10-09T12:59:45.603