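I have a database containing product information, specifically product packaging weights broken down by material. Not every product has actual packaging weights, so there is a system that estimates an average weight for those products by grouping them with similar products.
For example, a new product "Can of beans" might be placed in a group called "Cans". Other products in the "Cans" group will have packaging weights, so a calculation is run to determine the average weight for the group (per material).
When presenting weight data, I want to use the actual weights where available and fall back to the group weights where they are not. The problem is that the relationship between a product and its actual weights/group weights is one-to-many, so if a product has both actual weights and group weights, multiple rows of duplicated data can be returned.
In the live system there are around 10 million products and over 3 million weights, so I need a solution that performs well.
My current approach is to simply select all the rows and then take the AVG of the weights, but this feels like a rather "clunky" solution. Is there a better way?
I have a (fairly long) example using made-up data: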
DECLARE @Product TABLE (
ProductId INT,
GroupId INT,
ProductName VARCHAR(50),
PRIMARY KEY (ProductId));
DECLARE @Group TABLE (
GroupId INT,
GroupName VARCHAR(50),
PRIMARY KEY (GroupId));
DECLARE @Material TABLE (
MaterialId INT,
MaterialName VARCHAR(50),
PRIMARY KEY (MaterialId));
DECLARE @ProductWeight TABLE (
ProductId INT,
MaterialId INT,
[Weight] NUMERIC(19,2),
PRIMARY KEY (ProductId, MaterialId));
DECLARE @GroupWeight TABLE (
GroupId INT,
MaterialId INT,
[Weight] NUMERIC(19,2),
PRIMARY KEY (GroupId, MaterialId));
--Materials, only three for this example
INSERT INTO @Material VALUES (1, 'Paper');
INSERT INTO @Material VALUES (2, 'Steel');
INSERT INTO @Material VALUES (3, 'Glass');
--Two groups, one for cans and one for bottles
INSERT INTO @Group VALUES (1, 'Cans');
INSERT INTO @Group VALUES (2, 'Bottles');
--Five products, two "cans" and three "bottles"
INSERT INTO @Product VALUES (1, 1, 'Can of soup');
INSERT INTO @Product VALUES (2, 1, 'Can of beans');
INSERT INTO @Product VALUES (3, 2, 'Bottle of beer');
INSERT INTO @Product VALUES (4, 2, 'Bottle of wine');
INSERT INTO @Product VALUES (5, 2, 'Bottle of sauce');
--Three products have actual weights
INSERT INTO @ProductWeight VALUES (1, 1, 5.2);
INSERT INTO @ProductWeight VALUES (1, 2, 23.1);
INSERT INTO @ProductWeight VALUES (3, 1, 4.6);
INSERT INTO @ProductWeight VALUES (3, 2, 2.4);
INSERT INTO @ProductWeight VALUES (3, 3, 185.9);
INSERT INTO @ProductWeight VALUES (4, 1, 5.1);
INSERT INTO @ProductWeight VALUES (4, 2, 2.6);
INSERT INTO @ProductWeight VALUES (4, 3, 650.4);
--Calculate the group weights
INSERT INTO @GroupWeight
SELECT p.GroupId, pw.MaterialId, AVG(pw.[Weight])
FROM @ProductWeight pw INNER JOIN @Product p ON p.ProductId = pw.ProductId
GROUP BY p.GroupId, pw.MaterialId;
--Now display the product information, use the actual weights where available and the group weights otherwise
SELECT
    p.ProductName,
    m.MaterialName,
    CASE WHEN pw.[Weight] IS NOT NULL THEN 'Product' ELSE 'Group' END AS WeightSource,
    AVG(COALESCE(pw.[Weight], gw.[Weight])) AS [Weight]
FROM
    @Product p
    LEFT JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
    LEFT JOIN @GroupWeight gw ON gw.GroupId = p.GroupId
    LEFT JOIN @Material m ON m.MaterialId = COALESCE(pw.MaterialId, gw.MaterialId)
GROUP BY
    p.ProductName,
    m.MaterialName,
    CASE WHEN pw.[Weight] IS NOT NULL THEN 'Product' ELSE 'Group' END;
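When this runs, it returns the data in the format I want, including the weight source, i.e. whether it is an actual weight or a group weight: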
ProductName      MaterialName  WeightSource  Weight
Bottle of beer   Glass         Product       185.900000
Bottle of beer   Paper         Product       4.600000
Bottle of beer   Steel         Product       2.400000
Bottle of sauce  Glass         Group         418.150000
Bottle of sauce  Paper         Group         4.850000
Bottle of sauce  Steel         Group         2.500000
Bottle of wine   Glass         Product       650.400000
Bottle of wine   Paper         Product       5.100000
Bottle of wine   Steel         Product       2.600000
Can of beans     Paper         Group         5.200000
Can of beans     Steel         Group         23.100000
Can of soup      Paper         Product       5.200000
Can of soup      Steel         Product       23.100000
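But I can't help feeling there must be a more efficient way to do this?
Edit - I tried using UNION ALL; maybe I'm missing something, because this is the best I could come up with: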
WITH RawData AS (
    SELECT
        p.ProductName,
        m.MaterialName,
        'Product' AS WeightSource,
        pw.[Weight]
    FROM
        @Product p
        INNER JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
        INNER JOIN @Material m ON m.MaterialId = pw.MaterialId
    UNION ALL
    SELECT
        p.ProductName,
        m.MaterialName,
        'Group' AS WeightSource,
        gw.[Weight]
    FROM
        @Product p
        INNER JOIN @GroupWeight gw ON gw.GroupId = p.GroupId
        INNER JOIN @Material m ON m.MaterialId = gw.MaterialId),
RankedWeightSource AS (
    SELECT
        ProductName,
        WeightSource,
        ROW_NUMBER() OVER (PARTITION BY ProductName ORDER BY WeightSource DESC) AS RowRank
    FROM
        RawData
    GROUP BY
        ProductName,
        WeightSource),
BestWeightSource AS (
    SELECT
        ProductName,
        WeightSource
    FROM
        RankedWeightSource
    WHERE
        RowRank = 1)
SELECT
    *
FROM
    RawData rd
    INNER JOIN BestWeightSource bws ON bws.ProductName = rd.ProductName AND bws.WeightSource = rd.WeightSource;
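For comparison, the same "actual weights if any, otherwise group weights" rule could perhaps be expressed by filtering the group branch of the UNION ALL with NOT EXISTS, which would avoid both the AVG/GROUP BY and the ranking CTEs. The following is only a sketch under stated assumptions: it treats the fallback as all-or-nothing per product (a product with any actual weight never picks up group weights, which is how both queries above behave on the sample data), and it uses a condensed copy of the sample data with multi-row VALUES (SQL Server 2008 or later) so that it runs on its own.

```sql
-- Condensed copy of the sample data (the @Group table is not needed for this query).
DECLARE @Product TABLE (ProductId INT PRIMARY KEY, GroupId INT, ProductName VARCHAR(50));
DECLARE @Material TABLE (MaterialId INT PRIMARY KEY, MaterialName VARCHAR(50));
DECLARE @ProductWeight TABLE (ProductId INT, MaterialId INT, [Weight] NUMERIC(19,2),
    PRIMARY KEY (ProductId, MaterialId));
DECLARE @GroupWeight TABLE (GroupId INT, MaterialId INT, [Weight] NUMERIC(19,2),
    PRIMARY KEY (GroupId, MaterialId));

INSERT INTO @Material VALUES (1, 'Paper'), (2, 'Steel'), (3, 'Glass');
INSERT INTO @Product VALUES (1, 1, 'Can of soup'), (2, 1, 'Can of beans'),
    (3, 2, 'Bottle of beer'), (4, 2, 'Bottle of wine'), (5, 2, 'Bottle of sauce');
INSERT INTO @ProductWeight VALUES (1, 1, 5.2), (1, 2, 23.1),
    (3, 1, 4.6), (3, 2, 2.4), (3, 3, 185.9),
    (4, 1, 5.1), (4, 2, 2.6), (4, 3, 650.4);

-- Group weights: average actual weight per group and material, as in the example above.
INSERT INTO @GroupWeight
SELECT p.GroupId, pw.MaterialId, AVG(pw.[Weight])
FROM @ProductWeight pw INNER JOIN @Product p ON p.ProductId = pw.ProductId
GROUP BY p.GroupId, pw.MaterialId;

-- Branch 1: every actual product weight.
SELECT
    p.ProductName,
    m.MaterialName,
    'Product' AS WeightSource,
    pw.[Weight]
FROM
    @Product p
    INNER JOIN @ProductWeight pw ON pw.ProductId = p.ProductId
    INNER JOIN @Material m ON m.MaterialId = pw.MaterialId
UNION ALL
-- Branch 2: group weights, but only for products that have no actual weights at all.
SELECT
    p.ProductName,
    m.MaterialName,
    'Group' AS WeightSource,
    gw.[Weight]
FROM
    @Product p
    INNER JOIN @GroupWeight gw ON gw.GroupId = p.GroupId
    INNER JOIN @Material m ON m.MaterialId = gw.MaterialId
WHERE
    NOT EXISTS (SELECT 1 FROM @ProductWeight pw2 WHERE pw2.ProductId = p.ProductId)
ORDER BY
    ProductName,
    MaterialName;
```

On the sample data this should return the same 13 product/material rows as the queries above, with the weights shown at the NUMERIC(19,2) scale because no AVG is applied. Whether it actually performs better at 10 million products is not something the sketch can show; that would need checking against the real execution plans and indexes.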