I need help working out the best way to find duplicates across multiple tables.
DECLARE @Table1 TABLE (ID_T1 int, Col1 varchar(10), C2 varchar(10), C3 varchar(10), C4 varchar(10), Col5 varchar(10), Col6 varchar(10))
DECLARE @Table2 TABLE (ID_T2 int, Col1 varchar(10), C2 varchar(10), C3 varchar(10), C4 varchar(10), Col5 varchar(10), Col6 varchar(10))
INSERT INTO @Table1 (ID_T1, Col1, C2, C3, C4, Col5, Col6)
SELECT 1, 'One', 'Test1', 'Line1', 'Record1', 'OTLR1', 'RLTO1'
UNION ALL
SELECT 2, 'Two', 'Test2', 'Line2', 'Record2', 'OTLR2', 'RLTO2'
UNION ALL
SELECT 3, 'Three', 'Test3', 'Line3', 'Record3', 'OTLR3', 'RLTO3'
UNION ALL
SELECT 4, 'Four', 'Test4', 'Line4', 'Record4', 'OTLR4', 'RLTO4'
UNION ALL
SELECT 5, 'Five', 'Test5', 'Line5', 'Record5', 'OTLR5', 'RLTO5'
UNION ALL
SELECT 6, 'Six', 'Test6', 'Line6', 'Record6', 'OTLR6', 'RLTO6'
UNION ALL
SELECT 7, 'Seven', 'Test6', 'Line6', 'Record6', 'OTLR7', 'RLTO7'
UNION ALL
SELECT 8, 'Eight', 'Test8', 'Line8', 'Record8', 'OTLR8', 'RLTO8'
INSERT INTO @Table2 (ID_T2, Col1, C2, C3, C4, Col5, Col6)
SELECT 10, 'Ten', 'Test1', 'Line1', 'Record1', 'OTLR10', 'RLTO10'
UNION ALL
SELECT 20, 'Twenty', 'Test2', 'Line2', 'Record2', 'OTLR20', 'RLTO20'
UNION ALL
SELECT 30, 'Thirty', 'Test3', 'Line3', 'Record3', 'OTLR30', 'RLTO30'
UNION ALL
SELECT 40, 'Forty', 'Test4', 'Line4', 'Record4', 'OTLR40', 'RLTO40'
UNION ALL
SELECT 50, 'Fifty', 'Test5', 'Line5', 'Record5', 'OTLR50', 'RLTO50'
UNION ALL
SELECT 80, 'Eighty', 'Test80', 'Line80', 'Record80', 'OTLR80', 'RLTO80'
UNION ALL
SELECT 90, 'Ninety', 'Test90', 'Line90', 'Record90', 'OTLR90', 'RLTO90'
SELECT * FROM @Table1
SELECT * FROM @Table2
Columns C2, C3, and C4 can hold either unique or duplicate values in both Table 1 and Table 2.
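I am trying to get three outputs. Output 1 contains records from Table 1 only; rows that share the same C2, C3, and C4 values with another row in Table 1 itself are flagged 1/0 in Duplicate_SameTable.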
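Output 2 contains records from Table 1 only; rows whose C2, C3, and C4 values also appear in Table 2 are flagged 1/0 in Duplicate_PrimaryTable.
Output 3 contains records from both Table 1 and Table 2; rows that share the same C2, C3, and C4 values are flagged 1/0 in Duplicate_BothTables.
I can get Output 1 with the following query.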
SELECT *, CASE
WHEN COUNT(*) OVER (PARTITION BY C2, C3, C4) > 1 THEN 1
ELSE 0
END AS Duplicate_SameTable
FROM @Table1
ORDER BY ID_T1 ASC
My attempt at Output 2:
SELECT B.ID, B.Col1, B.C2, B.C3, B.C4, B.Col5, B.Col6,
       CASE WHEN C.Duplicate_SameTable = 1 THEN 0 ELSE B.Duplicate_BothTables END AS Duplicate_PrimaryTable
FROM (
    -- Flag rows whose (C2, C3, C4) occurs more than once across both tables combined
    SELECT ID, Col1, C2, C3, C4, Col5, Col6,
           CASE WHEN COUNT(*) OVER (PARTITION BY C2, C3, C4) > 1 THEN 1 ELSE 0 END AS Duplicate_BothTables
    FROM (
        SELECT ID_T1 AS ID, Col1, C2, C3, C4, Col5, Col6 FROM @Table1
        UNION
        SELECT ID_T2 AS ID, Col1, C2, C3, C4, Col5, Col6 FROM @Table2
    ) A
) B
INNER JOIN (
    -- Flag rows whose (C2, C3, C4) occurs more than once within @Table1
    SELECT *,
           CASE WHEN COUNT(*) OVER (PARTITION BY C2, C3, C4) > 1 THEN 1 ELSE 0 END AS Duplicate_SameTable
    FROM @Table1
) C ON B.ID = C.ID_T1
My attempt at Output 3:
SELECT B.ID, B.Col1, B.C2, B.C3, B.C4, B.Col5, B.Col6,
       CASE WHEN C.Duplicate_SameTable = 1 THEN 0 ELSE B.Duplicate_BothTables END AS Duplicate_BothTables
FROM (
    -- Flag rows whose (C2, C3, C4) occurs more than once across both tables combined
    SELECT ID, Col1, C2, C3, C4, Col5, Col6,
           CASE WHEN COUNT(*) OVER (PARTITION BY C2, C3, C4) > 1 THEN 1 ELSE 0 END AS Duplicate_BothTables
    FROM (
        SELECT ID_T1 AS ID, Col1, C2, C3, C4, Col5, Col6 FROM @Table1
        UNION
        SELECT ID_T2 AS ID, Col1, C2, C3, C4, Col5, Col6 FROM @Table2
    ) A
) B
LEFT JOIN (
    -- Flag rows whose (C2, C3, C4) occurs more than once within @Table1
    SELECT *,
           CASE WHEN COUNT(*) OVER (PARTITION BY C2, C3, C4) > 1 THEN 1 ELSE 0 END AS Duplicate_SameTable
    FROM @Table1
) C ON B.ID = C.ID_T1
ORDER BY B.ID
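I would like to know how to get Output 2 and Output 3.
One approach I can think of is to UNION ALL Table 1 and Table 2 and then run the query above on the combined result. Is there a better way to do this? The real tables will have millions of records, and doing a UNION ALL and then applying the query above may take a long time.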
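For reference, here is a rough sketch of an alternative I have been considering; it is not a verified solution. It assumes that Output 2 should flag a Table 1 row only when its (C2, C3, C4) key is unique within Table 1 and also exists in Table 2 (which is how my attempt behaves), and that Output 3 should simply flag any key that occurs more than once across both tables combined. The SourceTable column and the CntSameTable name are hypothetical, added only for illustration, and the sample rows are a trimmed-down copy of the data above so the batch runs on its own.

```sql
-- Standalone sketch: trimmed-down copies of the sample tables (Col5/Col6 omitted).
DECLARE @Table1 TABLE (ID_T1 int, Col1 varchar(10), C2 varchar(10), C3 varchar(10), C4 varchar(10));
DECLARE @Table2 TABLE (ID_T2 int, Col1 varchar(10), C2 varchar(10), C3 varchar(10), C4 varchar(10));

INSERT INTO @Table1 (ID_T1, Col1, C2, C3, C4) VALUES
    (1, 'One',   'Test1', 'Line1', 'Record1'),  -- key also present in @Table2
    (6, 'Six',   'Test6', 'Line6', 'Record6'),  -- key repeated inside @Table1
    (7, 'Seven', 'Test6', 'Line6', 'Record6'),
    (8, 'Eight', 'Test8', 'Line8', 'Record8');  -- key unique everywhere

INSERT INTO @Table2 (ID_T2, Col1, C2, C3, C4) VALUES
    (10, 'Ten',    'Test1',  'Line1',  'Record1'),
    (80, 'Eighty', 'Test80', 'Line80', 'Record80');

-- Output 2 (assumed meaning): @Table1 rows only. The flag is 1 when the key is
-- unique within @Table1 but also exists in @Table2; EXISTS avoids the UNION.
WITH T1 AS (
    SELECT *, COUNT(*) OVER (PARTITION BY C2, C3, C4) AS CntSameTable  -- hypothetical helper column
    FROM @Table1
)
SELECT T1.ID_T1, T1.Col1, T1.C2, T1.C3, T1.C4,
       CASE
           WHEN T1.CntSameTable > 1 THEN 0
           WHEN EXISTS (SELECT 1
                        FROM @Table2 AS T2
                        WHERE T2.C2 = T1.C2 AND T2.C3 = T1.C3 AND T2.C4 = T1.C4) THEN 1
           ELSE 0
       END AS Duplicate_PrimaryTable
FROM T1
ORDER BY T1.ID_T1;

-- Output 3 (assumed meaning): rows from both tables. The flag is 1 when the key
-- occurs more than once across the combined rows. SourceTable is hypothetical.
SELECT U.SourceTable, U.ID, U.Col1, U.C2, U.C3, U.C4,
       CASE WHEN COUNT(*) OVER (PARTITION BY U.C2, U.C3, U.C4) > 1 THEN 1 ELSE 0 END AS Duplicate_BothTables
FROM (
    SELECT 'T1' AS SourceTable, ID_T1 AS ID, Col1, C2, C3, C4 FROM @Table1
    UNION ALL
    SELECT 'T2', ID_T2, Col1, C2, C3, C4 FROM @Table2
) AS U
ORDER BY U.SourceTable, U.ID;
```

With this trimmed sample, only ID 1 should come back with Duplicate_PrimaryTable = 1 in the first query. Two caveats I am aware of: the `=` comparisons in the EXISTS never match NULL keys, whereas PARTITION BY groups NULLs together, so the two queries can disagree on rows with NULL in C2, C3, or C4; and on the real tables I would expect an index on (C2, C3, C4) to matter a lot for the EXISTS lookup, though I have not measured it.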
Thanks
Edit: Updated the post with my attempt. It looks too messy, and I am not sure this is the best way to go about it.