sql - SQL Server cannot finish a duplicate-finding query even after 1 hour; it just keeps loading

Question

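I have the query below to find duplicates in my table, which has more than 1,000,000 rows and 10 fields. When I run the query, it keeps loading for more than an hour and never finishes. When I try the same query against a similar table with only 100 records, it works fine.

(All column data types are nchar.)

I would like to know how to make it work for more than 1,000,000 rows.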

select * from table1 as L
where (select count(*) from table1
where L.date + L.time + L.color + L.supplier = table1.date +
table1.time + table1.color + table1.supplier and table1.variety = 'dark' 
and date between '01062012' and '30062012') > 1

score 1 · Accepted Answer

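Do not use L.date + L.time + L.color + L.supplier = table1.date + table1.time + table1.color + table1.supplier. Comparing concatenated columns like this destroys any ability to use an index in the join. Compare the columns one by one instead: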

select * from table1 as L
where (select count(*)
         from table1
        where table1.date       = L.date
          and table1.time       = L.time
          and table1.color      = L.color
          and table1.supplier   = L.supplier
          and table1.variety    = 'dark'
          and table1.date between '01062012' and '30062012'
     )
     > 1

Also, make sure your table has an index that covers all of the join fields (variety, color, supplier, date).
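As a rough sketch of such an index: the index name below is hypothetical, and the column order is an assumption (the equality filter on variety first, then the columns being compared). The time column is included on the assumption that it is part of the duplicate key, as in the question's query.

```sql
-- Sketch only: IX_table1_dup_check is a made-up name.
-- Column order is an assumption; adjust it to the real workload.
CREATE NONCLUSTERED INDEX IX_table1_dup_check
    ON table1 (variety, [date], [time], color, supplier);
```

If the nchar columns are wide, the combined key may run into SQL Server's index key size limit, in which case a narrower key would be needed.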

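There are other options for finding duplicates, such as using ROW_NUMBER(), but we would need to know more about your table structure (unique id field, etc.) and about what does (and does not) constitute a duplicate.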
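A minimal sketch of the window-function approach follows. It uses COUNT(*) OVER rather than ROW_NUMBER(), because it flags every row in a duplicate group instead of only the second and later ones. It assumes that a duplicate means the same date, time, color and supplier among rows with variety = 'dark', and that only those rows need to be returned; neither assumption is confirmed by the question. The names counted and dup_count are illustrative.

```sql
-- Sketch only: the duplicate key (date, time, color, supplier) is an assumption.
-- Note: on an nchar column this BETWEEN is a string comparison,
-- copied from the question as-is.
WITH counted AS (
    SELECT t.*,
           COUNT(*) OVER (
               PARTITION BY t.[date], t.[time], t.color, t.supplier
           ) AS dup_count          -- rows sharing the same key
    FROM table1 AS t
    WHERE t.variety = 'dark'
      AND t.[date] BETWEEN '01062012' AND '30062012'
)
SELECT *
FROM counted
WHERE dup_count > 1;               -- keep only keys that occur more than once
```

This reads the table once instead of running a correlated count for every outer row, which is usually where the time goes on a table of this size.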

score 0 · Accepted Answer

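I am not sure about your table structure, but it looks as if your table is getting locked under heavy load. Try adding hints, as shown below.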

Also try removing the * and selecting only the columns you actually need.

select * from table1 as L WITH(NOLOCK)
where (select count(1) from table1 WITH(NOLOCK)
where L.date + L.time + L.color + L.supplier = table1.date +
table1.time + table1.color + table1.supplier and table1.variety = 'dark' 
and date between '01062012' and '30062012') > 1

Hope this solves your problem.

Cheers

score 0 · Accepted Answer

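I am new here and my suggestion may not be appropriate, but in your position I would also consider SSIS. I would use a Script Component in SSIS and perform the operation there.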
