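A query that loops through 17 million records to remove duplicates has now been running for about 16 hours. If I stop the query right now, will the delete statements it has already executed be finalized, or will they be rolled back? In other words, has it been deleting rows while it runs?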
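I have found that when I run
select count(*) from myTable
the row count it returns (while the query is still running) is about 5 lower than the starting row count. The server resources are obviously extremely poor, so does this mean the process has taken 16 hours to find 5 duplicates (when there are actually thousands), and that it could keep running for days?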
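This query took 6 seconds on 2,000 rows of test data and worked well on that set, so, scaling linearly (17,000,000 / 2,000 × 6 s ≈ 14 hours), I figured the complete data set would take about 15 hours.
Any ideas?
Here is the query: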
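--Declare the looping variable
DECLARE @LoopVar char(10)

DECLARE
    --Set private variables that will be used throughout
    --NOTE: DECIMAL with no precision/scale defaults to DECIMAL(18,0) in SQL Server,
    --so fractional coordinates lose their decimals unless these types match the columns
    @long DECIMAL,
    @lat DECIMAL,
    @phoneNumber char(10),
    @businessname varchar(64),
    @winner char(10)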

SET @LoopVar = (SELECT MIN(RecordID) FROM MyTable)

WHILE @LoopVar is not null
BEGIN
    --initialize the private variables (essentially this is a .ctor)
    SELECT
        @long = null,
        @lat = null,
        @businessname = null,
        @phoneNumber = null,
        @winner = null

    -- load data from the row declared when setting @LoopVar
    SELECT
        @long = longitude,
        @lat = latitude,
        @businessname = BusinessName,
        @phoneNumber = Phone
    FROM MyTable
    WHERE RecordID = @LoopVar

    --find the winning row with that data. The winning row means
    SELECT top 1 @Winner = RecordID
    FROM MyTable
    WHERE @long = longitude
        AND @lat = latitude
        AND @businessname = BusinessName
        AND @phoneNumber = Phone
    ORDER BY
        CASE WHEN webAddress is not null THEN 1 ELSE 2 END,
        CASE WHEN caption1 is not null THEN 1 ELSE 2 END,
        CASE WHEN caption2 is not null THEN 1 ELSE 2 END,
        RecordID

    --delete any losers.
    DELETE FROM MyTable
    WHERE @long = longitude
        AND @lat = latitude
        AND @businessname = BusinessName
        AND @phoneNumber = Phone
        AND @winner != RecordID

    -- prep the next loop value to go ahead and perform the next duplicate query.
    SET @LoopVar = (SELECT MIN(RecordID)
                    FROM MyTable
                    WHERE @LoopVar < RecordID)
END
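
For comparison, the winner rule in the loop could probably be expressed as a single set-based statement instead of one pass per row. This is only a rough sketch, not something I have run against the real table: it assumes SQL Server 2005 or later (for ROW_NUMBER and for deleting through a CTE) and reuses the MyTable column names from the loop. The names Ranked and rn are made up for the example. The IS NOT NULL filters are there because the loop's `=` comparisons never match NULLs, whereas PARTITION BY would group them together.

```sql
-- Sketch only: assumes SQL Server 2005+ and the MyTable columns used in the loop.
-- "Ranked" and "rn" are illustrative names, not part of the original script.
WITH Ranked AS (
    SELECT
        RecordID,
        ROW_NUMBER() OVER (
            -- rows sharing these four values are considered duplicates
            PARTITION BY longitude, latitude, BusinessName, Phone
            -- same preference order as the loop: web address, captions, then lowest RecordID
            ORDER BY
                CASE WHEN webAddress IS NOT NULL THEN 1 ELSE 2 END,
                CASE WHEN caption1 IS NOT NULL THEN 1 ELSE 2 END,
                CASE WHEN caption2 IS NOT NULL THEN 1 ELSE 2 END,
                RecordID
        ) AS rn
    FROM MyTable
    -- mirror the loop: rows with a NULL in any key column are never treated as duplicates
    WHERE longitude IS NOT NULL
      AND latitude IS NOT NULL
      AND BusinessName IS NOT NULL
      AND Phone IS NOT NULL
)
-- rn = 1 is the winner of each group; everything else is a loser
DELETE FROM Ranked
WHERE rn > 1;
```

Run as written, this is one statement, so it would either delete all the losers or none of them. On 17 million rows that could mean a large transaction log, which is a trade-off against the row-by-row loop.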