1

我有一个包含一些重复条目的表。我必须丢弃除一个之外的所有内容,然后更新这个最新的。我尝试过使用临时表和 while 语句,以这种方式:

CREATE TABLE #tmp_ImportedData_GenericData
(
    Id int identity(1,1),
    tmpCode varchar(255)  NULL,
    tmpAlpha3Code varchar(50)  NULL,
    tmpRelatedYear int NOT NULL,
    tmpPreviousValue varchar(255)  NULL,
    tmpGrowthRate varchar(255)  NULL
)

INSERT INTO #tmp_ImportedData_GenericData
SELECT
    MCS_ImportedData_GenericData.Code, 
MCS_ImportedData_GenericData.Alpha3Code,
MCS_ImportedData_GenericData.RelatedYear,
MCS_ImportedData_GenericData.PreviousValue,
MCS_ImportedData_GenericData.GrowthRate
FROM MCS_ImportedData_GenericData
INNER JOIN
(
    SELECT CODE, ALPHA3CODE, RELATEDYEAR, COUNT(*) AS NUMROWS
    FROM MCS_ImportedData_GenericData AS M
    GROUP BY M.CODE, M.ALPHA3CODE, M.RELATEDYEAR
    HAVING count(*) > 1
) AS M2 ON MCS_ImportedData_GenericData.CODE = M2.CODE
    AND MCS_ImportedData_GenericData.ALPHA3CODE = M2.ALPHA3CODE
    AND MCS_ImportedData_GenericData.RELATEDYEAR = M2.RELATEDYEAR
WHERE
(MCS_ImportedData_GenericData.PreviousValue <> 'INDEFINITO')

 -- SELECT * from #tmp_ImportedData_GenericData
 -- DROP TABLE #tmp_ImportedData_GenericData

DECLARE @counter int
DECLARE @rowsCount int

SET @counter = 1

SELECT @rowsCount =  count(*) from #tmp_ImportedData_GenericData
-- PRINT @rowsCount

WHILE @counter  < @rowsCount
BEGIN
    SELECT 
        @Code = tmpCode, 
        @Alpha3Code = tmpAlpha3Code, 
        @RelatedYear = tmpRelatedYear, 
        @OldValue = tmpPreviousValue, 
        @GrowthRate = tmpGrowthRate 
    FROM 
        #tmp_ImportedData_GenericData
    WHERE 
        Id = @counter

    DELETE FROM MCS_ImportedData_GenericData 
    WHERE 
        Code = @Code 
        AND Alpha3Code = @Alpha3Code  
        AND RelatedYear = @RelatedYear  
        AND PreviousValue <> 'INDEFINITO' OR PreviousValue IS NULL  

    UPDATE 
        MCS_ImportedData_GenericData 
        SET 
          PreviousValue = @OldValue, GrowthRate = @GrowthRate 
    WHERE 
        Code = @Code 
        AND Alpha3Code = @Alpha3Code  
        AND RelatedYear = @RelatedYear  
        AND MCS_ImportedData_GenericData.PreviousValue ='INDEFINITO'

    SET @counter = @counter + 1
END

但它需要很长时间,即使只有 20000 - 30000 行需要处理。

有没有人有一些建议以提高性能?

提前致谢!

4

3 回答 3

3
WITH q AS (
        SELECT  m.*, ROW_NUMBER() OVER (PARTITION BY CODE, ALPHA3CODE, RELATEDYEAR ORDER BY CASE WHEN PreviousValue = 'INDEFINITO' THEN 1 ELSE 0 END)
        FROM    MCS_ImportedData_GenericData m
        WHERE   PreviousValue <> 'INDEFINITO'
        )
DELETE
FROM    q
WHERE   rn > 1
于 2009-04-09T16:19:32.627 回答
1

Quassnoi 的答案使用 SQL Server 2005+ 语法,所以我认为我应该投入我的价值,使用更通用的东西......

首先,要删除所有重复项,但不删除“原始”,您需要一种区分重复记录的方法。(Quassnoi 答案的 ROW_NUMBER() 部分)

在您的情况下,源数据似乎没有标识列(您在临时表中创建一个)。如果是这样的话,我想到了两个选择:
1. 将标识列添加到数据中,然后删除重复项
2. 创建一个“重复数据删除”数据集,从原始数据中删除所有内容,然后将去重数据插入到原始数据中

选项 1 可能类似于...(使用新创建的 ID 字段)

DELETE
   [data]
FROM
   MCS_ImportedData_GenericData AS [data]
WHERE
   id > (
         SELECT
            MIN(id)
         FROM
            MCS_ImportedData_GenericData
         WHERE
            CODE = [data].CODE
            AND ALPHA3CODE = [data].ALPHA3CODE
            AND RELATEDYEAR = [data].RELATEDYEAR
        )

或者...

DELETE
   [data]
FROM
   MCS_ImportedData_GenericData AS [data]
INNER JOIN
(
   SELECT
      MIN(id) AS [id],
      CODE,
      ALPHA3CODE,
      RELATEDYEAR
   FROM
      MCS_ImportedData_GenericData
   GROUP BY
      CODE,
      ALPHA3CODE,
      RELATEDYEAR
)
AS [original]
   ON [original].CODE = [data].CODE
   AND [original].ALPHA3CODE = [data].ALPHA3CODE
   AND [original].RELATEDYEAR = [data].RELATEDYEAR
   AND [original].id <> [data].id
于 2009-04-09T18:12:31.497 回答
0

我对所用语法的理解不够完美,无法发布确切的答案,但这是一种方法。

确定要保留的行(例如,选择值,... from .. where ...)

在识别时执行更新逻辑(例如,选择值 + 1 ... from ... where ...)

将选择插入新表。

删除原始文件,将新名称重命名为原始文件,重新创建所有授权/同义词/触发器/索引/FKs/...(或截断原始文件并从新文件中插入选择)

显然这有相当大的开销,但如果你想更新/清除数百万行,这将是最快的方法。

于 2009-07-07T10:17:30.517 回答