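I'm trying to find a better way to find duplicates in SQL Server. Run against more than 300 million records, the query below takes over 20 minutes before results start to appear in the SSMS results window. It then ran for another 22 minutes before it crashed.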

SSMS threw this error after displaying 16,777,216 records:

An error occurred while executing batch. Error message is: Exception of type 'System.OutOfMemoryException' was thrown.

Schema:

ENCOUNTER_NUM - numeric(22,0)
CONCEPT_CD - varchar(50)
PROVIDER_ID - varchar(50)
START_DATE - datetime
MODIFIER_CD - varchar(100)
INSTANCE_NUM - numeric(18,0)


SELECT
    ROW_NUMBER() OVER (ORDER BY f1.[ENCOUNTER_NUM],f1.[CONCEPT_CD],f1.[PROVIDER_ID],f1.[START_DATE],f1.[MODIFIER_CD],f1.[INSTANCE_NUM]),
    f1.[ENCOUNTER_NUM], 
    f1.[CONCEPT_CD], 
    f1.[PROVIDER_ID], 
    f1.[START_DATE], 
    f1.[MODIFIER_CD], 
    f1.[INSTANCE_NUM]
FROM
    [dbo].[I2B2_OBSERVATION_FACT] f1
    INNER JOIN [dbo].[I2B2_OBSERVATION_FACT] f2 ON
        f1.[ENCOUNTER_NUM] = f2.[ENCOUNTER_NUM] 
        AND f1.[CONCEPT_CD] = f2.[CONCEPT_CD]
        AND f1.[PROVIDER_ID] = f2.[PROVIDER_ID]
        AND f1.[START_DATE] = f2.[START_DATE]
        AND f1.[MODIFIER_CD] = f2.[MODIFIER_CD]
        AND f1.[INSTANCE_NUM] = f2.[INSTANCE_NUM]

1 Answer

I don't know how fast this will be, but it's worth a try.

SELECT
    COUNT(*) AS Dupes,
    f1.[ENCOUNTER_NUM], 
    f1.[CONCEPT_CD], 
    f1.[PROVIDER_ID], 
    f1.[START_DATE], 
    f1.[MODIFIER_CD], 
    f1.[INSTANCE_NUM]
FROM
    [dbo].[I2B2_OBSERVATION_FACT] f1
GROUP BY
    f1.[ENCOUNTER_NUM], 
    f1.[CONCEPT_CD], 
    f1.[PROVIDER_ID], 
    f1.[START_DATE], 
    f1.[MODIFIER_CD], 
    f1.[INSTANCE_NUM]
HAVING
    COUNT(*) > 1
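
One likely reason the self-join in the question is so slow: it has no filter, so every row whose six columns are all non-NULL matches at least itself, and the join returns at least one row per such row whether or not it is a duplicate. The grouped query returns only the keys that occur more than once. If even that result is large, the sketch below may help. It assumes you are allowed to create a temp table (the name `#dupes` and the column aliases in the summary query are made up for illustration) and keeps the result on the server instead of streaming it all into the SSMS grid:

```sql
-- Sketch only: #dupes is a hypothetical temp table name.
-- Store the duplicated keys on the server rather than in the SSMS grid.
SELECT
    COUNT(*) AS Dupes,
    f1.[ENCOUNTER_NUM],
    f1.[CONCEPT_CD],
    f1.[PROVIDER_ID],
    f1.[START_DATE],
    f1.[MODIFIER_CD],
    f1.[INSTANCE_NUM]
INTO #dupes
FROM
    [dbo].[I2B2_OBSERVATION_FACT] f1
GROUP BY
    f1.[ENCOUNTER_NUM],
    f1.[CONCEPT_CD],
    f1.[PROVIDER_ID],
    f1.[START_DATE],
    f1.[MODIFIER_CD],
    f1.[INSTANCE_NUM]
HAVING
    COUNT(*) > 1;

-- Summarize: how many keys are duplicated, and how many surplus rows exist.
SELECT
    COUNT(*)       AS DuplicatedKeys,
    SUM(Dupes - 1) AS SurplusRows
FROM #dupes;

-- Look at a small sample instead of the full result set.
SELECT TOP (100) *
FROM #dupes
ORDER BY Dupes DESC;
```

Note that GROUP BY places NULLs in the same group, while the `=` comparisons in the original join never match NULL to NULL, so the two approaches can disagree on rows that have a NULL in any of the six columns.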
answered 2013-05-02T17:56:56.117