Given the following table definition:

CREATE TABLE dbo.Test 
(
    [ID] INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    [Code] CHAR(10) NOT NULL UNIQUE NONCLUSTERED
);

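To de-identify a dataset, I need to randomize the Code values.

I want to keep the ID values so that other tables don't need to change.

I tried the following: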

SELECT t.ID, c.Code
FROM dbo.Test AS t
CROSS APPLY (SELECT TOP(1) v.Code
             FROM dbo.Test AS v
             WHERE v.Code <> t.Code
             ORDER BY NEWID()) AS c

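The problem is that the same unique key value gets assigned to more than one record, because the TOP(1) pick is made independently for each row.

Any suggestions on how to solve this?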

1 Answer

Here's how I would do it:

-- Sample data
DECLARE @test TABLE (ID INT IDENTITY, [Code] CHAR(10));
INSERT @test
VALUES (REPLICATE('A',10)),(REPLICATE('B',10)),(REPLICATE('C',10)),(REPLICATE('D',10));

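-- Solution: number the codes in random order, then match each number to an ID.
-- This relies on the IDs running 1..N with no gaps.
SELECT t.ID, c.[Code]
FROM   @test AS t
JOIN   
(
  SELECT SK = ROW_NUMBER() OVER (ORDER BY NEWID()), s.[Code]
  FROM   @test AS s
) AS c ON c.SK = t.ID;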

Returns:

ID          Code
----------- ----------
1           DDDDDDDDDD
2           AAAAAAAAAA
3           BBBBBBBBBB
4           CCCCCCCCCC

(Your results will vary, but that's the point.)
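
If you want to convince yourself that no code is handed out twice, one option is to materialize a shuffle and look for duplicates. The following is only a sketch: the @shuffled table variable and the Assigned column are names made up for this check, and the sample data is repeated so the batch runs on its own.

```sql
-- Sample data, repeated so this batch is self-contained
DECLARE @test TABLE (ID INT IDENTITY, [Code] CHAR(10));
INSERT @test
VALUES (REPLICATE('A',10)),(REPLICATE('B',10)),(REPLICATE('C',10)),(REPLICATE('D',10));

-- @shuffled is a hypothetical holding table for one run of the shuffle
DECLARE @shuffled TABLE (ID INT, [Code] CHAR(10));

INSERT @shuffled (ID, [Code])
SELECT t.ID, c.[Code]
FROM   @test AS t
JOIN
(
  SELECT SK = ROW_NUMBER() OVER (ORDER BY NEWID()), s.[Code]
  FROM   @test AS s
) AS c ON c.SK = t.ID;

-- Any code assigned to more than one ID would show up here
SELECT   s.[Code], Assigned = COUNT(*)
FROM     @shuffled AS s
GROUP BY s.[Code]
HAVING   COUNT(*) > 1;
```

The last query should come back empty, since ROW_NUMBER() gives every code exactly one sequence number and each sequence number matches exactly one ID.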

Quick update:

Note that if your IDs are not contiguous (for example, some IDs have been deleted), you could consider:

WITH T1 AS (SELECT t.ID, SK = ROW_NUMBER() OVER (ORDER BY t.ID) FROM @test AS t),
     T2 AS (SELECT SK = ROW_NUMBER() OVER (ORDER BY NEWID()), t.[Code] FROM @test AS t)
SELECT t1.ID, t2.Code
FROM T1
JOIN T2 ON T1.SK = T2.SK;
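
To write the shuffled codes back to the real table, the same idea can be wrapped in an UPDATE. This is a rough sketch against dbo.Test from the question, not something tested on your data; the tgt alias is just a name picked here. As far as I know, SQL Server checks the UNIQUE constraint once the whole statement has finished, so permuting the values in a single UPDATE should not trip it.

```sql
-- Sketch: shuffle dbo.Test.Code in place, leaving ID untouched
WITH T1 AS
(
  -- Gap-free sequence over the existing IDs
  SELECT t.ID, SK = ROW_NUMBER() OVER (ORDER BY t.ID)
  FROM   dbo.Test AS t
),
T2 AS
(
  -- The same codes, numbered in random order
  SELECT SK = ROW_NUMBER() OVER (ORDER BY NEWID()), t.[Code]
  FROM   dbo.Test AS t
)
UPDATE tgt
SET    tgt.[Code] = T2.[Code]
FROM   dbo.Test AS tgt
JOIN   T1 ON T1.ID = tgt.ID
JOIN   T2 ON T2.SK = T1.SK;
```

Keep in mind that a random shuffle can leave some rows with their original code by chance, so this alone does not guarantee the v.Code <> t.Code condition from the question. Run it inside a transaction first and inspect the result before committing.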
Answered 2019-09-16T17:23:02.813