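We have a web application that we want to demo to potential customers, and the best way to do that is with existing data, so they get the full experience. Of course, we don't want actual customer names, addresses, and so on to be visible in the application. Is there an easy way in SQL Server to randomize or scramble a varchar or text field?
None of these columns are keys of any kind, neither primary nor foreign.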
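This is a late answer, but none of the internet searches I ran on this topic gave me anything satisfying. Here is an example that shuffles the first names and last names in a customer table to create new names: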
--Replace Customers with your table name
SELECT * FROM Customers
--Make sure this type matches the data type of your id column
DECLARE @id int
--Add a WHERE clause here to process only a subset of your table
DECLARE mycursor CURSOR FOR SELECT id FROM Customers
OPEN mycursor
FETCH NEXT FROM mycursor INTO @id;
WHILE (@@FETCH_STATUS = 0)
BEGIN
    --Process one row per iteration
    --The first and last name are updated in two separate queries,
    --so each one is drawn from its own randomly chosen row.
    UPDATE Customers
    SET FirstName = (SELECT TOP 1 FirstName FROM Customers ORDER BY NEWID())
    WHERE id = @id
    UPDATE Customers
    SET LastName = (SELECT TOP 1 LastName FROM Customers ORDER BY NEWID())
    WHERE id = @id
    FETCH NEXT FROM mycursor INTO @id;
END
CLOSE mycursor;
DEALLOCATE mycursor;
SELECT * FROM Customers
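For larger tables, a set-based shuffle may be worth considering instead of a cursor. The following is only a sketch of a different technique (a random permutation using ROW_NUMBER(), rather than the random pick per row shown above). The temporary table #DemoCustomers and its sample rows are made up for illustration, so the sketch does not touch real data.

```sql
-- Hypothetical demo table; replace with your own table once you trust the result.
CREATE TABLE #DemoCustomers (id int PRIMARY KEY, FirstName varchar(50), LastName varchar(50));
INSERT INTO #DemoCustomers (id, FirstName, LastName)
VALUES (1, 'Mike', 'Smith'), (2, 'Anna', 'Jones'), (3, 'Luis', 'Garcia');

-- Number the rows once in a fixed order and once in random order,
-- then copy the first name from the randomly numbered row.
WITH target AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM #DemoCustomers
),
shuffled AS (
    SELECT FirstName, ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn FROM #DemoCustomers
)
UPDATE c
SET c.FirstName = s.FirstName
FROM #DemoCustomers AS c
JOIN target AS t ON t.id = c.id
JOIN shuffled AS s ON s.rn = t.rn;

-- Repeat with a fresh random order so last names are shuffled independently.
WITH target AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM #DemoCustomers
),
shuffled AS (
    SELECT LastName, ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn FROM #DemoCustomers
)
UPDATE c
SET c.LastName = s.LastName
FROM #DemoCustomers AS c
JOIN target AS t ON t.id = c.id
JOIN shuffled AS s ON s.rn = t.rn;

SELECT * FROM #DemoCustomers;
DROP TABLE #DemoCustomers;
```

Because this is a permutation, every original value is still present exactly once, and a row can occasionally keep its own value, so it is worth inspecting the output before a demo.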
Redgate has a tool for this: http://www.red-gate.com/products/SQL_Data_Generator/index.htm
I haven't used it, but Redgate tools are very good.
EDIT
It generates data rather than scrambling it, but it can still be useful.
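I once scrambled data by changing the letters in the fields. So, if you have a name like "Mike Smith" and you change every i to o, m to l, e to a, s to t, and t to rr, you end up with, step by step: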
Moke Smoth
Loke Sloth
Loka Sloth
Loka Tloth
Loka Rrlorrh
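That is enough to make the name unrecognizable, and you cannot work backwards to determine what it was (some of the letters I changed had already been changed by an earlier step). The result still looks like readable text, though.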
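The substitution chain described above could be sketched in T-SQL with nested REPLACE calls. This is only an illustration: the variable @name and its sample value are assumptions, and the innermost REPLACE runs first, so the order matches the steps listed above.

```sql
-- Hypothetical sample value; in practice this would be a column in an UPDATE.
DECLARE @name varchar(50) = 'Mike Smith';

SELECT REPLACE(
         REPLACE(
           REPLACE(
             REPLACE(
               REPLACE(@name, 'i', 'o'),  -- step 1: i -> o
             'm', 'l'),                   -- step 2: m -> l
           'e', 'a'),                     -- step 3: e -> a
         's', 't'),                       -- step 4: s -> t
       't', 'rr') AS ScrambledName;       -- step 5: t -> rr (also hits the t from step 4)
```

Two caveats: REPLACE compares using the collation of its input, so under a case-insensitive collation capital letters are matched too and come back as the lowercase replacement; and the t to rr step lengthens the string, so when writing the result back to a column you would need to make sure it still fits.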
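It isn't possible to simply leave your data in the table and somehow have it displayed only in scrambled form.
Your options are to replace the data by scrambling it in some way, to generate new data of the same general form, to write a function (CLR or T-SQL) that scrambles it as part of the queries you use, or to encrypt the data, in which case it can only be displayed if the user also has the appropriate decryption key.
If you decide to replace the data, then besides the Red Gate tool already mentioned you could also consider the data generator that ships with Visual Studio Team Database, or possibly Integration Services. The latter may be especially useful if you would benefit from more complex transformations.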
dbForge has a free data generation tool: http://www.devart.com/dbforge/sql/data-generator/
Here are a few simple approaches that perform well and can be applied to a table:
use master;
declare @length as int = 50; --acts as maximum length for random length expressions
declare @rows as int = 10;
SELECT
CONVERT( VARCHAR(max), crypt_gen_random( @length )) as FixedLengthText
, CONVERT(NVARCHAR(max), crypt_gen_random( @length * 2 )) as FixedLengthUnicode
, ( select crypt_gen_random((@length/8*6))
where value."type" is not null --refer to outer query, to get different value for each row
FOR XML PATH('')) as FixedLengthBase64
, CONVERT( VARCHAR(max), crypt_gen_random( (ABS(CHECKSUM(NewId())) % @length )+1 )) as RandomLengthText
, CONVERT(NVARCHAR(max), crypt_gen_random( (ABS(CHECKSUM(NewId())) % (@length * 2))+1 )) as RandomLengthUnicode
, ( select crypt_gen_random( ( (ABS(CHECKSUM(NewId())) % @length )+1 )/8*6 )
where value."type" is not null --refer to outer query, to get different value for each row
FOR XML PATH('')) as RandomLengthBase64
FROM dbo.spt_values AS value
WHERE value."type" = 'P' --Limit "number" to integers between 0-2047
and value.number < @rows --numbers start at 0, so this yields exactly @rows rows
;
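You can build a list of the columns that need updating, then simply loop over that list and execute some dynamic SQL that updates the rows in some way. I wrote a fairly basic scrambling function that just SHA-1 hashes the data (with a random salt), so it should be secure enough for most purposes.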
if exists (select 1 where object_id('tempdb..#columnsToUpdate') is not null)
begin
drop table #columnsToUpdate
end
create table #columnsToUpdate(tableName varchar(max), columnName varchar(max), max_length int)
if exists (select 1 where object_id('fnGetSanitizedName') is not null)
begin
drop function fnGetSanitizedName
end
if exists (select 1 where object_id('random') is not null)
begin
drop view random
end
if exists (select 1 where object_id('randUniform') is not null)
begin
drop function randUniform
end
GO
create view random(value) as select rand();
go
create function dbo.randUniform() returns real
begin
declare @v real
set @v = (select value from random)
return @v
end
go
CREATE FUNCTION dbo.fnGetSanitizedName
(
@functionName nvarchar(max),
@length int
)
RETURNS varchar(max)
AS
BEGIN
return left(SUBSTRING(master.dbo.fn_varbintohexstr(HashBytes('SHA1', cast(cast(cast(dbo.randUniform() * 10000 as int) as varchar(8)) as varchar(40)) + @functionName)), 3, 32), @length)
END
GO
begin transaction
set nocount on
insert into #columnsToUpdate
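select quotename(schema_name(tables.schema_id)) + '.' + quotename(tables.name), quotename(columns.name),
case
when columns.max_length = -1 then 32 -- (n)varchar(max) reports -1; the hash string is at most 32 characters
when types.name = 'nvarchar' then columns.max_length / 2
else columns.max_length
end as max_length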
from sys.tables tables
inner join sys.columns columns on tables.object_id=columns.object_id
inner join sys.types types on columns.system_type_id = types.system_type_id
where types.name in ('nvarchar', 'varchar')
declare @tableName varchar(max)
declare @columnName varchar(max)
declare @length int
declare @executingSql varchar(max)
declare tableUpdateCursor cursor
for select tableName, columnName, max_length from #columnsToUpdate
open tableUpdateCursor
fetch next from tableUpdateCursor into @tableName, @columnName, @length
while @@fetch_status = 0
begin
set @executingSql = 'update ' + @tableName + ' set ' + @columnName + ' = dbo.fnGetSanitizedName(' + @columnName + ',' + cast(@length as varchar(max)) + ')'
print @executingSql
exec(@executingSql)
fetch next from tableUpdateCursor into @tableName, @columnName, @length
end
close tableUpdateCursor
deallocate tableUpdateCursor
set nocount off
rollback -- Replace the rollback with a commit once you are sure about what you are doing.
drop table #columnsToUpdate
drop function dbo.fnGetSanitizedName
drop view random
drop function randUniform
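To try the hashing idea on a single value before running it against every column, something like the following sketch may help. It is not the function above: it swaps in NEWID() as the random salt and CONVERT with style 2 for the hex conversion, and @value and @maxLength are hypothetical names chosen for illustration.

```sql
-- Hypothetical inputs: one value to sanitize and the target column length.
DECLARE @value nvarchar(100) = N'Mike Smith';
DECLARE @maxLength int = 20;

-- Salt the value with a fresh GUID, hash it with SHA-1 (20 bytes),
-- render the hash as 40 hex characters, and cut it to the column length.
SELECT LEFT(
         CONVERT(varchar(40),
                 HASHBYTES('SHA1', CAST(NEWID() AS nvarchar(36)) + @value),
                 2),
         @maxLength) AS SanitizedValue;
```

Since the salt changes on every call, the same input produces a different output each time, which means equal values in the source data will no longer be equal after sanitizing.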