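I'm importing records from Excel and want to avoid duplicates. In Classic ASP I wrote a function that checks the database for duplicates. If it finds one, it appends a number to the end of the username and checks again whether the username is unique, e.g. petejones becomes petejones1. Unfortunately the script is far too slow: the database holds about 150k records and the uniqueness search takes a long time. Is there a way to do the same thing directly in T-SQL on SQL Server 2008, so that the whole process is fast? Is there a procedure for making values unique?

Here is the function in Classic ASP. I know there are better ways to do this, so please don't laugh at my script.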

FUNCTION CreateUniqueUsername(str)
  SET DbConn = Server.CreateObject("ADODB.connection")
  DbConn.Open DSN_LINK
  nCounter = 0
  Unique = ""
  IF InStr(str, "@") > 0 THEN
     strUsername = Left(str,InStr(str, "@")-1)
  ELSE
     strUsername = str
  END IF
  strUsername = FormatUsername(strUsername)
  strSQL = "SELECT UserName FROM Member WHERE UserName = '" & strUsername & "';"
  SET rs = DbConn.Execute(strSQL)
  IF rs.EOF AND rs.BOF THEN
    nFinalUsername = strUsername
  ELSE
    DO UNTIL Unique = true
      nCounter = nCounter + 1
      nFinalUsername = strUsername & nCounter
      strSQL2 = "SELECT UserName FROM Member WHERE UserName = '" & nFinalUsername & " ' "
      SET objRS = DbConn.Execute(strSQL2)
      IF objRS.EOF THEN
        Unique = true
      ELSE
        intCount = intCount
      END IF
    LOOP
    objRS.Close
    SET objRS = Nothing 
  END IF
  rs.Close
  SET rs = Nothing 
  SET DbConn = Nothing
  CreateUniqueUsername = nFinalUsername
END FUNCTION

FUNCTION FormatUsername(str)
  Dim OutStr
  IF ISNULL(str) THEN EXIT FUNCTION
  OutStr = lCase(Trim(str))
  OutStr = Replace(OutStr, "’", "")
  OutStr = Replace(OutStr, "”", "")
  OutStr = Replace(OutStr, "'","")
  OutStr = Replace(OutStr, "&","and")
  OutStr = Replace(OutStr, "'", "")
  OutStr = Replace(OutStr, "*", "")
  OutStr = Replace(OutStr, ".", "")
  OutStr = Replace(OutStr, ",", "")
  OutStr = Replace(OutStr, CHR(34),"")
  OutStr = Replace(OutStr, " ","")
  OutStr = Replace(OutStr, "|","")
  OutStr = Replace(OutStr, "&","")
  OutStr = Replace(OutStr, "[","")
  OutStr = Replace(OutStr, ";", "")
  OutStr = Replace(OutStr, "]","")
  OutStr = Replace(OutStr, "(","")
  OutStr = Replace(OutStr, ")","")
  OutStr = Replace(OutStr, "{","")
  OutStr = Replace(OutStr, "}","")
  OutStr = Replace(OutStr, ":","")
  OutStr = Replace(OutStr, "/","")
  OutStr = Replace(OutStr, "\","")
  OutStr = Replace(OutStr, "?","")
  OutStr = Replace(OutStr, "@","")
  OutStr = Replace(OutStr, "!","")
  OutStr = Replace(OutStr, "_","")
  OutStr = Replace(OutStr, "''","")
  OutStr = Replace(OutStr, "%","")
  OutStr = Replace(OutStr, "#","")
  FormatUsername = OutStr
END FUNCTION

Any help would be greatly appreciated, as I'm still learning SQL.

3 Answers

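You can do this in SQL. The code looks for matching names. If it finds a match, it takes the highest number currently appended to that name and adds one, so it runs at most two SELECTs. That should be faster when there are many duplicates.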

-- example table
declare @Member table(ID int identity, UserName varchar(80))
insert @Member values('Pete')
insert @Member values('Jill')
insert @Member values('Bob')
insert @Member values('Sam')
insert @Member values('Pete1')
insert @Member values('Pete2')
insert @Member values('Pete3')
insert @Member values('Bob1')


declare @UserName varchar(80), @FinalUserName varchar(80)
set @UserName = 'Pete'

set @FinalUserName = @UserName
if(exists(SELECT 1 FROM @Member WHERE left(UserName,len(@UserName)) = @UserName))
begin
    SELECT 
        @FinalUserName = @UserName + convert(varchar(12),max(substring(UserName,len(@UserName)+1,99)+1)) 
    FROM @Member 
    WHERE left(UserName,len(@UserName)) = @UserName
end

SELECT @FinalUserName 
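
One caveat with the prefix match above: a name such as 'Peter' also starts with 'Pete', and its suffix 'r' cannot be converted to a number. The following is a minimal sketch of one way to guard against that, not the answerer's own code. It assumes suffixes are short runs of digits and that the username contains no LIKE wildcards (the question's FormatUsername strips %, _ and [). The @MaxSuffix variable is introduced here for illustration.

```sql
-- Sketch only: same idea as above, but suffixes that are not purely digits are ignored.
declare @Member table(ID int identity, UserName varchar(80))
insert @Member values('Pete')
insert @Member values('Pete1')
insert @Member values('Pete2')
insert @Member values('Peter')   -- shares the prefix; suffix 'r' is not a number

declare @UserName varchar(80), @FinalUserName varchar(80), @MaxSuffix int
set @UserName = 'Pete'

SELECT @MaxSuffix = max(
           case
               when UserName = @UserName then 0   -- the bare name counts as suffix 0
               when substring(UserName, len(@UserName) + 1, 99) not like '%[^0-9]%'
                   then convert(int, substring(UserName, len(@UserName) + 1, 99))
           end)                                   -- any other row yields NULL and is ignored by max
FROM @Member
WHERE UserName like @UserName + '%'

-- When nothing matched, @MaxSuffix is NULL and the name is returned unchanged.
set @FinalUserName = @UserName + isnull(convert(varchar(12), @MaxSuffix + 1), '')

SELECT @FinalUserName   -- returns Pete3 for this sample data
```

The CASE expression keeps the conversion away from rows such as 'Peter', so they neither raise an error nor affect the result.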
answered 2012-05-04T13:52:07.437

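This cumbersome expression retrieves the next available username. If users with the same name exist and the rest of their username is numeric, the expression returns the username concatenated with the next number (the highest existing suffix plus one). If no such username is found, the expression returns the username unchanged.

You can replace each "@username" with the actual value or, better, use SqlCommand.ExecuteScalar. SqlCommand lets you use parameters, which is the better solution: you don't have to concatenate ugly strings, and parameters prevent SQL injection.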

select @username 
     + isnull(convert (varchar (10),
         max (case when isnumeric (substring (m.Username, len (@username) + 1, 100)) = 1
                   then cast (substring (m.Username, len (@username) + 1, 100) as int) 
                   else (case when m.username = @username then 0 end)  
                   end) 
       + 1), '') UserName
from @members m
where m.username like @username + '%'

Here is a Sql Fiddle to experiment with. Replace set @username = 'aa' with another username to see the results.
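
For callers that cannot use SqlCommand, the same parameterization can be had by wrapping the expression in a stored procedure. This is a rough sketch under stated assumptions: dbo.GetUniqueUserName is a hypothetical name, the Member table and UserName column are taken from the question, and the dbo schema and varchar(80) width are guesses.

```sql
-- Sketch only. dbo.GetUniqueUserName is a hypothetical name; Member and UserName
-- come from the question; the dbo schema and varchar(80) are assumptions.
CREATE PROCEDURE dbo.GetUniqueUserName
    @username varchar(80)
AS
BEGIN
    SET NOCOUNT ON;

    -- Same expression as in the answer, reading from the real table.
    select @username
         + isnull(convert(varchar(10),
             max(case when isnumeric(substring(m.UserName, len(@username) + 1, 100)) = 1
                      then cast(substring(m.UserName, len(@username) + 1, 100) as int)
                      else (case when m.UserName = @username then 0 end)
                 end)
           + 1), '') as UserName
    from dbo.Member m
    where m.UserName like @username + '%';
END
GO  -- batch separator understood by SSMS and sqlcmd, not a T-SQL statement

-- The caller passes the candidate name as a parameter, never as concatenated SQL text.
EXEC dbo.GetUniqueUserName @username = 'petejones';
```

If UserName is indexed, a prefix predicate of the form LIKE @username + '%' can typically be answered with an index seek, which matters with roughly 150k rows.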

answered 2012-05-04T13:52:57.817

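This can be done by inserting into a staging table that allows duplicates, then transferring the rows from the staging table into the main table and resolving the duplicates along the way.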

INSERT INTO MainTable (Column1, Column2, UniqueName)
SELECT  Column1,
        Column2,
        UserName + ISNULL(CONVERT(VARCHAR, NULLIF(RowNumber, 0)), '') [UniqueName]
FROM    (   SELECT  *, ROW_NUMBER() OVER (PARTITION BY UserName ORDER BY Column1) - 1 [RowNumber]
            FROM    StagingTable
        ) staging

The important part of this statement is:

ROW_NUMBER() OVER (PARTITION BY UserName ORDER BY Column1) - 1

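This gives every row a row number (obviously). PARTITION BY works like GROUP BY, which essentially means the numbering restarts at 1 whenever the username changes. The ORDER BY part determines which of the duplicate usernames becomes row 1, which becomes row 2, and so on. ROW_NUMBER() starts at 1, so I subtract 1 to make it start at 0.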
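
To make the numbering concrete, here is a small illustration. The @Staging table variable and its rows are made-up sample data, not part of the original answer.

```sql
-- Hypothetical staging data held in a table variable, for illustration only.
declare @Staging table (Column1 int, UserName varchar(80))
insert @Staging values (1, 'pete')
insert @Staging values (2, 'jill')
insert @Staging values (3, 'pete')
insert @Staging values (4, 'pete')

SELECT  Column1,
        UserName,
        ROW_NUMBER() OVER (PARTITION BY UserName ORDER BY Column1) - 1 AS RowNumber
FROM    @Staging
ORDER BY UserName, Column1

-- Column1  UserName  RowNumber
-- 2        jill      0
-- 1        pete      0
-- 3        pete      1
-- 4        pete      2
```

The numbering restarts for each distinct UserName, and within the 'pete' group Column1 decides which row gets 0, 1 and 2.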

Next comes combining this row number with the username:

UserName + CONVERT(VARCHAR, RowNumber) [UniqueName]

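This would produce UserName0, UserName1, UserName2... so the next step is to make "UserName0" appear as just UserName, giving the list UserName, UserName1, UserName2: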

UserName + ISNULL(CONVERT(VARCHAR, NULLIF(RowNumber, 0)), '') [UniqueName]

This basically says: if the row number is 0, turn it into NULL; then, if the result of that is NULL, turn it into ''.
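
The effect of the NULLIF/ISNULL pair can be checked in isolation. In this small sketch the @Numbers table variable and the literal 'pete' are placeholders chosen for illustration.

```sql
-- Hypothetical row numbers, for illustration only.
declare @Numbers table (RowNumber int)
insert @Numbers values (0)
insert @Numbers values (1)
insert @Numbers values (2)

SELECT  RowNumber,
        'pete' + ISNULL(CONVERT(VARCHAR, NULLIF(RowNumber, 0)), '') AS UniqueName
FROM    @Numbers
ORDER BY RowNumber

-- RowNumber  UniqueName
-- 0          pete
-- 1          pete1
-- 2          pete2
```

Note that this numbering only separates duplicates within the staging data itself. Names that already exist in the main table, or a staged name that happens to equal a generated one such as pete1, would still need a separate check.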

answered 2012-05-04T14:03:33.983