sql - Performance issue in generating unique names

Question

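I have a table "Objects" in a SQL Server DB. It contains the names (strings) of objects. I have a list of names of new objects, held in a separate table "NewObjects", that need to be inserted into the "Objects" table. This operation is called the "import" from here on.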

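For every record to be imported from "NewObjects" into "Objects", I need to generate a unique name if the record's name already exists in "Objects". This new name is stored in the "NewObjects" table against the old name.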

DECLARE @NewObjects TABLE
(
    ...
    Name varchar(20),
    newName nvarchar(20)
)

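I have implemented a stored procedure that generates a unique name for each record to be imported from "NewObjects". However, I am not happy with its performance for 1000 records (in "NewObjects"). I need help optimizing my code. Here is the implementation: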

PROCEDURE [dbo].[importWithNewNames] @args varchar(MAX)

-- Sample of @args is like 'A,B,C,D' (a CSV string)
...


DECLARE @NewObjects TABLE
(
    _index int identity PRIMARY KEY,
    Name varchar(20),
    newName nvarchar(20)
)

-- 'SplitString' function: a working implementation that is not the performance concern right now
INSERT INTO @NewObjects (Name)
SELECT * from SplitString(@args, ',')

declare @beg int = 1
declare @end int
DECLARE @oldName varchar(10)

-- get the count of the rows
select @end = MAX(_index) from @NewObjects

while @beg <= @end
BEGIN
    select @oldName = Name from @NewObjects where @beg = _index

    Declare @nameExists int = 0

    -- this is our constant. We cannot change
    DECLARE @MAX_NAME_WIDTH int = 5

    DECLARE @counter int = 1
    DECLARE @newName varchar(10)
    DECLARE @z varchar(10)

    select @nameExists = count(name) from Objects where name = @oldName
    ...
    IF @nameExists > 0
    BEGIN
        -- create name based on pattern 'Fxxxxx'. Example: 'F00001', 'F00002'.
        select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

        while EXISTS (select top 1 1 from Objects where name = @newName)
         OR EXISTS (select top 1 1 from @NewObjects where newName = @newName)
        BEGIN
            select @counter = @counter + 1
            select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
        END

        select top 1 @z = @newName from Objects

        update @NewObjects
        set newName = @z where @beg = _index
    END

    select @beg = @beg + 1
END

-- finally, show the new names generated
select * from @NewObjects

score 2 · Accepted Answer

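Disclaimer: I have no way to test these suggestions, so there may be syntax errors that you will have to work out as you implement them. They are meant as a guide for fixing this procedure and for sharpening your skills on future projects.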

One optimization that is easy to skim over, and that matters more as the set you iterate over grows, concerns this code:

select @nameExists = count(name) from Objects where name = @oldName
...
IF @nameExists > 0

Consider changing it to:

IF EXISTS (select name from Objects where name = @oldName)
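As a minimal sketch of the difference between the two checks, assuming a small table variable @Objects as a hypothetical stand-in for the real Objects table (the sample rows are made up for illustration):

```sql
-- Hypothetical stand-in for the Objects table, for illustration only
DECLARE @Objects TABLE (Name varchar(20));
INSERT INTO @Objects (Name) VALUES ('A'), ('B'), ('F00001');

DECLARE @oldName varchar(20) = 'A';

-- Original form: COUNT aggregates every matching row before the comparison
DECLARE @nameExists int;
SELECT @nameExists = COUNT(Name) FROM @Objects WHERE Name = @oldName;
IF @nameExists > 0
    PRINT 'name exists (COUNT check)';

-- Suggested form: EXISTS only needs to find one matching row
IF EXISTS (SELECT 1 FROM @Objects WHERE Name = @oldName)
    PRINT 'name exists (EXISTS check)';
```

Both checks reach the same decision. How much time EXISTS saves depends on how many rows match and on whether name is indexed, so it is worth measuring against the real table.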

Also, instead of doing this:

-- create name based on pattern 'Fxxxxx'. Example: 'F00001', 'F00002'.
select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

while EXISTS (select top 1 1 from Objects where name = @newName)
 OR EXISTS (select top 1 1 from @NewObjects where newName = @newName)
BEGIN
    select @counter = @counter + 1
    select @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
END

Consider this:

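DECLARE @maxName VARCHAR(20)
SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

-- Highest existing generated name; the LIKE pattern assumes @MAX_NAME_WIDTH = 5
SELECT @maxName = MAX(name) FROM Objects
WHERE name LIKE 'F[0-9][0-9][0-9][0-9][0-9]' AND name >= @newName
IF (@maxName IS NOT NULL)
BEGIN
    -- continue one past the highest number already taken
    SET @counter = CAST(SUBSTRING(@maxName, 2, LEN(@maxName) - 1) AS INT) + 1
    SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
END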

This ensures you do not iterate and run multiple queries just to find the largest integer value among the generated names.

Further, from the little context I have, you should also be able to make another optimization so that the above only ever has to run once.

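DECLARE @maxName VARCHAR(20)
SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')

IF (@beg = 1)
BEGIN
    -- Highest existing generated name; the LIKE pattern assumes @MAX_NAME_WIDTH = 5
    SELECT @maxName = MAX(name) FROM Objects
    WHERE name LIKE 'F[0-9][0-9][0-9][0-9][0-9]' AND name >= @newName
    IF (@maxName IS NOT NULL)
    BEGIN
        -- continue one past the highest number already taken
        SET @counter = CAST(SUBSTRING(@maxName, 2, LEN(@maxName) - 1) AS INT) + 1
        SET @newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
    END
END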

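The reason I say you can make this optimization is that, unless you have to worry about other entities inserting records that look like yours (e.g. Fxxxxx) in the meantime, you only need to find the MAX once and can then simply increment @counter in the loop.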

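In fact, you could pull this entire piece out of the loop. You should be able to extrapolate that pretty easily: just move the DECLARE and SET of @counter out together with the IF (@beg = 1) block. But take it one step at a time.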

Also, change this line:

select top 1 @z = @newName from Objects

to this:

SET @z = @newName

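because you are literally running a query against the table just to assign one local variable to another. This may be a significant cause of the performance problem. A good practice to get into: unless you are actually setting a variable from the result of a SELECT statement, use SET for local variables. There are some other places in your code where this applies. Consider this line: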

select @beg = @beg + 1

Use this instead:

SET @beg = @beg + 1
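Going back to the @z assignment, there is also a correctness angle. A SELECT that assigns a variable leaves it unchanged when the query returns no rows, whereas SET always assigns. A small sketch of the difference, using a hypothetical empty table variable @Empty in place of Objects:

```sql
-- Hypothetical empty table, standing in for an Objects table with no rows
DECLARE @Empty TABLE (Name varchar(20));

DECLARE @newName varchar(10) = 'F00001';
DECLARE @z varchar(10);

-- No row is read, so the assignment never happens and @z stays NULL
SELECT TOP 1 @z = @newName FROM @Empty;
SELECT @z AS z_after_select;

-- SET does not touch any table and always assigns
SET @z = @newName;
SELECT @z AS z_after_set;
```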

Finally, as mentioned above about simply incrementing @counter: at the end of the loop, where you have this line:

select @beg = @beg + 1

just add one more line:

SET @counter = @counter + 1

and you're golden!

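To recap: you can look up the highest conflicting name once, which gets rid of all that iteration. You will start using SET to eliminate performance-sapping lines such as select top 1 @z = @newName from Objects, where you query a table just to assign one local variable to another. And you will use EXISTS instead of setting a variable through the aggregate function COUNT to do that job.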
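Putting the recap together, the loop could look roughly like the sketch below. This is an untested illustration, not a drop-in replacement. It assumes table variables @Objects and @NewObjects with made-up sample rows in place of the real tables, a LIKE pattern hard-coded for @MAX_NAME_WIDTH = 5, and no other session inserting Fxxxxx names during the import. It increments @counter only when a name is actually handed out, a small variation on incrementing it on every pass.

```sql
-- Hypothetical sample data standing in for the real tables
DECLARE @Objects TABLE (Name varchar(20));
INSERT INTO @Objects (Name) VALUES ('A'), ('A1'), ('B'), ('F00001');

DECLARE @NewObjects TABLE
(
    _index int identity PRIMARY KEY,
    Name varchar(20),
    newName nvarchar(20)
);
INSERT INTO @NewObjects (Name) VALUES ('A'), ('B'), ('C'), ('D'), ('F00001');

DECLARE @MAX_NAME_WIDTH int = 5;
DECLARE @counter int = 1;
DECLARE @maxName varchar(20);

-- Done once, before the loop: find the highest generated name already in use
SELECT @maxName = MAX(Name)
FROM @Objects
WHERE Name LIKE 'F[0-9][0-9][0-9][0-9][0-9]';

IF (@maxName IS NOT NULL)
    SET @counter = CAST(SUBSTRING(@maxName, 2, @MAX_NAME_WIDTH) AS int) + 1;

DECLARE @beg int = 1, @end int, @oldName varchar(20);
SELECT @end = MAX(_index) FROM @NewObjects;

WHILE @beg <= @end
BEGIN
    SELECT @oldName = Name FROM @NewObjects WHERE _index = @beg;

    -- Only rows whose name collides with Objects get a generated name
    IF EXISTS (SELECT 1 FROM @Objects WHERE Name = @oldName)
    BEGIN
        UPDATE @NewObjects
        SET newName = 'F' + REPLACE(STR(@counter, @MAX_NAME_WIDTH, 0), ' ', '0')
        WHERE _index = @beg;

        SET @counter = @counter + 1;
    END

    SET @beg = @beg + 1;
END

SELECT * FROM @NewObjects;
```

Unlike the original inner loop, this version never reuses gaps below the highest existing Fxxxxx name. It always continues from the maximum.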

Let me know how these optimizations work out.

score 1 · Accepted Answer

You should avoid running queries inside a loop, especially against a table variable...

You should try using a temporary table and indexing it on the newName column. I bet it would improve performance a bit.
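A minimal sketch of that suggestion, assuming the same columns as the @NewObjects table variable from the question. The index name IX_NewObjects_newName and the sample rows are illustrative choices, not taken from the original code:

```sql
-- Temporary table in place of the @NewObjects table variable
CREATE TABLE #NewObjects
(
    _index int identity PRIMARY KEY,
    Name varchar(20),
    newName nvarchar(20)
);

-- Index supporting the "where newName = @newName" lookups made inside the loop
CREATE INDEX IX_NewObjects_newName ON #NewObjects (newName);

-- Illustrative rows; the procedure would fill this from SplitString(@args, ',')
INSERT INTO #NewObjects (Name) VALUES ('A'), ('B'), ('C');

SELECT * FROM #NewObjects;

DROP TABLE #NewObjects;
```

Unlike table variables, temporary tables carry statistics, which may help the optimizer choose a better plan for the repeated lookups.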

But it would be best to rewrite it and avoid those loops with queries inside.

Setting up my environment for testing...

    --this would be your object table... I feed it with some values for test
    DECLARE @Objects TABLE
    (
        _index int identity PRIMARY KEY,
        Name varchar(20)

    )
    insert into @Objects(name)
    values('A'),('A1'),('B'),('F00001')

    --the parameter of your procedure
    declare @args varchar(MAX)
    set @args = 'A,B,C,D,F00001'

    --@NewObjects2 is your @NewObjects, renamed because I ran your solution alongside mine when testing

    DECLARE @NewObjects2 TABLE
    (
        _index int identity PRIMARY KEY,
        Name varchar(20),
        newName nvarchar(20)
    )

    INSERT INTO @NewObjects2 (Name)
    SELECT * from SplitString(@args, ',')

    declare @end int
    select @end = MAX(_index) from @NewObjects2
    DECLARE @MAX_NAME_WIDTH int = 5

Up to this point it is very similar to your solution.

Now, here is what I would do instead of your loop:

--generate enough free names in the format FXXXXX to give a newName to every row in @NewObjects2
--you should alter this to find the greatest FXXXXX name in Objects and start generating newNames from that point, to avoid the overhead of creating names that are certain not to be used
with N_free as 
(
     select 
         0 as [count],
         'F' + REPLACE(STR(0, @MAX_NAME_WIDTH, 0), ' ', '0') as [newName],
         0 as fl_free,
         0 as count_free

     union all 

     select 
         N.[count] + 1 as [count],
         'F' + REPLACE(STR(N.[count]+1, @MAX_NAME_WIDTH, 0), ' ', '0') as [newName],
         OA.fl_free,
         count_free + OA.fl_free as count_free
     from 
         N_free N
     outer apply 
         (select 
              case 
                 when not exists(select name from @Objects
                                 where Name = 'F' + REPLACE(STR(N.[count]+1, @MAX_NAME_WIDTH, 0), ' ', '0')) 
                    then 1 
                 else 0 
              end as fl_free) OA
    where 
        N.count_free < @end
)
--return only those newNames that are free to be used
    ,newNames as (select  ROW_NUMBER() over (order by [count]) as _index_name
                         ,[newName] 
                  from N_free where fl_free = 1
    )
--update @NewObjects2, giving a newName to each row whose name is already used in Objects
    update N2
    set newName = V2.[newName]
    from @NewObjects2 N2
    inner join (select V._index,V.Name,newNames.[newName]
                from(   select row_number() over (partition by case when O.Name is not null 
                                                                        then 1
                                                                        else 0
                                                        end 
                                                        order by N._index) as _index_name
                                  ,N._index
                                  ,N.Name
                                  ,case when O.Name is not null 
                                        then 1
                                        else 0
                                    end as [fl_need_newName]
                            from @NewObjects2 N
                            left outer join @Objects O
                            on O.Name = N.Name
                    )V
                    left outer join newNames 
                    on newNames._index_name = V._index_name
                    and V.fl_need_newName = 1
    )V2
    on V2._index = N2._index
            option(MAXRECURSION 0)

    select * from @NewObjects2

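In this environment I got the same results as with your solution...

You can check whether it really produces the same results for you...

The result of this query is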

    _index  Name    newName
        1   A       F00002
        2   B       F00003
        3   C       NULL
        4   D       NULL
        5   F00001  F00004
