sql-server - T-SQL MERGE performance in a typical publishing context

Question

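I have a situation where a "publisher" application keeps a view model up to date by querying a very complex view and then merging the results into a denormalized view model table, using separate insert, update, and delete operations.

Now that we have upgraded to SQL 2008, I figured it would be a good time to replace these with the SQL MERGE statement. However, after writing the query, the subtree cost of the MERGE statement is 1214.54! With the old way, the sum of the insert/update/delete was only 0.104!

I can't figure out how a more straightforward way of describing the exact same operation could be so much worse. Perhaps you can see an error that I cannot.

Some stats on the table: it has 1.9 million rows, and each MERGE operation inserts, updates, or deletes more than 100 of them. In my test case, only 1 is affected.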

-- This table variable has the EXACT same structure as the published table
-- Yes, I've tried a temp table instead of a table variable, and it makes no difference
declare @tSource table
(
    Key1 uniqueidentifier NOT NULL,
    Key2 int NOT NULL,
    Data1 datetime NOT NULL,
    Data2 datetime,
    Data3 varchar(255) NOT NULL, 
    PRIMARY KEY 
    (
        Key1, 
        Key2
    )
)

-- Fill the table variable with the desired current state of the view model, for
-- only those rows affected by @Key1.  I'm not really concerned about the
-- performance of this step; it's already good.  This results in very few rows
-- in the table variable; in fact, only 1 in my test case
insert into @tSource
select *
from vw_Source_View with (nolock)
where Key1 = @Key1

-- Now it's time to merge @tSource into TargetTable

;MERGE TargetTable as T
USING @tSource S
    on S.Key1 = T.Key1 and S.Key2 = T.Key2

-- Only update if the Data columns do not match
WHEN MATCHED AND T.Data1 <> S.Data1 OR T.Data2 <> S.Data2 OR T.Data3 <> S.Data3 THEN
    UPDATE SET
        T.Data1 = S.Data1,
        T.Data2 = S.Data2,
        T.Data3 = S.Data3

-- Insert when missing in the target
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Key1, Key2, Data1, Data2, Data3)
    VALUES (Key1, Key2, Data1, Data2, Data3)

-- Delete when missing in the source, being careful not to delete the REST
-- of the table by applying the T.Key1 = @id condition
WHEN NOT MATCHED BY SOURCE AND T.Key1 = @id THEN
    DELETE
;
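For comparison, the separate-statement approach described at the top of the question might look roughly like the sketch below. The original insert/update/delete statements are not shown in the question, so this is only an illustration under assumptions: `#TargetTable` is a hypothetical temp-table stand-in for `TargetTable`, the sample row is made up, and the predicates simply mirror the MERGE clauses above.

```sql
-- Hypothetical stand-ins so the sketch runs on its own (SQL Server 2008+).
DECLARE @id uniqueidentifier = NEWID();

CREATE TABLE #TargetTable
(
    Key1  uniqueidentifier NOT NULL,
    Key2  int              NOT NULL,
    Data1 datetime         NOT NULL,
    Data2 datetime         NULL,
    Data3 varchar(255)     NOT NULL,
    PRIMARY KEY (Key1, Key2)
);

DECLARE @tSource table
(
    Key1  uniqueidentifier NOT NULL,
    Key2  int              NOT NULL,
    Data1 datetime         NOT NULL,
    Data2 datetime         NULL,
    Data3 varchar(255)     NOT NULL,
    PRIMARY KEY (Key1, Key2)
);

-- Made-up sample row representing the desired state for @id.
INSERT INTO @tSource (Key1, Key2, Data1, Data2, Data3)
VALUES (@id, 1, GETDATE(), NULL, 'example');

-- 1. Update rows present on both sides whose data differs.
--    Mirrors the MERGE predicate; note that <> is never true when either
--    side is NULL, so changes to or from NULL in Data2 are not detected.
UPDATE T
SET T.Data1 = S.Data1,
    T.Data2 = S.Data2,
    T.Data3 = S.Data3
FROM #TargetTable AS T
JOIN @tSource AS S
    ON S.Key1 = T.Key1 AND S.Key2 = T.Key2
WHERE T.Data1 <> S.Data1 OR T.Data2 <> S.Data2 OR T.Data3 <> S.Data3;

-- 2. Insert rows missing from the target.
INSERT INTO #TargetTable (Key1, Key2, Data1, Data2, Data3)
SELECT S.Key1, S.Key2, S.Data1, S.Data2, S.Data3
FROM @tSource AS S
WHERE NOT EXISTS (SELECT 1
                  FROM #TargetTable AS T
                  WHERE T.Key1 = S.Key1 AND T.Key2 = S.Key2);

-- 3. Delete rows for @id that are missing from the source.
DELETE T
FROM #TargetTable AS T
WHERE T.Key1 = @id
  AND NOT EXISTS (SELECT 1
                  FROM @tSource AS S
                  WHERE S.Key1 = T.Key1 AND S.Key2 = T.Key2);

DROP TABLE #TargetTable;
```

Each of the three statements touches only the rows identified by the key predicates, which is the behavior the single MERGE is meant to reproduce.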

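So how does this get to a subtree cost of 1200? Data access from the tables themselves seems quite efficient. In fact, 87% of the MERGE's cost seems to come from a Sort operation near the end of the chain:

MERGE (0%) <- Index Update (12%) <- Sort (87%) <- (...)

That sort has 0 rows going into and out of it. Why does it take 87% of the resources to sort 0 rows?

UPDATE

I posted the actual (not estimated) execution plan for just the MERGE operation in a Gist.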

score 2 · Accepted Answer

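Subtree costs should be taken with a large grain of salt (especially when you have huge cardinality errors). The output of SET STATISTICS IO ON; SET STATISTICS TIME ON; is a better indicator of actual performance.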
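A minimal sketch of how that measurement could be taken is shown here. The `SELECT` against `sys.objects` is only a stand-in so the batch runs anywhere; in practice it would be replaced by the MERGE (and, in a separate run, by the old insert/update/delete statements).

```sql
-- Report logical/physical reads per table and CPU/elapsed time per statement
-- in the Messages output, instead of relying on estimated subtree cost.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in statement (assumption): substitute the statement being measured.
SELECT COUNT(*) AS ObjectCount
FROM sys.objects;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads and elapsed times of the two approaches gives a like-for-like view that does not depend on the optimizer's row estimates.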

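The zero-row sort does not take 87% of the resources. The problem in your plan is one of statistics estimation. The costs shown in the actual plan are still estimated costs; they are not adjusted to take account of what actually happened.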

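There is a point in the plan where a filter reduces 1,911,721 rows to 0, but the estimated row count going forward is 1,860,310. From there on all the costs are bogus, culminating in the sort that is estimated at 3,348,560 rows and 87% of the cost.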
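One way to see this kind of mismatch for yourself is to capture the actual plan as XML and compare estimated and actual row counts per operator. The sketch below uses a stand-in query against the system catalog (an assumption, chosen only so the batch is runnable); the statement under investigation would take its place.

```sql
-- Return the actual execution plan as an extra XML result set.
SET STATISTICS XML ON;

-- Stand-in statement (assumption): substitute the MERGE being investigated.
SELECT o.name AS ObjectName, c.name AS ColumnName
FROM sys.objects AS o
JOIN sys.columns AS c
    ON c.object_id = o.object_id
WHERE o.type = 'U';

SET STATISTICS XML OFF;

-- In the returned plan XML, each RelOp element carries an EstimateRows
-- attribute, while the run-time counters underneath it carry ActualRows.
-- The cost attributes (for example EstimatedTotalSubtreeCost) remain
-- estimates even in an actual plan.
```

A large gap between `EstimateRows` and `ActualRows` on an operator is a sign that the cost percentages downstream of it should not be trusted.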

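The cardinality estimation error can be reproduced outside the MERGE statement by looking at the estimated plan for a FULL OUTER JOIN with equivalent predicates (which gives the same 1,860,310 row estimate).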

SELECT * 
FROM TargetTable T
FULL OUTER JOIN  @tSource S
    ON S.Key1 = T.Key1 and S.Key2 = T.Key2
WHERE 
CASE WHEN S.Key1 IS NOT NULL 
     /*Matched by Source*/
     THEN CASE WHEN T.Key1 IS NOT NULL  
               /*Matched by Target*/
               THEN CASE WHEN  [T].[Data1]<>S.[Data1] OR 
                               [T].[Data2]<>S.[Data2] OR 
                               [T].[Data3]<>S.[Data3]
                         THEN (1) 
                     END 
                /*Not Matched by Target*/     
                ELSE (4) 
           END 
       /*Not Matched by Source*/     
      ELSE CASE WHEN  [T].[Key1]=@id 
                THEN (3) 
            END 
END IS NOT NULL

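That said, the plan up to the filter itself does look quite suboptimal. It is doing a full clustered index scan when you probably want a plan with 2 clustered index range seeks: one to retrieve the single row matched on the primary key by the join to the source, and the other to retrieve the T.Key1 = @id range (though maybe the scan is there to avoid having to sort into clustered key order later?).

[Image: original plan]

Perhaps you could try this rewrite and see whether it works any better or worse: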

;WITH FilteredTarget AS
(
SELECT T.*
FROM TargetTable  AS T WITH (FORCESEEK)
JOIN @tSource S
    ON (T.Key1 = S.Key1
    AND S.Key2 = T.Key2)
    OR T.Key1 = @id
)
MERGE FilteredTarget AS T
USING @tSource S
ON (T.Key1 = S.Key1
   AND S.Key2 = T.Key2)


-- Only update if the Data columns do not match
WHEN MATCHED AND S.Key1 = T.Key1 AND S.Key2 = T.Key2 AND 
                                         (T.Data1 <> S.Data1 OR
                                          T.Data2 <> S.Data2 OR 
                                          T.Data3 <> S.Data3) THEN
  UPDATE SET T.Data1 = S.Data1,
             T.Data2 = S.Data2,
             T.Data3 = S.Data3

-- Note from original poster: This extra "safety clause" turned out not to
-- affect the behavior or the execution plan, so I removed it and it works
-- just as well without, but if you find yourself in a similar situation
-- you might want to give it a try.
-- WHEN MATCHED AND (S.Key1 <> T.Key1 OR S.Key2 <> T.Key2) AND T.Key1 = @id THEN
--   DELETE

-- Insert when missing in the target
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Key1, Key2, Data1, Data2, Data3)
    VALUES (Key1, Key2, Data1, Data2, Data3)

WHEN NOT MATCHED BY SOURCE AND T.Key1 = @id THEN
    DELETE;
