sql - Need to convert a recursive CTE query to an index-friendly query

Question

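After doing all the hard work of writing a recursive CTE query to meet my needs, I realized I can't use it because it doesn't work in an indexed view. So I need something else to replace the CTE below. (Yes, you can use a CTE in a non-indexed view, but that is too slow for me.)

Requirements:

My end goal is to have a self-updating indexed view (it doesn't have to be a view, but something similar). That is, if data changes in any of the tables the view joins on, then the view needs to update itself.
The view needs to be indexed because it has to be very fast, and the data does not change very often. Unfortunately, the non-indexed view using a CTE takes 3-5 seconds to run, which is way too long for my needs. I need the query to run in milliseconds. The recursive table has a few hundred thousand records.

As far as my research goes, the best solution to meet all these requirements is an indexed view, but I'm open to any solution.

The CTE can be found in the answer to my other post. Or here it is again: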

DECLARE @tbl TABLE ( 
     Id INT 
    ,[Name] VARCHAR(20) 
    ,ParentId INT 
    ) 

INSERT INTO @tbl( Id, Name, ParentId ) 
VALUES 
 (1, 'Europe', NULL) 
,(2, 'Asia',   NULL) 
,(3, 'Germany', 1) 
,(4, 'UK',      1) 
,(5, 'China',   2) 
,(6, 'India',   2) 
,(7, 'Scotland', 4) 
,(8, 'Edinburgh', 7) 
,(9, 'Leith', 8) 

; 
DECLARE @tbl2 table (id int, abbreviation varchar(10), tbl_id int)
INSERT INTO @tbl2( Id, Abbreviation, tbl_id ) 
VALUES 
 (100, 'EU', 1) 
,(101, 'AS', 2) 
,(102, 'DE', 3) 
,(103, 'CN', 5)

;WITH abbr AS (
    SELECT a.*, isnull(b.abbreviation,'') abbreviation
    FROM @tbl a
    left join @tbl2 b on a.Id = b.tbl_id
), abcd AS ( 
          -- anchor 
        SELECT  id, [Name], ParentID,
                CAST(([Name]) AS VARCHAR(1000)) [Path],
                cast(abbreviation as varchar(max)) abbreviation
        FROM    abbr
        WHERE   ParentId IS NULL 
        UNION ALL
          --recursive member 
        SELECT  t.id, t.[Name], t.ParentID, 
                CAST((a.path + '/' + t.Name) AS VARCHAR(1000)) [Path],
                isnull(nullif(t.abbreviation,'')+',', '') + a.abbreviation
        FROM    abbr AS t 
                JOIN abcd AS a 
                  ON t.ParentId = a.id 
       )
SELECT *, [Path] + ':' + abbreviation
FROM abcd

score 3 · Accepted Answer

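After hitting all the roadblocks with indexed views (self-joins, CTEs, UDFs that access data, etc.), I propose the following as a solution for you.

Create a support function

This assumes a maximum depth of 4 below the root (5 levels in total). Alternatively, use a CTE.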

CREATE FUNCTION dbo.GetHierPath(@hier_id int) returns varchar(max)
WITH SCHEMABINDING
as
begin
return (
    select FullPath =
               isnull(H5.Name+'/','') + 
               isnull(H4.Name+'/','') +
               isnull(H3.Name+'/','') +
               isnull(H2.Name+'/','') +
               H1.Name
             +
               ':'
             +
               isnull(STUFF(
               isnull(','+A1.abbreviation,'') +
               isnull(','+A2.abbreviation,'') + 
               isnull(','+A3.abbreviation,'') +
               isnull(','+A4.abbreviation,'') +
               isnull(','+A5.abbreviation,''),1,1,''),'')
    from dbo.HIER H1
    left join dbo.ABBR A1 on A1.hier_id = H1.Id
    left join dbo.HIER H2 on H1.ParentId = H2.Id
    left join dbo.ABBR A2 on A2.hier_id = H2.Id
    left join dbo.HIER H3 on H2.ParentId = H3.Id
    left join dbo.ABBR A3 on A3.hier_id = H3.Id
    left join dbo.HIER H4 on H3.ParentId = H4.Id
    left join dbo.ABBR A4 on A4.hier_id = H4.Id
    left join dbo.HIER H5 on H4.ParentId = H5.Id
    left join dbo.ABBR A5 on A5.hier_id = H5.Id
    where H1.id = @hier_id)
end
GO

Add the column to the table itself

The example below adds only the FullPath column. If you need them, add the other two columns from the CTE as well by splitting the result of dbo.GetHierPath on ':' (left => path, right => abbreviations).

-- index maximum key length is 900, based on your data, 400 is enough
ALTER TABLE HIER ADD FullPath VARCHAR(400)
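
The split on ':' mentioned above could look roughly like this. It is only a sketch: the PathOnly and Abbreviations aliases are made up for illustration, and it assumes no Name value contains a ':' itself, since the first colon is taken as the separator.

```sql
-- Sketch: split the function result on the first ':'.
-- PathOnly / Abbreviations are illustrative names, not part of the original answer.
-- Assumes Name never contains ':'.
SELECT  H.Id,
        LEFT(fp.FullPath, CHARINDEX(':', fp.FullPath) - 1) AS PathOnly,
        SUBSTRING(fp.FullPath,
                  CHARINDEX(':', fp.FullPath) + 1,
                  LEN(fp.FullPath))                         AS Abbreviations
FROM    dbo.HIER AS H
CROSS APPLY (SELECT dbo.GetHierPath(H.Id) AS FullPath) AS fp;
```

The CROSS APPLY is there only so that the function is written once per row instead of being repeated in each expression.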

Maintain the columns

Because of the hierarchical nature of the data, deleting a record X can affect both a descendant Y and an ancestor Z, which is quite hard to identify in either an INSTEAD OF or an AFTER trigger. So the alternative approach relies on these two statements from the question:

"if data changes in any of the tables the view joins on, then the view needs to update itself."
"the non-indexed view using a CTE takes 3-5 seconds to run which is way too long for my needs"

We maintain the data simply by running through the entire table again, taking 3-5 seconds per update (or faster if the 5-join query works out better).

CREATE TRIGGER TG_HIER
ON HIER
AFTER INSERT, UPDATE, DELETE
AS
UPDATE HIER
SET FullPath = dbo.GetHierPath(HIER.Id)
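
TG_HIER only fires for changes to HIER, but FullPath also depends on the abbreviations. If ABBR can change too, a companion trigger along these lines would probably be needed. This is a sketch; TG_ABBR is a name I made up, and it reuses the same full-table refresh.

```sql
-- Sketch of a companion trigger (TG_ABBR is an assumed name).
-- Recalculates every FullPath whenever the abbreviations change.
CREATE TRIGGER TG_ABBR
ON dbo.ABBR
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE dbo.HIER
    SET FullPath = dbo.GetHierPath(Id);
END
GO
```

Note that this UPDATE touches HIER, so with the default nested-trigger setting it will also fire TG_HIER and the table gets recalculated a second time. If that matters, a TRIGGER_NESTLEVEL() check at the top of TG_HIER is one way to skip the redundant pass.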

Finally, index the new column(s) on the table itself

create index ix_hier_fullpath on HIER(FullPath)

If you intended to access the path data via the id, then it is already in the table itself without adding an additional index.
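
As an illustration of what the index on FullPath is good for, here is a prefix search against the sample data. The 'Europe/UK/' prefix is just an example value; a pattern with no leading wildcard is the kind the optimizer can normally satisfy with a seek on ix_hier_fullpath.

```sql
-- Sketch: find everything underneath Europe/UK using the indexed column.
-- The prefix is an example; a leading '%' would prevent an index seek.
SELECT  Id, [Name], FullPath
FROM    dbo.HIER
WHERE   FullPath LIKE 'Europe/UK/%';
```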

The above TSQL references these objects

Modify the table and column names to suit your schema.

CREATE TABLE dbo.HIER (Id INT Primary Key Clustered, [Name] VARCHAR(20) ,ParentId INT)
;
INSERT dbo.HIER( Id, Name, ParentId ) VALUES 
     (1, 'Europe', NULL) 
    ,(2, 'Asia',   NULL) 
    ,(3, 'Germany', 1) 
    ,(4, 'UK',      1) 
    ,(5, 'China',   2) 
    ,(6, 'India',   2) 
    ,(7, 'Scotland', 4) 
    ,(8, 'Edinburgh', 7) 
    ,(9, 'Leith', 8)
    ,(10, 'Antartica', NULL) 
; 
CREATE TABLE dbo.ABBR (id int primary key clustered, abbreviation varchar(10), hier_id int)
;
INSERT dbo.ABBR( Id, Abbreviation, hier_id ) VALUES 
     (100, 'EU', 1) 
    ,(101, 'AS', 2) 
    ,(102, 'DE', 3) 
    ,(103, 'CN', 5)
GO

EDIT - Possibly faster alternative

Given that all records are recalculated each time, there is no real need for a function that returns the FullPath for a single HIER.ID. The query in the support function can be used without the where H1.id = @hier_id filter at the end. Furthermore, the expression for FullPath can be broken into PathOnly and Abbreviation easily down the middle. Or just use the original CTE, whichever is faster.
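
A rough version of that set-based refresh, which could replace the UPDATE inside the trigger, might look like this. It is a sketch under the same assumptions as the function (at most 5 levels, and at most one ABBR row per HIER row, otherwise the joined UPDATE is ambiguous).

```sql
-- Sketch: recalculate FullPath for all rows in one statement,
-- using the support function's query without the WHERE filter.
-- Assumes at most one ABBR row per HIER row.
UPDATE  H1
SET     FullPath =
            isnull(H5.Name + '/', '') +
            isnull(H4.Name + '/', '') +
            isnull(H3.Name + '/', '') +
            isnull(H2.Name + '/', '') +
            H1.Name
          + ':'
          + isnull(STUFF(
                isnull(',' + A1.abbreviation, '') +
                isnull(',' + A2.abbreviation, '') +
                isnull(',' + A3.abbreviation, '') +
                isnull(',' + A4.abbreviation, '') +
                isnull(',' + A5.abbreviation, ''), 1, 1, ''), '')
FROM    dbo.HIER AS H1
left join dbo.ABBR AS A1 on A1.hier_id = H1.Id
left join dbo.HIER AS H2 on H1.ParentId = H2.Id
left join dbo.ABBR AS A2 on A2.hier_id = H2.Id
left join dbo.HIER AS H3 on H2.ParentId = H3.Id
left join dbo.ABBR AS A3 on A3.hier_id = H3.Id
left join dbo.HIER AS H4 on H3.ParentId = H4.Id
left join dbo.ABBR AS A4 on A4.hier_id = H4.Id
left join dbo.HIER AS H5 on H4.ParentId = H5.Id
left join dbo.ABBR AS A5 on A5.hier_id = H5.Id;
```

Whether this beats the per-row function call or the original CTE is something to measure on the real data.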