sql - How can I efficiently merge two hierarchies in SQL Server?

Question

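I have two tables with hierarchyid fields. One of them is a staging table containing new data that needs to be merged into the other (that is, a set of nodes that need to be added to the main tree, some of which may already be there).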

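In addition to the hierarchyid column that defines the tree structure (parent/child relationships), each table has a separate column containing a node identifier that uniquely identifies each node. That is, the way to tell whether a node from the staging table is already in the main table is by its node ID, not by the hierarchyid column.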

Imperatively, the processing that needs to be performed would look something like this:

For each row, RS, in the staging table:
    If there is not already a row with the same Id as RS in the main table:
         Find the parent, PS, of the staging row
         Find the row, PM, in the main table that has the same node ID as PS
         Create a new child, RM of row PM
         Set RM's node ID equal to the node ID of RS

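Importantly, this approach only works if the tree in the staging table is sorted/traversed in breadth-first order, so that when RS is encountered, its parent PS is guaranteed to already have a corresponding row in the main table.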
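As a minimal sketch of what breadth-first order means for a hierarchyid column: sorting by the column alone gives depth-first order, so the level has to come first in the sort. The table variable @staging below is a stand-in that reuses the staging rows from the example later in this question.

```sql
-- Stand-in for the staging table, using the rows from the example in this question
declare @staging table
(
  hierarchy_id hierarchyid primary key,
  node_id bigint
);

insert into @staging values
('/1/',     1),
('/1/1/',   3),
('/1/2/',   5),
('/1/1/1/', 6);

-- Breadth-first: order by depth first, then by position within that depth.
-- Ordering by hierarchy_id alone would return /1/1/1/ before /1/2/ (depth-first).
select hierarchy_id.ToString() as path,
       hierarchy_id.GetLevel() as lvl,
       node_id
from @staging
order by hierarchy_id.GetLevel(), hierarchy_id;
```

With this ordering every parent row is returned before any of its children.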

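So far, the only way I can see to achieve this in SQL Server is to use a cursor over the staging table (which is already sorted) and call a stored procedure for each row. The procedure does essentially what is described above, with a SELECT MAX() to find the highest hierarchyid that already exists as a child of PM, so that the new child can be added uniquely.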
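For illustration, the per-row step might look roughly like the sketch below. It is not the actual stored procedure; the table variable @main and the variables @parent_node_id and @new_node_id are hypothetical stand-ins, and the rows are the main-table rows from the example later in this question.

```sql
-- Stand-in for the main table
declare @main table
(
  hierarchy_id hierarchyid primary key,
  node_id bigint
);

insert into @main values
('/1/',   1),
('/1/1/', 2),
('/1/2/', 3),
('/1/3/', 4);

declare @parent_node_id bigint = 1;  -- node ID of PM (hypothetical input)
declare @new_node_id    bigint = 5;  -- node ID of RS (hypothetical input)

-- Locate PM in the main table by node ID
declare @parent hierarchyid =
  (select hierarchy_id from @main where node_id = @parent_node_id);

-- SELECT MAX(): the highest hierarchyid among PM's existing children (NULL if none)
declare @last_child hierarchyid =
  (select max(hierarchy_id)
   from @main
   where hierarchy_id.GetAncestor(1) = @parent);

-- Create RM as a new last child of PM
insert into @main (hierarchy_id, node_id)
values (@parent.GetDescendant(@last_child, null), @new_node_id);

select hierarchy_id.ToString() as path, node_id
from @main
order by hierarchy_id;
```

Here node 5 ends up at /1/4/. Running this once per staging row through a cursor is what makes the approach slow.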

This is a very inefficient approach, though, and far too slow for my purposes. Is there a better way?

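By way of background, this is a feasibility check I am doing. I need to find out whether I can perform this operation quickly inside SQL Server. If it turns out I can't, I will have to do it another way, outside the database. The merging of trees is inherent to the problem domain (in fact, in a sense, it is the problem domain), so structuring the data differently, or taking a wider view and trying to avoid performing this operation altogether, is not an option.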

Update

As requested, here is a concrete example.

The tables "staging" and "main" both have the same two columns:

   hierarchy_id of type hierarchyid
   node_id of type bigint

Initial contents

Main:

 hierarchy_id    node_id
 /1/             1
 /1/1/           2
 /1/2/           3
 /1/3/           4

Staging:

 hierarchy_id    node_id
 /1/             1
 /1/1/           3
 /1/2/           5
 /1/1/1/         6

Desired contents

Main:

 hierarchy_id    node_id
 /1/             1
 /1/1/           2
 /1/2/           3
 /1/3/           4
 /1/4/           5
 /1/2/1/         6

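Note that the node with hierarchy_id /1/1/ in the staging table corresponds to the node with hierarchy_id /1/2/ in the target table (which is why node_id matters: you can't just copy the hierarchy_id values across). Note also that the new node with node_id 6 is added as a child of the correct parent, the node with node_id 3, which is why hierarchy_id matters: it defines the tree structure (parent/child relationships) for any new nodes. Any solution needs to take both aspects into account.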

score 3 · Accepted Answer

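We have been working on a product that needed a similar solution. After a lot of research on this and other approaches, we concluded that the hierarchyid approach was not a good fit for us.

So, as a direct answer to your question: no, there is no better way to do this using this approach.

Take a look at the Nested Set Model and the Adjacency List Model.

Both of these are far more elegant and efficient solutions to this particular design challenge.

Edit: As an afterthought, if you are not married to SQL, this problem can be solved much better with a non-relational database. We could not go that way because nobody on our team had enough expertise in designing non-relational databases, but if SQL is optional, you could use your current approach in a better and more efficient way in MongoDB.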

score 3 · Accepted Answer

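Modeling your hierarchy this way will lead to problems. The hierarchy_id column violates first normal form, and the merge process will be prone to update anomalies if you don't serialize/bottleneck access.

You should consider a table with just node_id and parent_id, and see how it simplifies your merge problem: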

main:

node_id   parent_id
1         NULL
2         1
3         1
4         1

staging:

node_id   parent_id
1         NULL
3         1
5         1
6         3

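You would use recursive queries against this, and you may be surprised at how efficient the execution plans are. If you must have the flattened hierarchy column, you can probably derive it with a recursive query in a view. Note that SQL Server does not allow a common table expression inside an indexed view, so this would have to be an ordinary view or a separately maintained table.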
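A minimal sketch of that idea, assuming parent_id values derived from the hierarchy_id paths in the question's example. The table variables @main and @staging and the path column are illustrative names, not part of the original schema.

```sql
-- Adjacency-list stand-ins for the two tables
declare @main    table (node_id bigint primary key, parent_id bigint null);
declare @staging table (node_id bigint primary key, parent_id bigint null);

insert into @main    values (1, null), (2, 1), (3, 1), (4, 1);
insert into @staging values (1, null), (3, 1), (5, 1), (6, 3);

-- The merge is a single set-based insert: parent_id already refers to node_id,
-- so no per-row lookup or path generation is needed.
insert into @main (node_id, parent_id)
select s.node_id, s.parent_id
from @staging as s
where not exists (select * from @main as m where m.node_id = s.node_id);

-- Recursive query that rebuilds a flattened path when one is needed.
-- The path is made of node_id values, not hierarchyid sibling positions.
with tree as
(
  select node_id,
         parent_id,
         cast('/' + cast(node_id as varchar(20)) + '/' as varchar(4000)) as path
  from @main
  where parent_id is null
  union all
  select c.node_id,
         c.parent_id,
         cast(t.path + cast(c.node_id as varchar(20)) + '/' as varchar(4000))
  from @main as c
    inner join tree as t
      on c.parent_id = t.node_id
)
select node_id, parent_id, path
from tree
order by path;
```

After the insert, nodes 5 and 6 are in @main with parents 1 and 3 respectively, matching the desired tree in the question.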

score 0 · Accepted Answer

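Here is a solution that moves rows from the source @S to the target @T one level at a time. To simplify things a bit, I have added a root node, just so that a parent always exists when creating a new hierarchyid.

I have never used hierarchyid before, so there are most likely more efficient ways to do this, but it should at least be more efficient than doing it one row at a time.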

-- Target table
declare @T table 
(
  hierarchy_id hierarchyid primary key,
  node_id bigint
)

insert into @T values
('/',             0), -- Needed for simplicity
('/1/',           1),
('/1/1/',         2),
('/1/2/',         3),
('/1/3/',         4)

-- Source table
declare @S table 
(
  hierarchy_id hierarchyid primary key,
  node_id bigint
)

insert into @S values
('/',               0),
('/1/',             1),
('/1/1/',           3),
('/1/2/',           5),
('/1/1/1/',         6)

declare @lvl int = 1

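-- Move rows from @S to @T, one level at a time (breadth-first)
while exists(select *
             from @S
             where hierarchy_id.GetLevel() = @lvl)
begin

  insert into @T
  -- New id: a child of the matching parent in @T, placed after its current last child
  select T.hierarchy_id.GetDescendant(C.MaxID, null),
         S.node_id
  -- S: staging nodes at this level that are missing from @T, with their parent's node_id
  from (select S1.node_id, S2.node_id as ParentID
        from @S as S1
          inner join @S as S2
            on S1.hierarchy_id.GetAncestor(1) = S2.hierarchy_id
        where S1.hierarchy_id.GetLevel() = @lvl and
              S1.node_id not in (select node_id
                                 from @T)
       ) as S
    -- T: the parent's row in the target, matched by node_id
    inner join @T as T
      on S.ParentID = T.node_id
    -- C: the highest existing child of that parent (NULL if it has no children)
    outer apply (select max(hierarchy_id) as MaxID
                 from @T as T2
                 where T.hierarchy_id = T2.hierarchy_id.GetAncestor(1)) as C

  set @lvl = @lvl + 1
end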

select *, hierarchy_id.ToString()
from @T
where hierarchy_id <> hierarchyid::GetRoot()

Result:

hierarchy_id  node_id  (No column name)
------------  -------  ----------------
0x58          1        /1/
0x5AC0        2        /1/1/
0x5B40        3        /1/2/
0x5B56        6        /1/2/1/
0x5BC0        4        /1/3/
0x5C20        5        /1/4/
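One case the sample data does not exercise: the insert computes a single GetDescendant value per parent, so two new siblings under the same parent at the same level would receive the same hierarchy_id and collide on the primary key. The sketch below is one possible way around that, not a tuned solution. It adds an inner loop that inserts at most one new child per parent per pass. The extra staging node 7 and the variables @maxlvl and @inserted are hypothetical additions for this illustration.

```sql
declare @T table (hierarchy_id hierarchyid primary key, node_id bigint);
declare @S table (hierarchy_id hierarchyid primary key, node_id bigint);

insert into @T values
('/', 0), ('/1/', 1), ('/1/1/', 2), ('/1/2/', 3), ('/1/3/', 4);

-- Nodes 5 and 7 are both new children of node 1 (7 is a hypothetical extra row)
insert into @S values
('/', 0), ('/1/', 1), ('/1/1/', 3), ('/1/2/', 5), ('/1/3/', 7), ('/1/1/1/', 6);

declare @lvl int = 1;
declare @maxlvl int = (select max(hierarchy_id.GetLevel()) from @S);
declare @inserted int;

while @lvl <= @maxlvl
begin
  set @inserted = 1;

  -- Repeat until a pass inserts nothing, i.e. this level is fully merged
  while @inserted > 0
  begin
    insert into @T (hierarchy_id, node_id)
    select T.hierarchy_id.GetDescendant(C.MaxID, null),
           S.node_id
    from (select S1.node_id,
                 S2.node_id as ParentID,
                 -- number the missing siblings under each parent
                 row_number() over (partition by S2.node_id
                                    order by S1.node_id) as rn
          from @S as S1
            inner join @S as S2
              on S1.hierarchy_id.GetAncestor(1) = S2.hierarchy_id
          where S1.hierarchy_id.GetLevel() = @lvl and
                S1.node_id not in (select node_id from @T)
         ) as S
      inner join @T as T
        on S.ParentID = T.node_id
      outer apply (select max(T2.hierarchy_id) as MaxID
                   from @T as T2
                   where T2.hierarchy_id.GetAncestor(1) = T.hierarchy_id) as C
    where S.rn = 1;  -- at most one new child per parent in each pass

    set @inserted = @@rowcount;
  end

  set @lvl = @lvl + 1;
end

select hierarchy_id.ToString() as path, node_id
from @T
where hierarchy_id <> hierarchyid::GetRoot()
order by hierarchy_id;
```

The number of passes per level equals the largest number of new siblings under any one parent, so this stays set-based when new nodes are spread across many parents but degrades toward row-at-a-time when many new nodes share one parent.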
