我不是 SQL 专家,但如果有人可以帮助我。
我使用递归 CTE 来获取如下值。
子 1 --> 父 1
父 1 --> 父 2
父 2 --> NULL
如果数据填充出错了,那么我会有类似下面的内容,因此 CTE 可能会进入无限递归循环并给出最大递归错误。由于数据量很大,我无法手动检查这些不良数据。请让我知道是否有办法找到它。
子 1 --> 父 1
父 1 --> 子 1
或者
子 1 --> 父 1
父 1 --> 父 2
父 2 --> 子 1
我不是 SQL 专家,但如果有人可以帮助我。
我使用递归 CTE 来获取如下值。
子 1 --> 父 1
父 1 --> 父 2
父 2 --> NULL
如果数据填充出错了,那么我会有类似下面的内容,因此 CTE 可能会进入无限递归循环并给出最大递归错误。由于数据量很大,我无法手动检查这些不良数据。请让我知道是否有办法找到它。
子 1 --> 父 1
父 1 --> 子 1
或者
子 1 --> 父 1
父 1 --> 父 2
父 2 --> 子 1
使用 Postgres,很容易通过将所有访问过的节点收集到一个数组中来防止这种情况发生。
设置:
create table hierarchy (id integer, parent_id integer);
insert into hierarchy
values
(1, null), -- root element
(2, 1), -- first child
(3, 1), -- second child
(4, 3),
(5, 4),
(3, 5); -- endless loop
递归查询:
with recursive tree as (
select id,
parent_id,
array[id] as all_parents
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id,
p.all_parents||c.id
from hierarchy c
join tree p
on c.parent_id = p.id
and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;
要同时对多棵树执行此操作,您需要将根节点的 ID 传递给子节点:
with recursive tree as (
select id,
parent_id,
array[id] as all_parents,
id as root_id
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id,
p.all_parents||c.id,
p.root_id
from hierarchy c
join tree p
on c.parent_id = p.id
and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
and c.root_id = p.root_id
)
select *
from tree;
Postgres 14 引入了(符合标准的)CYCLE
选项来检测周期:
with recursive tree as (
select id,
parent_id
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id
from hierarchy c
join tree p
on c.parent_id = p.id
)
cycle id -- track cycles for this column
set is_cycle -- adds a boolean column is_cycle
using path -- adds a column that contains all parents for the id
select *
from tree
where not is_cycle
您还没有指定方言或列名,所以很难做出完美的例子......
-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2
(3, 2, 'SubChild')
-- End random data
;WITH RecursiveCTE (StartingID, Level, Parents, Loop, ID, ParentID, Description) AS
(
SELECT ID, 1, '|' + CAST(ID AS VARCHAR(MAX)) + '|', 0, * FROM #MyTable
UNION ALL
SELECT R.StartingID, R.Level + 1,
R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
MT.*
FROM #MyTable MT
INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)
SELECT StartingID, Level, Parents, MAX(Loop) OVER (PARTITION BY StartingID) Loop, ID, ParentID, Description
FROM RecursiveCTE
ORDER BY StartingID, Level
像这样的东西将显示递归 cte 中是否/在哪里有循环。看专栏Loop
。数据保持不变,没有循环。在评论中有关于如何更改值以导致循环的示例。
最后,递归 cteVARCHAR(MAX)
以形式|id1|id2|id3|
(称为Parents
)创建一个 id,然后检查当前ID
是否已经在该“列表”中。如果是,则将该Loop
列设置为 1。此列在递归连接 ( ABD R.Loop = 0
) 中被选中。
结束查询使用 a将整个“块”链MAX() OVER (PARTITION BY ...)
的列设置为 1 。Loop
稍微复杂一点,生成一个“更好”的报告:
-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2
(3, 3, 'SubChild')
-- End random data
-- The "terminal" childrens (that are elements that don't have childrens
-- connected to them)
;WITH WithoutChildren AS
(
SELECT MT1.* FROM #MyTable MT1
WHERE NOT EXISTS (SELECT 1 FROM #MyTable MT2 WHERE MT1.ID != MT2.ID AND MT1.ID = MT2.ParentID)
)
, RecursiveCTE (StartingID, Level, Parents, Descriptions, Loop, ParentID) AS
(
SELECT ID, -- StartingID
1, -- Level
'|' + CAST(ID AS VARCHAR(MAX)) + '|',
'|' + CAST(Description AS VARCHAR(MAX)) + '|',
0, -- Loop
ParentID
FROM WithoutChildren
UNION ALL
SELECT R.StartingID, -- StartingID
R.Level + 1, -- Level
R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
R.Descriptions + CAST(MT.Description AS VARCHAR(MAX)) + '|',
CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
MT.ParentID
FROM #MyTable MT
INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)
SELECT * FROM RecursiveCTE
WHERE ParentID IS NULL OR Loop = 1
此查询应返回所有“最后一个子”行以及完整的父链。该列Loop
是0
如果没有循环,1
如果有循环。
您可以使用 Knuth 描述的相同方法在此处检测链表中的循环。在一个列中,跟踪孩子、孩子的孩子、孩子的孩子的孩子等。在另一列中,跟踪孙辈、孙辈的孙辈、孙辈的孙辈的孙辈等。
对于初始选择,列Child
与Grandchild
列之间的距离为 1。每个选择 fromunion all
的深度增加Child
1,深度增加Grandchild
2。它们之间的距离增加 1。
如果你有任何循环,因为距离每次只增加 1,在循环中的某个点之后Child
,距离将是循环长度的倍数。发生这种情况时,Child
和Grandchild
列是相同的。使用它作为停止递归的附加条件,并在其余代码中将其检测为错误。
SQL Server 示例:
declare @LinkTable table (Parent int, Child int);
insert into @LinkTable values (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (7, 1);
with cte as (
select lt1.Parent, lt1.Child, lt2.Child as Grandchild
from @LinkTable lt1
inner join @LinkTable lt2 on lt2.Parent = lt1.Child
union all
select cte.Parent, lt1.Child, lt3.Child as Grandchild
from cte
inner join @LinkTable lt1 on lt1.Parent = cte.Child
inner join @LinkTable lt2 on lt2.Parent = cte.Grandchild
inner join @LinkTable lt3 on lt3.Parent = lt2.Child
where cte.Child <> cte.Grandchild
)
select Parent, Child
from cte
where Child = Grandchild;
删除LinkTable
导致循环的记录之一,您会发现select
不再返回任何数据。
这是检测邻接列表(父/子关系)中的循环的另一种方法,其中节点只能有一个父级,可以通过对子列的唯一约束来强制执行(id
在下表中)。这通过递归查询计算邻接列表的闭包表来工作。它首先将每个节点作为其自己的 0 级祖先添加到闭包表中,然后迭代地遍历邻接表以扩展闭包表。当新记录的子级和祖先级在原始级别零 (0) 以外的任何级别上都相同时,就会检测到循环:
-- For PostgreSQL and MySQL 8 use the Recursive key word in the CTE code:
-- with RECURSIVE cte(ancestor, child, lev, cycle) as (
with cte(ancestor, child, lev, cycle) as (
select id, id, 0, 0 from Table1
union all
select cte.ancestor
, Table1.id
, case when cte.ancestor = Table1.id then 0 else cte.lev + 1 end
, case when cte.ancestor = Table1.id then cte.lev + 1 else 0 end
from Table1
join cte
on cte.child = Table1.PARENT_ID
where cte.cycle = 0
) -- In oracle uncomment the next line
-- cycle child set isCycle to 'Y' default 'N'
select distinct
ancestor
, child
, lev
, max(cycle) over (partition by ancestor) cycle
from cte
给定 Table1 的以下邻接列表:
| parent_id | id |
|-----------|----|
| (null) | 1 |
| (null) | 2 |
| 1 | 3 |
| 3 | 4 |
| 1 | 5 |
| 2 | 6 |
| 6 | 7 |
| 7 | 8 |
| 9 | 10 |
| 10 | 11 |
| 11 | 9 |
上述查询适用于 SQL Sever(以及按照指示修改后的 Oracle、PostgreSQL 和 MySQL 8)正确地检测到节点 9、10 和 11 参与长度为 3 的循环。
SQL(/DB) Fiddles 在各种 DB 中展示了这一点,可以在下面找到:
尝试限制递归结果
WITH EMP_CTE AS
(
SELECT
0 AS [LEVEL],
ManagerId, EmployeeId, Name
FROM Employees
WHERE ManagerId IS NULL
UNION ALL
SELECT
[LEVEL] + 1 AS [LEVEL],
ManagerId, EmployeeId, Name
FROM Employees e
INNER JOIN EMP_CTE c ON e.ManagerId = c.EmployeeId
AND s.LEVEL < 100 --RECURSION LIMIT
)
SELECT * FROM EMP_CTE WHERE [Level] = 100
这是 SQL Server 的解决方案:
表格插入脚本:
CREATE TABLE MyTable
(
[ID] INT,
[ParentID] INT,
[Name] NVARCHAR(255)
);
INSERT INTO MyTable
(
[ID],
[ParentID],
[Name]
)
VALUES
(1, NULL, 'A root'),
(2, NULL, 'Another root'),
(3, 1, 'Child of 1'),
(4, 3, 'Grandchild of 1'),
(5, 4, 'Great grandchild of 1'),
(6, 1, 'Child of 1'),
(7, 8, 'Child of 8'),
(8, 7, 'Child of 7'), -- This will cause infinite recursion
(9, 1, 'Child of 1');
查找罪魁祸首的确切记录的脚本:
;WITH RecursiveCTE
AS (
-- Get all parents:
-- Any record in MyTable table could be an Parent
-- We don't know here yet which record can involve in an infinite recursion.
SELECT ParentID AS StartID,
ID,
CAST(Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
FROM MyTable
UNION ALL
-- Recursively try finding all the childrens of above parents
-- Keep on finding it until this child become parent of above parent.
-- This will bring us back in the circle to parent record which is being
-- keep in the StartID column in recursion
SELECT RecursiveCTE.StartID,
t.ID,
CAST(RecursiveCTE.[ParentChildRelationPath] + ' -> ' + t.Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
FROM RecursiveCTE
INNER JOIN MyTable AS t
ON t.ParentID = RecursiveCTE.ID
WHERE RecursiveCTE.StartID != RecursiveCTE.ID)
-- FInd the ones which causes the infinite recursion
SELECT StartID,
[ParentChildRelationPath],
RecursiveCTE.ID
FROM RecursiveCTE
WHERE StartID = ID
OPTION (MAXRECURSION 0);
上述查询的输出: