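I have a problem: my approach works on small datasets but fails on large datasets with (possibly) unclean data.
The database is an implementation of an (ideally) acyclic graph in PostgreSQL, with three tables: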
vertex_elements: id
edges: id, parent_id, child_id
element_associations: id, user_id, object_id (both are vertex_elements, but in disconnected graphs)
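I have a set of user_ids, from which I derive the candidate element_associations, plus a starting vertex_element in the graph. I want to find all child nodes (descendants) of the starting vertex that are accessible from an element_association belonging to one of those user_ids. A node is considered accessible if it, or one of its ancestors, is the object_id of one of the candidate element_associations.
The graph is roughly triangular in shape (few root nodes, many leaf nodes). Starting from the starting vertex_element, my strategy is: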
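1. Check the current vertex_element against the list of candidate element_associations. If it matches, all of its descendants are accessible; otherwise go to...
2. Check whether any ancestor of the current vertex_element is in the list of candidate element_associations. As in step 1, on a hit the current vertex_element and all of its descendants are accessible; otherwise go to...
3. Iterate over each child vertex_element (breadth-first search) and repeat steps 1 and 2.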
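To make the strategy concrete outside SQL, here is a minimal in-memory sketch of steps 1 to 3 with a shared cache for the ancestor check. It is only an illustration under assumptions: the graph is passed as a hypothetical list of `(parent_id, child_id)` pairs, and `candidate_objects` is the set of `object_id`s of the candidate element_associations. The cache relies on one observation: if an upward walk from a vertex finds no candidate, every vertex seen on that walk has no matching ancestor either, so none of them ever needs to be walked again.

```python
from collections import defaultdict, deque


def accessible_descendants_in_memory(edges, candidate_objects, start):
    """Hypothetical model of steps 1-3; returns the accessible descendants of `start`."""
    children, parents = defaultdict(list), defaultdict(list)
    for parent_id, child_id in edges:
        children[parent_id].append(child_id)
        parents[child_id].append(parent_id)

    memo = {}  # vertex -> True/False: is the vertex or any ancestor a candidate object?

    def covered(v):
        # Steps 1 and 2: breadth-first walk upward from v, reusing earlier answers.
        if v in memo:
            return memo[v]
        seen, queue = {v}, deque([v])
        while queue:
            u = queue.popleft()
            if memo.get(u) is True or u in candidate_objects:
                memo[v] = True
                return True
            if memo.get(u) is False:
                continue  # u's whole ancestry is already known not to match
            for p in parents.get(u, ()):
                if p not in seen:  # `seen` also stops the walk on cycles
                    seen.add(p)
                    queue.append(p)
        for u in seen:  # the upward closure held no candidate, so cache all of it
            memo[u] = False
        return False

    # Step 3: breadth-first walk downward from the start vertex.
    result, visited, queue = set(), {start}, deque([start])
    while queue:
        u = queue.popleft()
        for c in children.get(u, ()):
            if c in visited:
                continue
            visited.add(c)
            # A child of an accessible vertex is accessible without another upward walk.
            if u in result or covered(c):
                result.add(c)
            queue.append(c)
    return result
```

For example, with edges 1→2, 1→3, 2→4, 3→5 and a cycle 5→3, and vertex 3 as the only candidate object, starting at 1 yields `{3, 5}`: vertex 2's failed upward walk caches 1 and 2 as non-matching, so the later walk from 4 stops as soon as it reaches 2.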
The problem arises when I want to avoid re-checking the same ancestor vertex_elements. The main query traverses downward, checking each descendant's accessibility against the set of candidate element_associations:
```sql
WITH RECURSIVE edges_recursive(child_id, parent_id, matching_element_association_id) AS (
  (
    SELECT e1.child_id, e1.parent_id, ea.id
    FROM edges e1
    LEFT OUTER JOIN element_associations ea
      ON e1.child_id = ea.object_id
     AND ea.id IN (?)
    WHERE e1.parent_id = ?
  )
  UNION
  (
    SELECT e2.child_id, e2.parent_id, ea.id
    FROM edges e2
    INNER JOIN edges_recursive
      ON edges_recursive.child_id = e2.parent_id
    -- check the newly reached child, not its (already checked) parent
    LEFT OUTER JOIN element_associations ea
      ON e2.child_id = ea.object_id
     AND ea.id IN (?)
    WHERE edges_recursive.matching_element_association_id IS NULL
  )
)
SELECT edges_recursive.child_id
FROM edges_recursive
WHERE edges_recursive.matching_element_association_id IS NOT NULL
```
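To sanity-check the downward query on its own, here is a hedged sketch that runs it against an in-memory SQLite database as a stand-in for PostgreSQL. Assumptions: the fixture graph (1→2→4, 1→3→5, association 10 targeting vertex 3) is hypothetical; each `IN (?)` is expanded to one placeholder per candidate id because `sqlite3` cannot bind a list; the parentheses around the two UNION members are dropped because SQLite does not accept them; and the recursive member joins `element_associations` on the newly reached child `e2.child_id`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE vertex_elements (id INTEGER PRIMARY KEY);
CREATE TABLE edges (id INTEGER PRIMARY KEY, parent_id INTEGER, child_id INTEGER);
CREATE TABLE element_associations (id INTEGER PRIMARY KEY, user_id INTEGER, object_id INTEGER);
"""


def build_example():
    """Hypothetical fixture: 1 -> 2 -> 4 and 1 -> 3 -> 5; association 10 targets vertex 3."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO vertex_elements (id) VALUES (?)", [(i,) for i in range(1, 6)])
    conn.executemany(
        "INSERT INTO edges (id, parent_id, child_id) VALUES (?, ?, ?)",
        [(1, 1, 2), (2, 1, 3), (3, 2, 4), (4, 3, 5)],
    )
    conn.execute("INSERT INTO element_associations (id, user_id, object_id) VALUES (10, 100, 3)")
    return conn


def matched_descendants(conn, start_id, candidate_ea_ids):
    """Run the downward query; returns the first matching vertex on each branch."""
    marks = ",".join("?" * len(candidate_ea_ids))  # e.g. "?,?,?" for three ids
    sql = f"""
    WITH RECURSIVE edges_recursive(child_id, parent_id, matching_element_association_id) AS (
        SELECT e1.child_id, e1.parent_id, ea.id
        FROM edges e1
        LEFT OUTER JOIN element_associations ea
            ON e1.child_id = ea.object_id AND ea.id IN ({marks})
        WHERE e1.parent_id = ?
        UNION
        SELECT e2.child_id, e2.parent_id, ea.id
        FROM edges_recursive
        JOIN edges e2 ON edges_recursive.child_id = e2.parent_id
        LEFT OUTER JOIN element_associations ea
            ON e2.child_id = ea.object_id AND ea.id IN ({marks})
        WHERE edges_recursive.matching_element_association_id IS NULL
    )
    SELECT child_id FROM edges_recursive
    WHERE matching_element_association_id IS NOT NULL
    """
    # Placeholder order in the SQL text: first IN list, start id, second IN list.
    params = [*candidate_ea_ids, start_id, *candidate_ea_ids]
    return sorted(row[0] for row in conn.execute(sql, params))
```

On this fixture, `matched_descendants(build_example(), 1, [10])` returns `[3]`. Note that the query stops at the first match on each branch, so vertex 5 is not listed even though it is accessible under the rule above; its descendants would need a further downward walk.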
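However, there is also an additional recursive subquery inside the LEFT OUTER JOIN element_associations condition that checks the ancestors of each vertex_element. It looks like this: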
```sql
ea.id IN (
  WITH RECURSIVE parent_edges_recursive(child_id, parent_id, matching_element_association_id) AS (
    (
      SELECT edges.child_id, edges.parent_id, ea.id
      FROM edges
      LEFT OUTER JOIN element_associations ea
        ON ea.id IN (?) AND edges.parent_id = ea.object_id
      WHERE edges.child_id = e1.parent_id AND edges.parent_id != e1.parent_id
    )
    UNION
    (
      SELECT edges.child_id, edges.parent_id, ea.id
      FROM edges
      JOIN parent_edges_recursive
        ON parent_edges_recursive.parent_id = edges.child_id
      LEFT OUTER JOIN element_associations ea
        ON ea.id IN (?) AND edges.parent_id = ea.object_id
      WHERE parent_edges_recursive.matching_element_association_id IS NULL
    )
  )
  SELECT parent_edges_recursive.matching_element_association_id
  FROM parent_edges_recursive
  WHERE parent_edges_recursive.matching_element_association_id IS NOT NULL
  LIMIT 1
)
```
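The problem is that, while each subquery avoids traversing the same parent vertex twice within its own upward walk, nothing guarantees that, as we traverse down through the descendants, we won't re-read ancestors that an earlier subquery already evaluated. For small datasets this is fine and performance is acceptable; however, it is ridiculously unscalable and handles cycles extremely poorly.
What I need is to retain information about which parent vertex_elements I have already traversed across subqueries, so I can skip the re-reading step; however, I'm stuck on how to do this in a single query.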
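One way to sidestep per-node ancestor checks entirely, offered as a swapped-in technique rather than a fix to the query above, is to reverse the direction. By the definition above, a vertex is accessible exactly when it can be reached downward from some candidate `object_id`. So one recursive CTE can walk down from the candidate object_ids, a second can walk down from the starting vertex, and the answer is their intersection. With `UNION` (not `UNION ALL`), both SQLite and PostgreSQL discard rows already produced, so each vertex is expanded at most once per CTE and cycles terminate. Whether this is faster depends on how much of the graph lies below the candidates, so treat it as something to benchmark. The sketch below uses SQLite in place of PostgreSQL; the fixture (including the 5→3 cycle) and the helper names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE vertex_elements (id INTEGER PRIMARY KEY);
CREATE TABLE edges (id INTEGER PRIMARY KEY, parent_id INTEGER, child_id INTEGER);
CREATE TABLE element_associations (id INTEGER PRIMARY KEY, user_id INTEGER, object_id INTEGER);
"""


def build_example():
    """Hypothetical fixture: 1 -> 2 -> 4, 1 -> 3 -> 5 -> 3 (a cycle); association 10 targets 3."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO vertex_elements (id) VALUES (?)", [(i,) for i in range(1, 6)])
    conn.executemany(
        "INSERT INTO edges (id, parent_id, child_id) VALUES (?, ?, ?)",
        [(1, 1, 2), (2, 1, 3), (3, 2, 4), (4, 3, 5), (5, 5, 3)],
    )
    conn.execute("INSERT INTO element_associations (id, user_id, object_id) VALUES (10, 100, 3)")
    return conn


def accessible_descendants(conn, start_id, candidate_ea_ids):
    marks = ",".join("?" * len(candidate_ea_ids))  # one placeholder per candidate id
    sql = f"""
    WITH RECURSIVE
    -- every vertex below the starting vertex
    descendants(id) AS (
        SELECT child_id FROM edges WHERE parent_id = ?
        UNION
        SELECT e.child_id FROM edges e JOIN descendants d ON e.parent_id = d.id
    ),
    -- every vertex that is a candidate object_id or lies below one
    accessible(id) AS (
        SELECT object_id FROM element_associations WHERE id IN ({marks})
        UNION
        SELECT e.child_id FROM edges e JOIN accessible a ON e.parent_id = a.id
    )
    SELECT id FROM descendants
    INTERSECT
    SELECT id FROM accessible
    ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, [start_id, *candidate_ea_ids])]
```

On this fixture, starting at vertex 1 with association 10 gives `[3, 5]`, and the 5→3 cycle does not cause an infinite loop. No ancestor is ever re-read, because neither CTE walks upward at all.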