postgresql - 如何使用递归查询向后遍历分层树结构结构

Question

我正在使用 PostgreSQL 9.1 来查询分层树结构数据，这些数据由连接到节点的边（或元素）组成。这些数据实际上是针对流网络的，但我已将问题抽象为简单的数据类型。考虑示例tree表。每条边都有长度和面积属性，用于从网络中确定一些有用的指标。

CREATE TEMP TABLE tree (
  edge text PRIMARY KEY,
  from_node integer UNIQUE NOT NULL, -- can also act as PK
  to_node integer REFERENCES tree (from_node),
  mode character varying(5), -- redundant, but illustrative
  length numeric NOT NULL,
  area numeric NOT NULL,
  fwd_path text[], -- optional ordered sequence, useful for debugging
  fwd_search_depth integer,
  fwd_length numeric,
  rev_path text[], -- optional unordered set, useful for debugging
  rev_search_depth integer,
  rev_length numeric,
  rev_area numeric
);
CREATE INDEX ON tree (to_node);
INSERT INTO tree(edge, from_node, to_node, mode, length, area) VALUES
 ('A', 1, 4, 'start', 1.1, 0.9),
 ('B', 2, 4, 'start', 1.2, 1.3),
 ('C', 3, 5, 'start', 1.8, 2.4),
 ('D', 4, 5, NULL, 1.2, 1.3),
 ('E', 5, NULL, 'end', 1.1, 0.9);

如下图所示，其中由 AE 表示的边与节点 1-5 相连。NULL to_node(Ø) 表示结束节点。总是独一无二的from_node，所以它可以充当PK。如果这个网络像流域一样流动，它将是从上到下的，其中起始支流边缘是 A、B、C，结束流出边缘是 E。

文档为WITH如何在递归查询中使用搜索图提供了一个很好的示例。因此，要获取“转发”信息，查询从末尾开始，并向后工作：

WITH RECURSIVE search_graph AS (
  -- Begin at ending nodes
  SELECT E.from_node, 1 AS search_depth, E.length
    , ARRAY[E.edge] AS path -- optional
  FROM tree E WHERE E.to_node IS NULL

  UNION ALL
  -- Accumulate each edge, working backwards (upstream)
  SELECT o.from_node, sg.search_depth + 1, sg.length + o.length
    , o.edge|| sg.path -- optional
  FROM tree o, search_graph sg
  WHERE o.to_node = sg.from_node
)
UPDATE tree SET
  fwd_path = sg.path,
  fwd_search_depth = sg.search_depth,
  fwd_length = sg.length
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;

SELECT edge, from_node, to_node, fwd_path, fwd_search_depth, fwd_length
FROM tree ORDER BY edge;

 edge | from_node | to_node | fwd_path | fwd_search_depth | fwd_length
------+-----------+---------+----------+------------------+------------
 A    |         1 |       4 | {A,D,E}  |                3 |        3.4
 B    |         2 |       4 | {B,D,E}  |                3 |        3.5
 C    |         3 |       5 | {C,E}    |                2 |        2.9
 D    |         4 |       5 | {D,E}    |                2 |        2.3
 E    |         5 |         | {E}      |                1 |        1.1

以上是有道理的，并且适用于大型网络。例如，我可以看到B边缘距离末端有 3 条边，而前向路径{B,D,E}从末端到末端的总长度为 3.5。

但是，我想不出构建反向查询的好方法。也就是说，从每条边来看，累积的“上游”边、长度和面积是多少。使用WITH RECURSIVE，我拥有的最好的是：

WITH RECURSIVE search_graph AS (
  -- Begin at starting nodes
  SELECT S.from_node, S.to_node, 1 AS search_depth, S.length, S.area
    , ARRAY[S.edge] AS path -- optional
  FROM tree S WHERE from_node IN (
    -- Starting nodes have a from_node without any to_node
    SELECT from_node FROM tree EXCEPT SELECT to_node FROM tree)

  UNION ALL
  -- Accumulate edges, working forwards
  SELECT c.from_node, c.to_node, sg.search_depth + 1, sg.length + c.length, sg.area + c.area
         , c.edge || sg.path -- optional
  FROM tree c, search_graph sg
  WHERE c.from_node = sg.to_node
)
UPDATE tree SET
  rev_path = sg.path,
  rev_search_depth = sg.search_depth,
  rev_length = sg.length,
  rev_area = sg.area
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;

SELECT edge, from_node, to_node, rev_path, rev_search_depth, rev_length, rev_area
FROM tree ORDER BY edge;

 edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+----------+------------------+------------+----------
 A    |         1 |       4 | {A}      |                1 |        1.1 |      0.9
 B    |         2 |       4 | {B}      |                1 |        1.2 |      1.3
 C    |         3 |       5 | {C}      |                1 |        1.8 |      2.4
 D    |         4 |       5 | {D,A}    |                2 |        2.3 |      2.2
 E    |         5 |         | {E,C}    |                2 |        2.9 |      3.3

我想在递归查询的第二项中构建聚合，因为每个下游边缘都连接到 1 个或许多上游边缘，但是递归查询不允许聚合。另外，我知道连接是草率的，因为with recursive结果有多个连接条件edge。

反向/反向查询的预期结果是：

 edge | from_node | to_node |  rev_path   | rev_search_depth | rev_length | rev_area
------+-----------+---------+-------------+------------------+------------+----------
 A    |         1 |       4 | {A}         |                1 |        1.1 |      0.9
 B    |         2 |       4 | {B}         |                1 |        1.2 |      1.3
 C    |         3 |       5 | {C}         |                1 |        1.8 |      2.4
 D    |         4 |       5 | {A,B,D}     |                3 |        3.5 |      3.5
 E    |         5 |         | {A,B,C,D,E} |                5 |        6.4 |      6.8

如何编写此更新查询？

请注意，我最终更关心的是累积准确的长度和面积总数，并且路径属性用于调试。在我的实际案例中，前向路径多达几百条，我预计大型复杂流域的反向路径有数万条。

score 7 · Accepted Answer

更新 2： 我重写了原始递归查询，以便所有累积/聚合都在递归部分之外完成。它应该比此答案的先前版本表现更好。这与@a_horse_with_no_name 对类似问题的回答非常相似。

  WITH 
    RECURSIVE search_graph(edge, from_node, to_node, length, area, start_node) AS
    (
        SELECT edge, from_node, to_node, length, area, from_node AS "start_node"
        FROM tree
        UNION ALL
        SELECT o.edge, o.from_node, o.to_node, o.length, o.area, p.start_node
        FROM tree o
    JOIN search_graph p ON p.from_node = o.to_node
    )
    SELECT array_agg(edge) AS "edges"
       -- ,array_agg(from_node) AS "nodes"
          ,count(edge) AS "edge_count"
          ,sum(length) AS "length_sum"
          ,sum(area) AS "area_sum"
    FROM search_graph
    GROUP BY start_node
    ORDER BY start_node
;

结果如预期：

 start_node | edges       | edge_count | length_sum |  area_sum
------------+-------------+------------+------------+------------
  1         | {A}         |          1 |        1.1 |       0.9
  2         | {B}         |          1 |        1.2 |       1.3
  3         | {C}         |          1 |        1.8 |       2.4
  4         | {D,B,A}     |          3 |        3.5 |       3.5
  5         | {E,D,C,B,A} |          5 |        6.4 |       6.8

postgresql - 如何使用递归查询向后遍历分层树结构结构

1 回答 1

Related

Reference