我正在使用 PostgreSQL 9.1 来查询分层树结构数据,这些数据由连接到节点的边(或元素)组成。这些数据实际上是针对流网络的,但我已将问题抽象为简单的数据类型。考虑示例tree
表。每条边都有长度和面积属性,用于从网络中确定一些有用的指标。
CREATE TEMP TABLE tree (
edge text PRIMARY KEY,
from_node integer UNIQUE NOT NULL, -- can also act as PK
to_node integer REFERENCES tree (from_node),
mode character varying(5), -- redundant, but illustrative
length numeric NOT NULL,
area numeric NOT NULL,
fwd_path text[], -- optional ordered sequence, useful for debugging
fwd_search_depth integer,
fwd_length numeric,
rev_path text[], -- optional unordered set, useful for debugging
rev_search_depth integer,
rev_length numeric,
rev_area numeric
);
CREATE INDEX ON tree (to_node);
INSERT INTO tree(edge, from_node, to_node, mode, length, area) VALUES
('A', 1, 4, 'start', 1.1, 0.9),
('B', 2, 4, 'start', 1.2, 1.3),
('C', 3, 5, 'start', 1.8, 2.4),
('D', 4, 5, NULL, 1.2, 1.3),
('E', 5, NULL, 'end', 1.1, 0.9);
如下图所示,其中由 AE 表示的边与节点 1-5 相连。NULL to_node
(Ø) 表示结束节点。总是独一无二的from_node
,所以它可以充当PK。如果这个网络像流域一样流动,它将是从上到下的,其中起始支流边缘是 A、B、C,结束流出边缘是 E。
文档为WITH
如何在递归查询中使用搜索图提供了一个很好的示例。因此,要获取“转发”信息,查询从末尾开始,并向后工作:
WITH RECURSIVE search_graph AS (
-- Begin at ending nodes
SELECT E.from_node, 1 AS search_depth, E.length
, ARRAY[E.edge] AS path -- optional
FROM tree E WHERE E.to_node IS NULL
UNION ALL
-- Accumulate each edge, working backwards (upstream)
SELECT o.from_node, sg.search_depth + 1, sg.length + o.length
, o.edge|| sg.path -- optional
FROM tree o, search_graph sg
WHERE o.to_node = sg.from_node
)
UPDATE tree SET
fwd_path = sg.path,
fwd_search_depth = sg.search_depth,
fwd_length = sg.length
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;
SELECT edge, from_node, to_node, fwd_path, fwd_search_depth, fwd_length
FROM tree ORDER BY edge;
edge | from_node | to_node | fwd_path | fwd_search_depth | fwd_length
------+-----------+---------+----------+------------------+------------
A | 1 | 4 | {A,D,E} | 3 | 3.4
B | 2 | 4 | {B,D,E} | 3 | 3.5
C | 3 | 5 | {C,E} | 2 | 2.9
D | 4 | 5 | {D,E} | 2 | 2.3
E | 5 | | {E} | 1 | 1.1
以上是有道理的,并且适用于大型网络。例如,我可以看到B
边缘距离末端有 3 条边,而前向路径{B,D,E}
从末端到末端的总长度为 3.5。
但是,我想不出构建反向查询的好方法。也就是说,从每条边来看,累积的“上游”边、长度和面积是多少。使用WITH RECURSIVE
,我拥有的最好的是:
WITH RECURSIVE search_graph AS (
-- Begin at starting nodes
SELECT S.from_node, S.to_node, 1 AS search_depth, S.length, S.area
, ARRAY[S.edge] AS path -- optional
FROM tree S WHERE from_node IN (
-- Starting nodes have a from_node without any to_node
SELECT from_node FROM tree EXCEPT SELECT to_node FROM tree)
UNION ALL
-- Accumulate edges, working forwards
SELECT c.from_node, c.to_node, sg.search_depth + 1, sg.length + c.length, sg.area + c.area
, c.edge || sg.path -- optional
FROM tree c, search_graph sg
WHERE c.from_node = sg.to_node
)
UPDATE tree SET
rev_path = sg.path,
rev_search_depth = sg.search_depth,
rev_length = sg.length,
rev_area = sg.area
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;
SELECT edge, from_node, to_node, rev_path, rev_search_depth, rev_length, rev_area
FROM tree ORDER BY edge;
edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+----------+------------------+------------+----------
A | 1 | 4 | {A} | 1 | 1.1 | 0.9
B | 2 | 4 | {B} | 1 | 1.2 | 1.3
C | 3 | 5 | {C} | 1 | 1.8 | 2.4
D | 4 | 5 | {D,A} | 2 | 2.3 | 2.2
E | 5 | | {E,C} | 2 | 2.9 | 3.3
我想在递归查询的第二项中构建聚合,因为每个下游边缘都连接到 1 个或许多上游边缘,但是递归查询不允许聚合。另外,我知道连接是草率的,因为with recursive
结果有多个连接条件edge
。
反向/反向查询的预期结果是:
edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+-------------+------------------+------------+----------
A | 1 | 4 | {A} | 1 | 1.1 | 0.9
B | 2 | 4 | {B} | 1 | 1.2 | 1.3
C | 3 | 5 | {C} | 1 | 1.8 | 2.4
D | 4 | 5 | {A,B,D} | 3 | 3.5 | 3.5
E | 5 | | {A,B,C,D,E} | 5 | 6.4 | 6.8
如何编写此更新查询?
请注意,我最终更关心的是累积准确的长度和面积总数,并且路径属性用于调试。在我的实际案例中,前向路径多达几百条,我预计大型复杂流域的反向路径有数万条。