sql - CONNECT BY query

Question

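I store hierarchical data in a table. When a resource is accessed by its hierarchical path (grandParent/parent/resource), I need to locate the resource using a CONNECT BY query.

Note: the SQL commands were exported from EnterpriseDB, but they should also work in Oracle.

Table structure: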

CREATE TABLE resource_hierarchy
(
  resource_id character varying(100) NOT NULL,
  resource_type integer NOT NULL,
  resource_name character varying(100),
  parent_id character varying(100)
)
WITH (
  OIDS=FALSE
);

Data:

INSERT INTO "resource_hierarchy" (resource_id,resource_type,resource_name,parent_id) VALUES ('36d27991', 3, 'areaName',    'a616f392');
INSERT INTO "resource_hierarchy" (resource_id,resource_type,resource_name,parent_id) VALUES ('a616f392', 3, 'townName',    'fcc1ebb7');
INSERT INTO "resource_hierarchy" (resource_id,resource_type,resource_name,parent_id) VALUES ('fcc1ebb7', 2, 'stateName',   '8369cc88');
INSERT INTO "resource_hierarchy" (resource_id,resource_type,resource_name,parent_id) VALUES ('8369cc88', 5, 'countryName', null);

Now, when I receive a path like

countryName/stateName/townName/areaName

I run a query such as:

select LEVEL,* from resource_hierarchy
WHERE resource_name = (
            CASE LEVEL 
                WHEN 1 THEN 'areaName'
                WHEN 2 THEN 'townName'
                WHEN 3 THEN 'stateName'
                WHEN 4 THEN 'countryName'
                ELSE ''
            END
         )
 connect by prior parent_id = resource_id
 start with resource_name = 'areaName';

My expected result is:

LEVEL   resource_id resource_type   resource_name   parent_id
-------------------------------------------------------------
1       36d27991    3               areaName        a616f392
2       a616f392    3               townName        fcc1ebb7
3       fcc1ebb7    2               stateName       8369cc88
4       8369cc88    5               countryName     <null>

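This query works fine, but I am not sure whether it will run fast when my table is large, with hundreds of thousands of entries.

Can you optimize this query for my requirement?

Edit:

EXPLAIN output for the above query. I have defined two indexes: one on resource_id (the primary key) and another on parent_id.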

Sort  (cost=66.85..66.86 rows=1 width=694)
  Sort Key: connectby_cte.siblingssortcol
  CTE prior
    ->  Recursive Union  (cost=0.00..65.83 rows=31 width=151)
      ->  WindowAgg  (cost=0.00..3.12 rows=1 width=83)
        ->  Seq Scan on resource_hierarchy  (cost=0.00..3.11 rows=1 width=83)
              Filter: ((resource_name)::text = 'areaName'::text)
      ->  WindowAgg  (cost=0.33..6.21 rows=3 width=151)
        ->  Hash Join  (cost=0.33..6.15 rows=3 width=151)
              Hash Cond: ((resource_hierarchy_1.resource_id)::text = (prior.parent_id)::text)
              Join Filter: connectby_cyclecheck(prior.recursionpath, (resource_hierarchy_1.parent_id)::text)
              ->  Seq Scan on resource_hierarchy resource_hierarchy_1  (cost=0.00..2.89 rows=89 width=83)
              ->  Hash  (cost=0.20..0.20 rows=10 width=286)
                ->  WorkTable Scan on prior  (cost=0.00..0.20 rows=10 width=286)
  ->  CTE Scan on prior connectby_cte  (cost=0.00..1.01 rows=1 width=694)
    Filter: ((resource_name)::text = CASE level WHEN 1 THEN 'areaName'::text WHEN 2 THEN 'townName'::text WHEN 3 THEN 'stateName'::text WHEN 4 THEN 'countryName'::text ELSE ''::text END)

score 3 · Accepted Answer

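Disclaimer: my main experience is with the Oracle DBMS, so pay attention to the details if you apply this solution to Postgres.

The WHERE clause is applied only after the full hierarchy has been built. In the original query, the database engine therefore starts from every record with the specified resource_name, at any level, and builds the full tree for each record it finds. Filtering happens only in the next step.
From the documentation: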

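1. Oracle selects the root row(s) of the hierarchy: those rows that satisfy the START WITH condition.
2. Oracle selects the child rows of each root row. Each child row must satisfy the CONNECT BY condition with respect to one of the root rows.
3. Oracle selects successive generations of child rows. Oracle first selects the children of the rows returned in step 2, then the children of those children, and so on. Oracle always selects children by evaluating the CONNECT BY condition with respect to a current parent row.
4. If the query contains a WHERE clause without a join, then Oracle eliminates all rows from the hierarchy that do not satisfy the condition of the WHERE clause. Oracle evaluates this condition for each row individually, rather than removing all the children of a row that does not satisfy the condition.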
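
To make the difference concrete, here is a minimal sketch that counts how many rows a walk produces with and without pruning. It is an illustration only: it uses SQLite (through Python's sqlite3 module) and a recursive CTE instead of CONNECT BY, and the extra sibling rows (s2, t2, a2, 'otherState') are made-up data that is not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_hierarchy ("
    "resource_id TEXT PRIMARY KEY, resource_name TEXT, parent_id TEXT)"
)
# Hypothetical data: one country with two states, each with a town and an area.
conn.executemany(
    "INSERT INTO resource_hierarchy VALUES (?, ?, ?)",
    [
        ("c1", "countryName", None),
        ("s1", "stateName", "c1"),
        ("s2", "otherState", "c1"),
        ("t1", "townName", "s1"),
        ("t2", "townName", "s2"),
        ("a1", "areaName", "t1"),
        ("a2", "areaName", "t2"),
    ],
)

# No condition in the recursive step: the whole tree is built first,
# and any filtering would have to happen afterwards.
walk_all = """
WITH RECURSIVE h(lvl, resource_id) AS (
    SELECT 1, resource_id FROM resource_hierarchy WHERE parent_id IS NULL
    UNION ALL
    SELECT h.lvl + 1, rh.resource_id
    FROM h JOIN resource_hierarchy rh ON rh.parent_id = h.resource_id
)
SELECT COUNT(*) FROM h
"""

# Name check inside the recursive step: branches that do not match
# the path are never expanded.
walk_pruned = """
WITH RECURSIVE h(lvl, resource_id) AS (
    SELECT 1, resource_id FROM resource_hierarchy
    WHERE parent_id IS NULL AND resource_name = 'countryName'
    UNION ALL
    SELECT h.lvl + 1, rh.resource_id
    FROM h JOIN resource_hierarchy rh ON rh.parent_id = h.resource_id
    WHERE rh.resource_name = CASE h.lvl + 1
        WHEN 2 THEN 'stateName'
        WHEN 3 THEN 'townName'
        WHEN 4 THEN 'areaName'
    END
)
SELECT COUNT(*) FROM h
"""

rows_walked_all = conn.execute(walk_all).fetchone()[0]
rows_walked_pruned = conn.execute(walk_pruned).fetchone()[0]
print(rows_walked_all, rows_walked_pruned)
```

With this data the unpruned walk produces all 7 rows, while the pruned walk produces only the 4 rows on the requested path, because the 'otherState' branch is dropped before its children are visited.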

To optimize this case, the query must be changed as follows (the hierarchy is reversed into the more natural top-down order):

select 
  level, rh.* 
from 
  resource_hierarchy rh
start with 
  (resource_name = 'countryName')
  and 
  (parent_id is null) -- roots only
connect by 
  prior resource_id = parent_id
  and          
  -- at each step get only required records
  resource_name = (
    case level 
      when 1 then 'countryName'
      when 2 then 'stateName'
      when 3 then 'townName'
      when 4 then 'areaName'
      else null
    end
  )

The same query can be written with CTE syntax (recursive subquery factoring in Oracle).
Below is the PostgreSQL CTE variant, corrected following @Karthik_Murugan's suggestion:

with RECURSIVE hierarchy_query(lvl, resource_id) as (
    select
      1               lvl, 
      rh.resource_id  resource_id
    from
      resource_hierarchy rh
    where
     (resource_name = 'countryName') and (parent_id is null) 

  union all

    select
      hq.lvl+1        lvl,
      rh.resource_id  resource_id
    from
      hierarchy_query    hq,
      resource_hierarchy rh
    where
      rh.parent_id = hq.resource_id
      and
      -- at each step get only required records
      resource_name = (
        case (hq.lvl + 1)
          when 2 then 'stateName'
          when 3 then 'townName'
          when 4 then 'areaName'
          else null
        end
      )
)
select
  hq.lvl, rh.*
from
  hierarchy_query    hq,
  resource_hierarchy rh
where
  rh.resource_id = hq.resource_id
order by
  hq.lvl

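This is only half of the job, because we also need to help the database engine locate the records by creating appropriate indexes.
The query above contains two search actions:
1. Find the record to start from;
2. Select the records of each next level.

For the first action, we need an index on the resource_name field and possibly on parent_id.
For the second action, the parent_id and resource_name fields must be indexed.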

create index X_RESOURCE_HIERARCHY_ROOT on RESOURCE_HIERARCHY (resource_name);
create index X_RESOURCE_HIERARCHY_TREE on RESOURCE_HIERARCHY (parent_id, resource_name);

Creating only the X_RESOURCE_HIERARCHY_TREE index may be enough. It depends on the characteristics of the data stored in the table.
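
One way to check whether a composite index serves both search actions is to look at the query plan. The sketch below does this in SQLite through Python's sqlite3 module, which is an assumption made purely so the example is runnable. Oracle and EDB have their own planners and may choose differently, so the same check should be repeated there with their EXPLAIN facilities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE resource_hierarchy (
           resource_id   TEXT NOT NULL PRIMARY KEY,
           resource_type INTEGER NOT NULL,
           resource_name TEXT,
           parent_id     TEXT)"""
)
conn.execute(
    "CREATE INDEX x_resource_hierarchy_tree "
    "ON resource_hierarchy (parent_id, resource_name)"
)

def plan_text(sql, params=()):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows).lower()

# Action 2: fetch the child with a given name under a known parent.
child_plan = plan_text(
    "SELECT resource_id FROM resource_hierarchy "
    "WHERE parent_id = ? AND resource_name = ?",
    ("fcc1ebb7", "townName"),
)

# Action 1: fetch the root row by name.
root_plan = plan_text(
    "SELECT resource_id FROM resource_hierarchy "
    "WHERE parent_id IS NULL AND resource_name = ?",
    ("countryName",),
)

child_uses_index = "x_resource_hierarchy_tree" in child_plan
root_uses_index = "x_resource_hierarchy_tree" in root_plan
print(child_plan)
print(root_plan)
```

In SQLite both lookups can be served by the (parent_id, resource_name) index, which is consistent with the remark that the second index alone may be enough. Treat this as a hint, not as proof for another database.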

P.S. The string for each level can be constructed from the full path using the substr and instr functions, as in this Oracle example:

with prm as (
  select 
    '/countryName/stateName/townName/areaName/' location_path 
  from dual
)
select 
  substr(location_path,
    instr(location_path,'/',1,level)+1,
    instr(location_path,'/',1,level+1)-instr(location_path,'/',1,level)-1
  )          
from prm connect by level < 7
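
If the path is split in application code instead, the same (level, name) pairs can be produced with a few lines. This is a minimal sketch in Python; the function name path_levels is hypothetical, and empty segments caused by leading or trailing slashes are simply dropped.

```python
def path_levels(location_path):
    """Return (level, name) pairs for a slash-separated path, top level first."""
    names = [part for part in location_path.split("/") if part]
    return list(enumerate(names, start=1))


levels = path_levels("/countryName/stateName/townName/areaName/")
for lvl, name in levels:
    print(lvl, name)
```

The pairs can then be passed to the hierarchical query as bind parameters, one per level.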

score 1 · Accepted Answer

A slightly different query from the one proposed by @ThinkJet. This works in EDB and gives the expected result.

WITH RECURSIVE rh (resource_id, resource_name, parent_id, level) AS 
(   
    SELECT resource_id, resource_name, parent_id, 1 as level FROM resource_hierarchy
    where resource_name = 'countryName' AND parent_id IS NULL
    UNION ALL
    SELECT cur.resource_id, cur.resource_name, cur.parent_id, level+1 FROM resource_hierarchy cur, rh prev WHERE cur.parent_id = prev.resource_id AND 
        cur.resource_name = (
                    CASE level 
                    WHEN 3 THEN 'areaName'
                    WHEN 2 THEN 'townName'
                    WHEN 1 THEN 'stateName'
                    END
                 )
)
SELECT * FROM rh

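Edit: this query can also return partial matches, but we can always verify that the number of records equals the number of URL elements. Also, if the URL has only one element (such as /countryName), remove the UNION part from the query above to get the expected result.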
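
The record-count check could look like the sketch below. It is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module instead of EDB, the helper name find_by_path is hypothetical, and the path levels are passed as a bound VALUES list instead of a hard-coded CASE. It also assumes names are unique among siblings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_hierarchy (resource_id TEXT PRIMARY KEY, "
    "resource_type INTEGER, resource_name TEXT, parent_id TEXT)"
)
# Sample rows from the question.
conn.executemany(
    "INSERT INTO resource_hierarchy VALUES (?, ?, ?, ?)",
    [
        ("36d27991", 3, "areaName", "a616f392"),
        ("a616f392", 3, "townName", "fcc1ebb7"),
        ("fcc1ebb7", 2, "stateName", "8369cc88"),
        ("8369cc88", 5, "countryName", None),
    ],
)

def find_by_path(conn, path):
    """Return one row per path element, or None if the path matches only partially."""
    names = [part for part in path.split("/") if part]
    if not names:
        return None
    # Only "(?, ?)" placeholders are interpolated; the values themselves are bound.
    placeholders = ", ".join("(?, ?)" for _ in names)
    params = []
    for lvl, name in enumerate(names, start=1):
        params += [lvl, name]
    sql = f"""
    WITH RECURSIVE
      parts(lvl, name) AS (VALUES {placeholders}),
      hq(lvl, resource_id) AS (
        SELECT 1, rh.resource_id
        FROM resource_hierarchy rh
        JOIN parts p ON p.lvl = 1 AND p.name = rh.resource_name
        WHERE rh.parent_id IS NULL
        UNION ALL
        SELECT hq.lvl + 1, rh.resource_id
        FROM hq
        JOIN resource_hierarchy rh ON rh.parent_id = hq.resource_id
        JOIN parts p ON p.lvl = hq.lvl + 1 AND p.name = rh.resource_name
      )
    SELECT hq.lvl, rh.resource_id, rh.resource_name, rh.parent_id
    FROM hq JOIN resource_hierarchy rh ON rh.resource_id = hq.resource_id
    ORDER BY hq.lvl
    """
    rows = conn.execute(sql, params).fetchall()
    # A partial match returns fewer rows than there are path elements.
    return rows if len(rows) == len(names) else None

full = find_by_path(conn, "countryName/stateName/townName/areaName")
partial = find_by_path(conn, "countryName/stateName/townName/noSuchArea")
single = find_by_path(conn, "/countryName")
print(full, partial, single)
```

Because the levels are joined from the parts list, a one-element path needs no special handling in this variant: the recursive step finds no level 2 entry and stops.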

score 1 · Accepted Answer

select 
     LEVEL, 
     resource_id, 
     resource_type, 
     resource_name, 
     parent_id 
from   
     resource_hierarchy 
connect by prior parent_id = resource_id 
start with UPPER(resource_name)= UPPER(:resource_name);

With this approach you do not need the CASE expression. Just supply the resource name to get its parent hierarchy.
