我正在开发的应用程序有一个名为folders
. 文件夹可能包含其他文件夹,归用户所有,并且可以与其他用户共享。
该功能已经实现并且一直运行良好,直到最近某个特定的 API 端点开始表现不佳。此 API 端点返回调用用户拥有的所有文件夹。此外,对于每个文件夹,将返回有权访问该文件夹的用户列表(拥有用户将是其中之一)。
文件夹表的架构如下所示:
id
title
parent_id
path (of ltree type)
包含共享位的表如下所示,folder_visibility 表:
id
folder_id
profile_id
给定 profile_id 对文件夹的访问由 2 个规则决定:
(folder_id, profile_id)
folder_visibility 表中存在一条记录,- 一个文件夹是另一个文件夹的后代,其中存在
(folder_id, profile_id)
记录。
或者,第一条和第二条规则可以这样表达:如果文件夹与用户共享,则用户可以看到该文件夹及其所有后代。
问题是有效地实施规则 2 对我来说似乎很难。
例子。以下是我的思考过程,以及测试数据:
create table folders (id uuid, title text, path ltree);
create table folder_visibility (id uuid, profile_id uuid, folder_id uuid);
insert into folders (id, title, path) values ('9ca5ed28-83c3-45c3-98a1-31ebf0466e63', 'Folder 1', '9ca5ed28_83c3_45c3_98a1_31ebf0466e63');
insert into folders (id, title, path) values ('80ba97ae-b21b-49d7-aedb-3004cdee0c21', 'Folder 2', '9ca5ed28_83c3_45c3_98a1_31ebf0466e63.80ba97ae_b21b_49d7_aedb_3004cdee0c21');
insert into folders (id, title, path) values ('a7abb686-2a69-44d1-b2cd-60825478c4c8', 'Folder 3', '9ca5ed28_83c3_45c3_98a1_31ebf0466e63.80ba97ae_b21b_49d7_aedb_3004cdee0c21.a7abb686_2a69_44d1_b2cd_60825478c4c8');
insert into folder_visibility (id, profile_id, folder_id) values (gen_random_uuid(), 'e63533e4-eb10-48f6-ae69-ddf5f9242510', '80ba97ae-b21b-49d7-aedb-3004cdee0c21');
insert into folder_visibility (id, profile_id, folder_id) values (gen_random_uuid(), '7cbb5f24-61d0-430e-ab10-225554b0725b', '9ca5ed28-83c3-45c3-98a1-31ebf0466e63');
insert into profiles values ('7cbb5f24-61d0-430e-ab10-225554b0725b', 'owner@server.com');
insert into profiles values ('e63533e4-eb10-48f6-ae69-ddf5f9242510', 'user@server.com');
create index on folder_visibility (profile_id);
create index on folder_visibility (folder_id);
create index on folders (folder_id);
create index on folders (id);
create index on profiles (id);
我获取给定配置文件的所有文件夹,这相对容易:
select f2.id folder_id, p.id profile_id, p.email from folders f1 join folders f2 on f2.path <@ f1.path join folder_visibility fv1 on fv1.folder_id = f1.id join profiles p on p.id = fv1.profile_id where fv1.profile_id = '7cbb5f24-61d0-430e-ab10-225554b0725b'; folder_id | profile_id | email --------------------------------------+--------------------------------------+------------------ 9ca5ed28-83c3-45c3-98a1-31ebf0466e63 | 7cbb5f24-61d0-430e-ab10-225554b0725b | owner@server.com 80ba97ae-b21b-49d7-aedb-3004cdee0c21 | 7cbb5f24-61d0-430e-ab10-225554b0725b | owner@server.com a7abb686-2a69-44d1-b2cd-60825478c4c8 | 7cbb5f24-61d0-430e-ab10-225554b0725b | owner@server.com (3 rows)
从这里,我从上面完成规则 1,如下所示:
select f2.id folder_id, p.id profile_id, p.email from folders f1 join folders f2 on f2.path <@ f1.path join folder_visibility fv1 on fv1.folder_id = f1.id join folder_visibility fv2 on (fv2.folder_id = f1.id or fv2.folder_id = f2.id) join profiles p on p.id = fv2.profile_id where fv1.profile_id = '7cbb5f24-61d0-430e-ab10-225554b0725b'; folder_id | profile_id | email --------------------------------------+--------------------------------------+------------------ 9ca5ed28-83c3-45c3-98a1-31ebf0466e63 | 7cbb5f24-61d0-430e-ab10-225554b0725b | owner@server.com 80ba97ae-b21b-49d7-aedb-3004cdee0c21 | 7cbb5f24-61d0-430e-ab10-225554b0725b | owner@server.com a7abb686-2a69-44d1-b2cd-60825478c4c8 | 7cbb5f24-61d0-430e-ab10-225554b0725b | owner@server.com 80ba97ae-b21b-49d7-aedb-3004cdee0c21 | e63533e4-eb10-48f6-ae69-ddf5f9242510 | user@server.com (4 rows)
为了实现规则 2,我天真地添加了另一组连接,最后得到了必要行的列表:
select distinct on (f3.id, p.email) f3.id folder_id, p.email, fv3.profile_id from folders f1 join folders f2 on f2.path operator(public.<@) f1.path join folder_visibility fv1 on fv1.folder_id = f1.id join folder_visibility fv2 on (fv2.folder_id = f1.id or fv2.folder_id = f2.id) join folders f3 on f3.path operator(public.<@) f2.path join folder_visibility fv3 on (fv3.folder_id = f2.id or fv3.folder_id = f3.id) join profiles p on p.id = fv3.profile_id where fv1.profile_id = '7cbb5f24-61d0-430e-ab10-225554b0725b'; folder_id | email | profile_id --------------------------------------+------------------+-------------------------------------- 80ba97ae-b21b-49d7-aedb-3004cdee0c21 | owner@server.com | 7cbb5f24-61d0-430e-ab10-225554b0725b 80ba97ae-b21b-49d7-aedb-3004cdee0c21 | user@server.com | e63533e4-eb10-48f6-ae69-ddf5f9242510 9ca5ed28-83c3-45c3-98a1-31ebf0466e63 | owner@server.com | 7cbb5f24-61d0-430e-ab10-225554b0725b a7abb686-2a69-44d1-b2cd-60825478c4c8 | owner@server.com | 7cbb5f24-61d0-430e-ab10-225554b0725b a7abb686-2a69-44d1-b2cd-60825478c4c8 | user@server.com | e63533e4-eb10-48f6-ae69-ddf5f9242510 (5 rows)
我的示例中最后一步的问题是执行计划变得非常庞大:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=7.38..7.39 rows=1 width=64) (actual time=0.103..0.111 rows=5 loops=1)
-> Sort (cost=7.38..7.39 rows=1 width=64) (actual time=0.103..0.106 rows=8 loops=1)
Sort Key: f3.id, p.email
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.00..7.37 rows=1 width=64) (actual time=0.031..0.087 rows=8 loops=1)
Join Filter: (fv3.profile_id = p.id)
Rows Removed by Join Filter: 8
-> Nested Loop (cost=0.00..6.33 rows=1 width=32) (actual time=0.028..0.066 rows=8 loops=1)
Join Filter: ((fv3.folder_id = f2.id) OR (fv3.folder_id = f3.id))
Rows Removed by Join Filter: 8
-> Nested Loop (cost=0.00..5.28 rows=1 width=32) (actual time=0.025..0.046 rows=8 loops=1)
Join Filter: (f3.path OPERATOR(public.<@) f2.path)
Rows Removed by Join Filter: 4
-> Nested Loop (cost=0.00..4.21 rows=1 width=48) (actual time=0.022..0.032 rows=4 loops=1)
Join Filter: ((fv2.folder_id = f1.id) OR (fv2.folder_id = f2.id))
Rows Removed by Join Filter: 2
-> Nested Loop (cost=0.00..3.16 rows=1 width=64) (actual time=0.018..0.022 rows=3 loops=1)
Join Filter: (f2.path OPERATOR(public.<@) f1.path)
-> Nested Loop (cost=0.00..2.09 rows=1 width=48) (actual time=0.014..0.016 rows=1 loops=1)
Join Filter: (f1.id = fv1.folder_id)
Rows Removed by Join Filter: 2
-> Seq Scan on folder_visibility fv1 (cost=0.00..1.02 rows=1 width=16) (actual time=0.009..0.010 rows=1 loops=1)
Filter: (profile_id = '7cbb5f24-61d0-430e-ab10-225554b0725b'::uuid)
Rows Removed by Filter: 1
-> Seq Scan on folders f1 (cost=0.00..1.03 rows=3 width=48) (actual time=0.002..0.003 rows=3 loops=1)
-> Seq Scan on folders f2 (cost=0.00..1.03 rows=3 width=48) (actual time=0.002..0.002 rows=3 loops=1)
-> Seq Scan on folder_visibility fv2 (cost=0.00..1.02 rows=2 width=16) (actual time=0.001..0.001 rows=2 loops=3)
-> Seq Scan on folders f3 (cost=0.00..1.03 rows=3 width=48) (actual time=0.001..0.001 rows=3 loops=4)
-> Seq Scan on folder_visibility fv3 (cost=0.00..1.02 rows=2 width=32) (actual time=0.001..0.001 rows=2 loops=8)
-> Seq Scan on profiles p (cost=0.00..1.02 rows=2 width=48) (actual time=0.001..0.001 rows=2 loops=8)
Planning Time: 0.588 ms
Execution Time: 0.151 ms
(32 rows)
在大型数据集(给定配置文件的数千个文件夹)上,此查询会执行几秒钟。根据上述两条规则,获取所有文件夹以及每个文件夹的所有“可见性”的更有效方法是什么?