sql - Selecting a matching subset in a many-to-many relationship

Question

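Suppose I have a many-to-many relationship between users and projects: a user may belong to several projects, and a project may have several users. The relationship is encoded in the table user_projects: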

create table user_projects
(
proj_id int references projs(id) not null,
user_id int references users(id) not null,
primary key (proj_id, user_id)
);

Here is my problem: given a set of users (user1, user2, ...), I want to select every project whose set of users contains the given set as a subset.

For example, if I insert the data below and then ask for all projects of users 1 and 2, the query should return only project 1.

insert into user_projects values (1, 1);
insert into user_projects values (1, 2);
insert into user_projects values (1, 3);
insert into user_projects values (2, 1);
insert into user_projects values (2, 3);

(If the best solution happens to be non-standard, I am using PostgreSQL.)

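Edit: to clarify, the user set should be interpreted as a constraint on the list of projects to return. The set {u1, u2} means the list should include only projects that have at least users u1 and u2; the set {u1} means all projects that have at least user u1 should be returned; and, as the limiting case, the empty set means all projects should be returned.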
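To pin down the semantics described in the edit, here is a minimal plain-Python sketch (not a query) of the intended result. It assumes a hypothetical project 3 that exists in projs but has no rows in user_projects, and the function name matching_projects is illustrative only. It shows why the empty set returns every project: the empty set is a subset of every member set.

```python
from collections import defaultdict


def matching_projects(all_projects, memberships, required_users):
    """Return, sorted, the ids of projects whose members include every required user.

    all_projects   -- iterable of project ids (stands in for table projs)
    memberships    -- iterable of (proj_id, user_id) pairs (stands in for user_projects)
    required_users -- iterable of user ids; an empty set matches every project
    """
    members = defaultdict(set)
    for proj_id, user_id in memberships:
        members[proj_id].add(user_id)
    required = set(required_users)
    # "required <= members" is the subset test; the empty set is a subset of
    # every set, so it also matches projects that have no users at all.
    return sorted(p for p in all_projects if required <= members[p])


# Data from the question, plus a hypothetical project 3 with no users.
PROJECTS = [1, 2, 3]
USER_PROJECTS = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3)]

print(matching_projects(PROJECTS, USER_PROJECTS, {1, 2}))  # [1]
print(matching_projects(PROJECTS, USER_PROJECTS, {1}))     # [1, 2]
print(matching_projects(PROJECTS, USER_PROJECTS, set()))   # [1, 2, 3]
```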

score 5 · Accepted Answer

Select proj_id
from user_projects
where user_id in (1,2)
group by proj_id
Having count(*) = 2

You know you have 2 users, and you know they will be unique (primary key), so if there are 2 records for the same project, that project is one you want.

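Your question says you are passing in the GIVEN users, so you know which users there are and how many. The SQL above can be updated to accept these known parameters, so it stays dynamic and is not limited to 2 users: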

where user_ID in (userlist)
having count(*) = (cntuserList)
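As a sketch of the parameterized version, assuming SQLite through Python's sqlite3 module as a stand-in for PostgreSQL (placeholder syntax differs between drivers), both the IN list and the expected count can be generated from the user list. The helper name projects_by_count is hypothetical. Deduplicating the input matters, because a repeated user would raise the expected count without adding any matching rows.

```python
import sqlite3


def projects_by_count(conn, user_ids):
    """Count-based division: projects that have every user in user_ids (hypothetical helper)."""
    users = sorted(set(user_ids))  # duplicates would make the expected count too high
    if not users:
        # This query shape cannot express the empty set (IN () is not valid in PostgreSQL).
        raise ValueError("user_ids must not be empty")
    placeholders = ", ".join("?" for _ in users)
    sql = (
        "SELECT proj_id FROM user_projects "
        f"WHERE user_id IN ({placeholders}) "
        "GROUP BY proj_id "
        "HAVING COUNT(*) = ? "
        "ORDER BY proj_id"
    )
    return [row[0] for row in conn.execute(sql, [*users, len(users)])]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_projects ("
    " proj_id INTEGER NOT NULL, user_id INTEGER NOT NULL,"
    " PRIMARY KEY (proj_id, user_id))"
)
conn.executemany(
    "INSERT INTO user_projects VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3)],
)

print(projects_by_count(conn, [1, 2]))     # [1]
print(projects_by_count(conn, [2, 1, 1]))  # [1]
```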

------------Handling the case where the user set is empty-----

Select P.id
from projs P
LEFT JOIN user_projects UP on UP.proj_id = P.id
where (UP.user_id in (1,2) OR UP.user_id is null)
group by P.id
Having count(*) = 2

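Here is what it does. It returns all projects and, where users are attached to a project, identifies them. If your set contains users, the returned project list is filtered by that set, and the HAVING clause ensures the entire set is present in the project.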

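If the set is empty, the left join together with the user_ID is null condition keeps projects that have no users listed, regardless of whether the set is empty. The HAVING clause then further reduces the result to the number of users defined in your set, or to 0 to return all projects that have no assigned users.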
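A hedged variant of the same idea, which does return every project for an empty set, moves the user filter from WHERE into the ON clause and counts up.user_id instead of *. This sketch uses SQLite via Python's sqlite3 as a stand-in for PostgreSQL, adds a hypothetical project 3 with no users, and names the helper projects_by_left_join_count for illustration only. Because PostgreSQL rejects an empty IN list, a PostgreSQL version would need something like up.user_id = ANY(array[...]::int[]) or, as here, a constant-false condition when the set is empty.

```python
import sqlite3


def projects_by_left_join_count(conn, user_ids):
    """LEFT JOIN + COUNT variant that also handles an empty user set (hypothetical helper)."""
    users = sorted(set(user_ids))
    # With an empty set, use a condition that matches no membership rows.
    if users:
        user_filter = "up.user_id IN (" + ", ".join("?" for _ in users) + ")"
    else:
        user_filter = "0"
    sql = (
        "SELECT p.id FROM projs AS p "
        "LEFT JOIN user_projects AS up "
        # The user filter sits in the ON clause, so every project keeps at least
        # one row (NULL-extended if none of its members is in the set).
        f"ON up.proj_id = p.id AND {user_filter} "
        "GROUP BY p.id "
        # COUNT(up.user_id) skips the NULLs from the outer join, so for the
        # empty set every project has a count of 0 and is returned.
        "HAVING COUNT(up.user_id) = ? "
        "ORDER BY p.id"
    )
    return [row[0] for row in conn.execute(sql, [*users, len(users)])]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projs (id INTEGER PRIMARY KEY);
CREATE TABLE user_projects (
    proj_id INTEGER NOT NULL REFERENCES projs(id),
    user_id INTEGER NOT NULL,
    PRIMARY KEY (proj_id, user_id)
);
INSERT INTO projs VALUES (1), (2), (3);  -- project 3 is hypothetical, with no users
INSERT INTO user_projects VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
""")

print(projects_by_left_join_count(conn, [1, 2]))  # [1]
print(projects_by_left_join_count(conn, []))      # [1, 2, 3]
```

Projects with extra members beyond the set are still returned, which matches the "at least these users" reading in the question's edit.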

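Another edge case we haven't discussed is what happens if a project contains more users than you defined in the set. Currently such a project is returned, but I'm not sure that is what you want.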

By the way, thanks for making me think. I don't dig into code much anymore, which is why I drop by here now and then to see whether I can help!

score 2 · Accepted Answer

Here is another solution that looks simpler:

select  proj_id
from    user_projects
group by proj_id
having  array_agg ( user_id ) @> array [1, 2]

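As @Thilo noticed, there may be projects with no users assigned. So if the input set of users is empty, the query should return all projects in the projs table. Here is the improved solution: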

select      p.id
from        projs           p
left join   user_projects   up
    on      p.id = up.proj_id
group by    p.id
having      array_agg ( up.user_id ) @> array (
    select  u
    from    generate_series ( 1, 2 ) as u
    where   false   /* an empty set */
    )
;
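SQLite has no array type or @> operator, so this sketch approximates the array_agg approach by aggregating each project's users with GROUP_CONCAT and doing the containment test in Python (set >= set). It assumes SQLite via Python's sqlite3, a hypothetical project 3 without users, and a hypothetical helper name projects_containing. Like the improved PostgreSQL query, it starts from projs with a LEFT JOIN so that the empty set returns every project.

```python
import sqlite3


def projects_containing(conn, user_ids):
    """Aggregate each project's users, then test containment (hypothetical helper)."""
    required = set(user_ids)
    rows = conn.execute(
        "SELECT p.id, GROUP_CONCAT(up.user_id) "
        "FROM projs AS p LEFT JOIN user_projects AS up ON up.proj_id = p.id "
        "GROUP BY p.id ORDER BY p.id"
    ).fetchall()
    result = []
    for proj_id, user_csv in rows:
        # GROUP_CONCAT yields NULL (None) when the outer join found no users.
        members = {int(u) for u in user_csv.split(",")} if user_csv else set()
        # Python's set >= plays the role of PostgreSQL's array @> (contains).
        if members >= required:
            result.append(proj_id)
    return result


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projs (id INTEGER PRIMARY KEY);
CREATE TABLE user_projects (
    proj_id INTEGER NOT NULL REFERENCES projs(id),
    user_id INTEGER NOT NULL,
    PRIMARY KEY (proj_id, user_id)
);
INSERT INTO projs VALUES (1), (2), (3);  -- project 3 is hypothetical, with no users
INSERT INTO user_projects VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
""")

print(projects_containing(conn, [1, 2]))  # [1]
print(projects_containing(conn, []))      # [1, 2, 3]
```

Pulling every project's member list into Python is fine for a sketch, but unlike the PostgreSQL query it moves the filtering out of the database.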

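I tested the performance of the proposed solutions for a while. When querying a small data set (1 670 rows in user_projects) there was no significant difference. The other case was a user_projects table with 1 667 000 rows (columns proj_id and user_id filled with random values from 1 to 1 000 000; on average 2 users per project, at most 11 users per project):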

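The array_agg approach (reading from projs and user_projects) usually took 24 seconds (sometimes less) to return a result.
Wildplasser's approach: always 31 seconds.
Thilo's query took so long that I decided to cancel it.
xQbert's "count" approach depends heavily on indexes and is many times faster, almost always taking only 0.5 seconds. However, it would need to be rewritten to handle an empty user input set.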

[The tests were run on PostgreSQL 9.2.2 on a not-so-recent PC, although the ratios were similar on PostgreSQL 8.4 on a newer PC.]

score 2 · Accepted Answer

This kind of relational division can usually be expressed as SELECT FROM a WHERE NOT EXISTS (b WHERE NOT EXISTS (c)):

WITH users AS (
        SELECT generate_series (1,2)::integer AS user_id
        )
SELECT DISTINCT up.proj_id
FROM user_projects up
   -- all the projects, but
   -- NOT the ones that miss (at least) one of the users
WHERE NOT EXISTS (
        SELECT *
        FROM users us
          -- The projects that miss (at least) one of the users
        WHERE NOT EXISTS (
                SELECT *
                FROM user_projects nx
                WHERE nx.user_id = us.user_id AND nx.proj_id = up.proj_id
                )
        )
        ;
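The same double NOT EXISTS can be tried in SQLite through Python's sqlite3. As a hypothetical variation, this sketch drives the outer query from projs instead of user_projects (which also removes the need for DISTINCT) and reads the required users from a temporary table, required_users (a hypothetical name), instead of generate_series, so that the set can also be empty. Driving from projs means an empty set returns every project, including a hypothetical project 3 that has no users.

```python
import sqlite3

# Double NOT EXISTS: keep a project unless some required user is missing from it.
DIVISION_SQL = """
SELECT p.id
FROM projs AS p
WHERE NOT EXISTS (
    SELECT 1
    FROM required_users AS ru           -- the users the project must contain
    WHERE NOT EXISTS (                  -- ... of which this one is missing from p
        SELECT 1
        FROM user_projects AS nx
        WHERE nx.user_id = ru.user_id AND nx.proj_id = p.id
    )
)
ORDER BY p.id
"""


def projects_by_not_exists(conn, user_ids):
    """Load the required users into the temp table, then run the division query."""
    conn.execute("DELETE FROM required_users")
    conn.executemany(
        "INSERT INTO required_users VALUES (?)", [(u,) for u in set(user_ids)]
    )
    return [row[0] for row in conn.execute(DIVISION_SQL)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projs (id INTEGER PRIMARY KEY);
CREATE TABLE user_projects (
    proj_id INTEGER NOT NULL REFERENCES projs(id),
    user_id INTEGER NOT NULL,
    PRIMARY KEY (proj_id, user_id)
);
CREATE TEMP TABLE required_users (user_id INTEGER PRIMARY KEY);
INSERT INTO projs VALUES (1), (2), (3);
INSERT INTO user_projects VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
""")

print(projects_by_not_exists(conn, [1, 2]))  # [1]
print(projects_by_not_exists(conn, []))      # [1, 2, 3]
```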

score 1 · Accepted Answer

A more general answer, which lets you use an arbitrary set of users with the same amount of code. First, we create a table holding the user set:

CREATE TEMP TABLE user_set ( 
  u int
);
INSERT INTO user_set VALUES (1), (2);

You can replace this table with any function that you could put in the FROM clause below.

Now select the actual projects:

SELECT DISTINCT 
    proj_id 
FROM 
    user_projects 
WHERE 
    true = ALL (
        -- Select all required users and test if they are a member of the project
        SELECT u IN (
            -- Select all user ids of this project
            SELECT 
                user_id 
            FROM 
                user_projects AS up 
            WHERE 
                up.proj_id = user_projects.proj_id
        )
        FROM 
            user_set
   )

There is also a fiddle.

score 0 · Accepted Answer

Something like this should work:

SELECT u.proj_id
FROM user_projects u
   JOIN user_projects u2 on u.proj_id = u2.proj_id
WHERE u.user_id = 1 and u2.user_id = 2

Here is the fiddle.

Good luck.

score 0 · Accepted Answer

You can use multiple JOIN blocks, for example:

 SELECT up1.proj_id
   FROM user_projects as up1
   JOIN user_projects as up2 on up1.proj_id=up2.proj_id
  WHERE up1.user_id=1
    AND up2.user_id=2;

You should add a new JOIN block for each user in the required set.
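Writing one JOIN block per user by hand gets tedious, so here is a sketch that generates the N-way self-join from a list of user ids. It assumes SQLite via Python's sqlite3 and uses hypothetical helper names self_join_sql and projects_by_self_joins. Each required user gets its own alias (up1, up2, ...); all aliases are joined on proj_id and each is pinned to one user id through a bound parameter. Like the two-user query, this form cannot express the empty set, so the sketch rejects it.

```python
import sqlite3


def self_join_sql(n_users):
    """Build a query with one JOIN block per required user (hypothetical helper)."""
    if n_users < 1:
        raise ValueError("the self-join form needs at least one user")
    lines = ["SELECT up1.proj_id", "FROM user_projects AS up1"]
    for i in range(2, n_users + 1):
        lines.append(f"JOIN user_projects AS up{i} ON up{i}.proj_id = up1.proj_id")
    # Each alias is pinned to one required user through a bound parameter.
    lines.append("WHERE " + " AND ".join(f"up{i}.user_id = ?" for i in range(1, n_users + 1)))
    lines.append("ORDER BY up1.proj_id")
    return "\n".join(lines)


def projects_by_self_joins(conn, user_ids):
    """Run the generated self-join for a list of user ids (hypothetical helper)."""
    users = sorted(set(user_ids))  # distinct users, so each project appears at most once
    return [row[0] for row in conn.execute(self_join_sql(len(users)), users)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_projects ("
    " proj_id INTEGER NOT NULL, user_id INTEGER NOT NULL,"
    " PRIMARY KEY (proj_id, user_id))"
)
conn.executemany(
    "INSERT INTO user_projects VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3)],
)

print(self_join_sql(2))
print(projects_by_self_joins(conn, [1, 2]))  # [1]
```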
