
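What is the easiest and fastest way to write a clause where all elements of an array must be matched, not just one as with IN? After all, it should behave like MongoDB's $all.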

Thinking about group conversations, where conversations_users is a join table between conversation_id and user_id, I have something like this in mind:

WHERE (conversations_users.user_id ALL IN (1,2))

UPDATE 16.07.12

Adding more information about the schema and the use case:

  1. The join table is rather simple:

                  Table "public.conversations_users"
         Column      |  Type   | Modifiers | Storage | Description 
    -----------------+---------+-----------+---------+-------------
     conversation_id | integer |           | plain   | 
     user_id         | integer |           | plain   | 
    
  2. A conversation has many users, and a user belongs to many conversations. To find all users in a conversation, I am using this join table.

  3. In the end, I am trying to work out a Ruby on Rails scope that finds my conversations depending on their participants, e.g.:

    scope :between, ->(*users) {
      joins(:users).where('conversations_users.user_id all in (?)', users.map(&:id))
    }
    

UPDATE 23.07.12

My question is about finding an exact match of participants. So:

A conversation between (1,2,3) must not match when querying for (1,2).

9 Answers

Assuming the join table follows good practice and has a unique compound key defined, i.e. a constraint that prevents duplicate rows, something like the following simple query should do:

select conversation_id from conversations_users where user_id in (1, 2)
group by conversation_id having count(*) = 2

It's important to note that the number 2 at the end is the length of the list of user_ids. That obviously needs to change if the user_id list changes length. If you can't assume your join table doesn't contain duplicates, change "count(*)" to "count(distinct user_id)" at some possible cost in performance.

This query finds all conversations that include all the specified users even if the conversation also includes additional users.
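To make the "contains all" behavior concrete, here is a plain-Ruby sketch of the same WHERE / GROUP BY / HAVING logic over a few hypothetical in-memory rows. It is not a database call, and `conversations_with_all` is only an illustrative name. It counts distinct users, i.e. the `count(distinct user_id)` variant, so duplicate join rows would not skew the result.

```ruby
# Hypothetical join-table rows: [conversation_id, user_id]
rows = [
  [1, 1], [1, 2],         # conversation 1: users 1 and 2
  [2, 1], [2, 3],         # conversation 2: users 1 and 3
  [3, 1], [3, 2], [3, 3], # conversation 3: users 1, 2 and 3
  [4, 1]                  # conversation 4: user 1 only
]

# In-memory stand-in for:
#   SELECT conversation_id FROM conversations_users WHERE user_id IN (...)
#   GROUP BY conversation_id HAVING count(DISTINCT user_id) = <list length>
def conversations_with_all(rows, user_ids)
  wanted = user_ids.uniq
  rows.select { |_conv, user| wanted.include?(user) }
      .group_by(&:first)
      .select { |_conv, group| group.map(&:last).uniq.size == wanted.size }
      .keys
      .sort
end

p conversations_with_all(rows, [1, 2]) # => [1, 3]
```

Conversation 3 is returned even though it also includes user 3; that is the superset behavior described above.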

If you want only conversations with exactly the specified set of users, one approach is to use a nested subquery in the where clause, as below. Note that the first and last lines are the same as in the original query; only the middle two lines are new.

select conversation_id from conversations_users where user_id in (1, 2)
   and conversation_id not in
   (select conversation_id from conversations_users where user_id not in (1,2))
group by conversation_id having count(*) = 2
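A rough in-memory equivalent of this nested version, again a Ruby sketch over hypothetical rows with an illustrative method name: first collect the conversations that have any participant outside the list (the subquery), then apply the same count check to the remaining rows.

```ruby
# Hypothetical join-table rows: [conversation_id, user_id]
rows = [
  [1, 1], [1, 2],         # conversation 1: users 1 and 2
  [2, 1], [2, 3],         # conversation 2: users 1 and 3
  [3, 1], [3, 2], [3, 3], # conversation 3: users 1, 2 and 3
  [4, 1]                  # conversation 4: user 1 only
]

# In-memory stand-in for the nested version:
#   ... WHERE user_id IN (...)
#   AND conversation_id NOT IN (SELECT conversation_id ... WHERE user_id NOT IN (...))
#   GROUP BY conversation_id HAVING count(DISTINCT user_id) = <list length>
def conversations_exactly(rows, user_ids)
  wanted = user_ids.uniq
  # Conversations with at least one participant outside the list (the subquery).
  outsiders = rows.reject { |_conv, user| wanted.include?(user) }.map(&:first).uniq

  rows.select { |conv, user| wanted.include?(user) && !outsiders.include?(conv) }
      .group_by(&:first)
      .select { |_conv, group| group.map(&:last).uniq.size == wanted.size }
      .keys
      .sort
end

p conversations_exactly(rows, [1, 2]) # => [1]
```

With the list (1, 2), conversation 3 (users 1, 2 and 3) is now excluded, which is the exact-match behavior asked for in the question's second update.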

Equivalently, you can use a set difference operator if your database supports it. Here is an example in Oracle syntax. (For Postgres or DB2, change the keyword "minus" to "except".)

select conversation_id from conversations_users where user_id in (1, 2)
  group by conversation_id having count(*) = 2
minus
  select conversation_id from conversations_users where user_id not in (1,2)

A good query optimizer should treat the last two variations identically, but check with your particular database to be sure. For example, the Oracle 11gR2 query plan sorts the two sets of conversation ids before applying the minus operator, but skips that sort step for the NOT IN variant. So either query plan could be faster, depending on factors such as the number of rows, cores, cache, indexes, etc.
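The MINUS / EXCEPT idea maps directly onto Ruby's `Array#-`. The sketch below (hypothetical rows, illustrative method name) computes the two halves of the query separately and subtracts them. It only mirrors the set logic, not how any optimizer executes it.

```ruby
# Hypothetical join-table rows: [conversation_id, user_id]
rows = [
  [1, 1], [1, 2],         # conversation 1: users 1 and 2
  [2, 1], [2, 3],         # conversation 2: users 1 and 3
  [3, 1], [3, 2], [3, 3], # conversation 3: users 1, 2 and 3
  [4, 1]                  # conversation 4: user 1 only
]

def exact_by_difference(rows, wanted)
  wanted = wanted.uniq
  # First half: conversations containing every wanted user (GROUP BY ... HAVING).
  with_all = rows.select { |_conv, user| wanted.include?(user) }
                 .group_by(&:first)
                 .select { |_conv, group| group.map(&:last).uniq.size == wanted.size }
                 .keys
  # Second half: conversations with at least one user outside the list.
  with_outsiders = rows.reject { |_conv, user| wanted.include?(user) }.map(&:first)
  # Array#- drops every element that appears in the second list, like MINUS / EXCEPT.
  (with_all - with_outsiders).sort
end

p exact_by_difference(rows, [1, 2]) # => [1]
```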

answered 2012-07-21T05:28:20.697

I'm collapsing the users into an array. I'm also using a CTE (the part in the WITH clause) to make this more readable.

=> select * from conversations_users ;
 conversation_id | user_id
-----------------+---------
               1 |       1
               1 |       2
               2 |       1
               2 |       3
               3 |       1
               3 |       2
(6 rows)       

=> WITH users_on_conversation AS (
  SELECT conversation_id, array_agg(user_id) as users
  FROM conversations_users
  WHERE user_id in (1, 2) --filter here for performance                                                                                      
  GROUP BY conversation_id
)
SELECT * FROM users_on_conversation
WHERE users @> array[1, 2];
 conversation_id | users
-----------------+-------
               1 | {1,2}
               3 | {1,2}
(2 rows) 
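The same containment logic in plain Ruby, using the six rows from the session above: `transform_values` (Ruby 2.4+) stands in for `array_agg(user_id)` and `(wanted - users).empty?` for `users @> array[1, 2]`. `conversations_containing` is just an illustrative name.

```ruby
# The six rows from the psql session: [conversation_id, user_id]
rows = [[1, 1], [1, 2], [2, 1], [2, 3], [3, 1], [3, 2]]

# select           -> WHERE user_id IN (...)
# group_by         -> GROUP BY conversation_id
# transform_values -> array_agg(user_id) AS users
# final select     -> WHERE users @> <wanted array>
def conversations_containing(rows, wanted)
  rows.select { |_conv, user| wanted.include?(user) }
      .group_by(&:first)
      .transform_values { |group| group.map(&:last) }
      .select { |_conv, users| (wanted - users).empty? }
end

p conversations_containing(rows, [1, 2]).keys # => [1, 3]
```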

EDIT (some resources)

answered 2012-07-16T21:34:48.303

This keeps ActiveRecord objects.

In the example below, I want to find the timesheets that are associated with all of the codes in the array.

codes = [8,9]

Timesheet.joins(:codes).select('count(*) as count, timesheets.*').
           where('codes.id': codes).
           group('timesheets.id').
           having('count(*) = ?', codes.length)

You should have full ActiveRecord objects to work with. If you want it to be a true scope, you can use the example above and pass in the results with .pluck(:id).

answered 2015-05-15T17:13:48.110

While @Alex's answer with IN and count() is probably the simplest solution, I expect this PL/pgSQL function to be faster:

CREATE OR REPLACE FUNCTION f_conversations_among_users(_user_arr int[])
  RETURNS SETOF conversations AS
$BODY$
DECLARE
    _sql text := '
    SELECT c.*
    FROM   conversations c';
    i int;
BEGIN

FOREACH i IN ARRAY _user_arr LOOP
    _sql  := _sql  || '
    JOIN   conversations_users x' || i || ' USING (conversation_id)';
END LOOP;

_sql  := _sql  || '
    WHERE  TRUE';

FOREACH i IN ARRAY _user_arr LOOP
    _sql  := _sql  || '
    AND    x' || i || '.user_id = ' || i;
END LOOP;

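/* uncomment for conversations with exact list of users and no more
   (_user_arr is not visible inside EXECUTE, so it is inlined as a quoted literal)
_sql  := _sql  || '
    AND    NOT EXISTS (
        SELECT 1
        FROM   conversations_users u
        WHERE  u.conversation_id = c.conversation_id
        AND    u.user_id <> ALL (' || quote_literal(_user_arr) || '::int[])
        )';
*/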

-- RAISE NOTICE '%', _sql;
RETURN QUERY EXECUTE _sql;

END;
$BODY$ LANGUAGE plpgsql VOLATILE;

Call:

SELECT * FROM f_conversations_among_users('{1,2}')

The function dynamically builds and executes a query of this form:

SELECT c.*
FROM   conversations c
JOIN   conversations_users x1 USING (conversation_id)
JOIN   conversations_users x2 USING (conversation_id)
...
WHERE  TRUE
AND    x1.user_id = 1
AND    x2.user_id = 2
...

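This form performed best in an extensive test of relational-division queries.

You could also build the query in your application, but I went with the assumption that you want to use a single array parameter. Besides, this is probably the fastest way anyway.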
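If you do build it in the application, a Ruby sketch could look like the one below. `build_division_query` is a hypothetical helper, not part of Rails; it only produces the SQL string. Passing every id through `Integer()` keeps anything other than integers out of the interpolated SQL, and the aliases are numbered by position (x1, x2, ...) rather than by user id.

```ruby
# Hypothetical app-side builder for the same query shape the PL/pgSQL
# function generates: one self-join of conversations_users per user id.
def build_division_query(user_ids)
  # Integer() raises for non-integers (e.g. "1; DROP ..."), so only numbers
  # end up in the SQL; uniq avoids joining the same user twice.
  ids = user_ids.map { |id| Integer(id) }.uniq

  lines = ["SELECT c.*", "FROM   conversations c"]
  ids.each_index { |i| lines << "JOIN   conversations_users x#{i + 1} USING (conversation_id)" }
  lines << "WHERE  TRUE"
  ids.each_with_index { |id, i| lines << "AND    x#{i + 1}.user_id = #{id}" }
  lines.join("\n")
end

puts build_division_query([1, 2])
```

For `[1, 2]` this prints the same query as the form shown above.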

Either query needs an index like the following to be fast:

CREATE INDEX conversations_users_user_id_idx ON conversations_users (user_id);

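A multicolumn primary (or unique) key on (user_id, conversation_id) serves just as well, but one on (conversation_id, user_id) (like you may very well have!) would be inferior. You can find a short rationale at the link above, or a comprehensive assessment under this related question on dba.SE.

I also assume you have a primary key on conversations.conversation_id.

Can you run a performance test of @Alex's query and this function with EXPLAIN ANALYZE and report your findings?

Note that both solutions find conversations in which at least the users in the array take part, including conversations with additional users.
If you want to exclude those, uncomment the additional clause in my function (or add it to any other query).

Let me know if you need more explanation of the function.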

answered 2012-07-21T12:54:10.113

select id from conversations where not exists(
    select * from conversations_users cu 
    where cu.conversation_id=conversations.id 
    and cu.user_id not in(1,2,3)        
)

This can easily be made into a Rails scope.
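Note that the NOT EXISTS test on its own only rules out conversations that have a participant outside the list; conversations with just some of the listed users (or with no participants at all) pass as well. A plain-Ruby sketch over hypothetical data (conversation 5 has no rows, `without_outsiders` is an illustrative name) shows this. For an exact match you would combine it with a count check such as `having count(*) = <list length>`.

```ruby
# Hypothetical data: join-table rows [conversation_id, user_id] and all
# conversation ids (conversation 5 has no participants at all).
rows = [
  [1, 1], [1, 2],
  [2, 1], [2, 3],
  [3, 1], [3, 2], [3, 3],
  [4, 1]
]
conversation_ids = [1, 2, 3, 4, 5]

# In-memory stand-in for:
#   SELECT id FROM conversations WHERE NOT EXISTS (
#     SELECT * FROM conversations_users cu
#     WHERE cu.conversation_id = conversations.id AND cu.user_id NOT IN (...))
def without_outsiders(conversation_ids, rows, allowed)
  conversation_ids.reject do |conv|
    rows.any? { |c, user| c == conv && !allowed.include?(user) }
  end
end

p without_outsiders(conversation_ids, rows, [1, 2, 3]) # => [1, 2, 3, 4, 5]
p without_outsiders(conversation_ids, rows, [1, 2])    # => [1, 4, 5]
```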

answered 2012-07-18T12:40:52.053

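I'm guessing you don't really want to start messing with temporary tables.

Your question is unclear as to whether you want conversations with exactly the set of users, or conversations with a superset. The following is for the superset: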

with users as (select user_id from users where user_id in (<list>)
              ),
     conv  as (select conversation_id, user_id
               from conversations_users
               where user_id in (<list>)
              )
select c.conversation_id
from users u left outer join
     conv c
     on u.user_id = c.user_id
where c.conversation_id is not null
group by c.conversation_id
having count(distinct c.user_id) = (select count(*) from users)

For this query to work well, it assumes you have indexes on user_id in both users and conversations_users.

For the exact set:

with users as (select user_id from users where user_id in (<list>)
              ),
     conv  as (select conversation_id, user_id
               from conversations_users
              )
select c.conversation_id
from users u right outer join
     conv c
     on u.user_id = c.user_id
group by c.conversation_id
having count(u.user_id) = count(*)                             -- no participant outside the list
   and count(distinct u.user_id) = (select count(*) from users) -- every listed user participates

answered 2012-07-20T01:23:21.370

Create a mapping table holding all the values you want to match (here, the user_ids) and use it:

select
    t1.conversation_id
from conversations_users as t1
    inner join mapping_table as map on t1.user_id = map.user_id
group by
    t1.conversation_id
having
    count(distinct t1.user_id) =
    (select count(distinct user_id) from mapping_table)

answered 2012-07-13T10:40:42.733

Based on @Alex Blakemore's answer, the equivalent Rails 4 scope on your Conversation class would be:

# Conversations exactly with users array
scope :by_users, -> (users) { 
                           self.by_any_of_users(users)
                             .group("conversations.id")
                             .having("COUNT(*) = ?", users.length) -
                           joins(:conversations_users)
                             .where("conversations_users.user_id NOT IN (?)", users)
}
# generates an IN clause
scope :by_any_of_users, -> (users) { joins(:conversations_users).where(conversations_users: { user_id: users }).distinct }

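Note that you could optimize this: instead of using the Rails `-` (minus), which is Ruby's array difference evaluated in memory, you could do a `.where("NOT IN")`, but that would be very hard to read.

answered 2016-06-24T20:26:05.377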

Based on Alex Blakemore's answer:

select conversation_id
from conversations_users cu
where user_id in (1, 2)
group by conversation_id 
having count(distinct user_id) = 2

I found an alternative query with the same goal: finding the ids of conversations that contain user_1 and user_2 (ignoring other users):

select *
from conversations_users cu1
where 2 = (
    select count(distinct user_id)
    from conversations_users cu2
    where user_id in (1, 2) and cu1.conversation_id = cu2.conversation_id
)

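According to Postgres' EXPLAIN analysis it is slower, which I guess is expected because more conditions are evaluated: at the very least, the subquery is executed for every row of conversations_users, since it is a correlated subquery. The upside of this query is that there is no grouping, so you can select additional columns of the conversations_users table. In some situations (like mine) that can be handy.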
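A plain-Ruby sketch of what the correlated subquery computes, with a hypothetical `joined_on` column to show that whole rows come back, not just conversation ids. As described above, the inner check is evaluated once per outer row. `rows_in_conversations_with_all` and `Row` are illustrative names.

```ruby
# Hypothetical join-table rows with an extra column (joined_on) to show that
# full rows come back, not just conversation ids.
Row = Struct.new(:conversation_id, :user_id, :joined_on)

rows = [
  Row.new(1, 1, "2019-01-01"), Row.new(1, 2, "2019-01-02"),
  Row.new(2, 1, "2019-02-01"), Row.new(2, 3, "2019-02-02"),
  Row.new(3, 1, "2019-03-01"), Row.new(3, 2, "2019-03-02"), Row.new(3, 3, "2019-03-03")
]

# In-memory stand-in for the correlated subquery: for every outer row cu1,
# count the distinct wanted users in the same conversation. The inner scan
# runs once per outer row, and no grouping collapses the result.
def rows_in_conversations_with_all(rows, wanted)
  rows.select do |cu1|
    users = rows.select { |cu2| cu2.conversation_id == cu1.conversation_id && wanted.include?(cu2.user_id) }
                .map(&:user_id)
    users.uniq.size == wanted.size
  end
end

p rows_in_conversations_with_all(rows, [1, 2]).map(&:conversation_id) # => [1, 1, 3, 3, 3]
```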

answered 2019-10-31T21:11:04.473