
I have a Messages table that looks like this:

                    Messages
+-----+------------+-------------+--------------+
|  id |  sender_id | receiver_id |  created_at  |
+-----------------------------------------------+
|  1  |      1     |      2      |   1/1/2013   |
|  2  |      1     |      2      |   1/1/2013   |
|  3  |      2     |      1      |   1/2/2013   |
|  4  |      3     |      2      |   1/2/2013   |
|  5  |      3     |      2      |   1/3/2013   |
|  6  |      5     |      4      |   1/4/2013   |
+-----------------------------------------------+

A 'thread' is the group of messages between a given sender_id and receiver_id. I want a query that returns the 10 most recent messages for each of the 10 most recent threads in which either the sender_id or the receiver_id is a given id.

Expected output where given user_id is 5:

+-----+------------+-------------+--------------+
|  id |  sender_id | receiver_id |  created_at  |
+-----------------------------------------------+
|  1  |      5     |      2      |   1/4/2013   |
|  2  |      5     |      2      |   1/4/2013   |
|  3  |      2     |      5      |   1/4/2013   |
|  4  |      3     |      5      |   1/4/2013   |
|  5  |      5     |      2      |   1/3/2013   |
|  6  |      5     |      4      |   1/3/2013   |
+-----------------------------------------------+

up to a limit of 10 messages between, for example, user 5 and 2 (above there are 4) and a limit of 10 threads (above there are 3).

I've been trying with this sort of query using a subquery but haven't managed to get the second limit on the number of distinct threads.

SELECT * FROM (SELECT DISTINCT ON (sender_id, receiver_id) messages.* 
FROM messages 
WHERE (receiver_id = 5 OR sender_id = 5) ORDER BY sender_id, receiver_id, 
created_at DESC)   
q ORDER BY created_at DESC 
LIMIT 10 OFFSET 0;

I'm considering creating a new Thread table with a thread_id field that is the concatenation of sender_id and receiver_id, and then just joining it to Messages, but I have a sneaking suspicion that this should be doable with just one table.


5 Answers


The neatest way I can think of to solve your problem in a single query is the following:

select * from (
  select row_number() 
    over (partition by sender_id, receiver_id order by created_at desc) as rn, m.*
  from Messages m
  where (m.sender_id, m.receiver_id) in (
    select sender_id, receiver_id
    from Messages
    where sender_id = <id> or receiver_id = <id>
    group by sender_id, receiver_id
    order by max(created_at) desc
    limit 10 offset 0
  )
) res where res.rn <= 10

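The row_number() over (partition by sender_id, receiver_id order by created_at desc) column holds the row number of each message within its thread (it is what the record number would be if you ran a separate query for just one thread). Alongside this row number, the query checks whether the message itself belongs to one of the 10 most recent threads; that is what the (m.sender_id, m.receiver_id) in ...query... condition does. Finally, because you only want the 10 most recent messages per thread, you restrict the row number to be less than or equal to 10.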
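Note that partitioning by (sender_id, receiver_id) counts messages from 5 to 2 and from 2 to 5 as two separate threads. If both directions should form one thread, as the expected output in the question suggests, one possible variation is to normalise the pair with LEAST/GREATEST first. This is only a sketch: it assumes PostgreSQL and the messages table from the question, uses 5 as the example user id, and the names user_lo, user_hi, recent_threads and ranked are made up for illustration.

```sql
-- Sketch only: assumes PostgreSQL and the messages table from the question.
-- 5 stands in for the given user id.
WITH m AS (
  SELECT messages.*,
         LEAST(sender_id, receiver_id)    AS user_lo,  -- hypothetical helper column
         GREATEST(sender_id, receiver_id) AS user_hi   -- hypothetical helper column
  FROM messages
  WHERE sender_id = 5 OR receiver_id = 5
),
recent_threads AS (
  -- the 10 threads with the newest activity, regardless of direction
  SELECT user_lo, user_hi
  FROM m
  GROUP BY user_lo, user_hi
  ORDER BY max(created_at) DESC
  LIMIT 10
),
ranked AS (
  -- number the messages inside each thread, newest first
  SELECT m.*,
         row_number() OVER (PARTITION BY user_lo, user_hi
                            ORDER BY created_at DESC, id DESC) AS rn
  FROM m
  JOIN recent_threads USING (user_lo, user_hi)
)
SELECT id, sender_id, receiver_id, created_at
FROM ranked
WHERE rn <= 10
ORDER BY created_at DESC, id DESC;
```

The id DESC tie-breaker is there only to make the ordering deterministic when several messages share the same created_at.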

Answered 2013-02-12T12:20:08.850

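I would suggest taking couling's answer and modifying it slightly so that it uses common table expressions, which effectively gives you two queries: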

WITH threads (sender_id, receiver_id, latest) as (
        select sender, 
               receiver,
               max(sent) 
          from sof_messages
         where receiver = <user>
            or sender = <user>
         group by sender,
               receiver
         order by 3 desc   -- newest threads first
         limit 10
 ), 
 messages as (
         -- aliasing the window column avoids listing every message field
         select m.*, 
                rank() over (partition by sender, receiver order by sent desc) as rank
           from sof_messages m
          WHERE (sender, receiver) in (select sender_id, receiver_id from threads))
 SELECT * from messages where rank <= 10;

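The advantage of this is that it gives the planner a good idea of when to use indexes here. Essentially, each of the three parts of the query is planned independently.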

Answered 2013-02-15T09:37:39.650

Creating a Thread table doesn't look right because of the data duplication, but a view might help:

CREATE VIEW threads AS 
  SELECT sender_id, receiver_id, min(created_at) AS t_date
  FROM messages
  GROUP BY sender_id,receiver_id;

Change min(created_at) to max(created_at) if a thread's date should be the date of its newest message rather than its oldest.

It can then simply be joined to messages:

SELECT ... FROM messages JOIN threads USING (sender_id,receiver_id)
Answered 2013-02-06T11:58:12.937

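I'm posting this to show what can be done.

I really don't recommend using it.

You would be better off running two separate queries: one to retrieve the 10 most recent threads, and one, repeated for each thread, to pull back that thread's 10 most recent messages.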
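As a rough sketch of that two-query approach, assuming PostgreSQL and the table and column names from the question (messages, sender_id, receiver_id, created_at) rather than the sof_messages test table used further down, and with 5 and 2 as example ids:

```sql
-- Query 1: the 10 most recently active threads involving user 5.
SELECT sender_id, receiver_id, max(created_at) AS latest
FROM messages
WHERE sender_id = 5 OR receiver_id = 5
GROUP BY sender_id, receiver_id
ORDER BY latest DESC
LIMIT 10;

-- Query 2: run once per (sender_id, receiver_id) pair returned above,
-- here for the example pair (5, 2).
SELECT *
FROM messages
WHERE sender_id = 5 AND receiver_id = 2
ORDER BY created_at DESC
LIMIT 10;
```

The application then issues the second query up to 10 times, once per thread, which keeps each statement simple at the cost of extra round trips.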

However, you can achieve your goal with the rank() window function, as shown below.

select * from (
      select message.*,
             rank() over (partition by message.sender, message.receiver 
                              order by sent desc )  
      from sof_messages message,
           (
            select sender, 
                   receiver,
                   max(sent) 
              from sof_messages
             where receiver = <user>
                or sender = <user>
             group by sender,
                   receiver
             order by 3 desc
             limit 10
           ) thread
      where message.sender = thread.sender
        and message.receiver = thread.receiver
      ) message_list

where rank <= 10

There are several different queries that could achieve your goal with window functions; none of them is particularly clean.

Answered 2013-02-05T19:12:08.660

I haven't tested this, but it looks like you forgot the LIMIT 10 on the subquery that selects the 10 most recent threads:

SELECT
  *
FROM
  (SELECT DISTINCT ON
     (sender_id, receiver_id) messages.* 
   FROM
     messages 
   WHERE
     (receiver_id = 5 OR sender_id = 5)
   ORDER BY
     sender_id, receiver_id, created_at DESC
   LIMIT
     10)   
  q
ORDER BY
  created_at DESC 
LIMIT
  10
OFFSET
  0;

(I've pretty-printed the SQL so it's easier to tell what is going on.)

Answered 2013-02-19T12:00:50.447