A simple table:

ForumPost
--------------
ID (int PK)
UserID (int FK)
Date (datetime)

I want to return the number of times a particular user has posted at least once a day for n consecutive days.

Example:

User 15844 has posted at least 1 post a day for 30 consecutive days 10 times

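I have also tagged this question with linq/lambda, as a solution there would be great too. I know I can solve this by iterating over all of a user's records, but that is slow.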

2 Answers

You can use a handy trick with ROW_NUMBER() to find consecutive entries. Imagine the following set of dates, with their row_number (starting at 0):

Date        RowNumber
20130401    0
20130402    1
20130403    2
20130404    3
20130406    4
20130407    5

For consecutive entries, subtracting the row_number from the date gives the same result. For example:

Date        RowNumber   date - row_number
20130401    0           20130401
20130402    1           20130401
20130403    2           20130401
20130404    3           20130401
20130406    4           20130402
20130407    5           20130402

You can then group by date - row_number to get the sets of consecutive days (i.e. the first 4 records and the last 2 records).
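As an illustration outside SQL, the same grouping trick can be sketched in Python. This is only a sketch under assumptions: the function name `consecutive_runs` is hypothetical, and the dates are the six example dates from the tables above. Each distinct date is shifted back by its 0-based position, and dates that share the same shifted value belong to one run.

```python
from datetime import date, timedelta
from itertools import groupby

def consecutive_runs(days):
    """Group dates into runs of consecutive days (date - row_number trick)."""
    ordered = sorted(set(days))  # distinct dates, ascending
    # The date minus its 0-based position is constant within a consecutive run.
    keyed = ((d - timedelta(days=i), d) for i, d in enumerate(ordered))
    return [[d for _, d in grp] for _, grp in groupby(keyed, key=lambda kd: kd[0])]

# The example dates: 1-4 April and 6-7 April 2013.
dates = [date(2013, 4, n) for n in (1, 2, 3, 4, 6, 7)]
runs = consecutive_runs(dates)
print([len(r) for r in runs])  # [4, 2]
```

The run lengths match the two groups described above: the first 4 records and the last 2.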

To apply this to your example, you would use:

WITH Posts AS
(   SELECT  FirstPost = DATEADD(DAY, 1 - ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [Date]), [Date]),
            UserID,
            Date
    FROM    (   SELECT  DISTINCT UserID, [Date] = CAST(Date AS [Date])
                FROM    ForumPost
            ) fp
), Posts2 AS
(   SELECT  FirstPost, 
            UserID, 
            Days = COUNT(*), 
            LastDate = MAX(Date)
    FROM    Posts
    GROUP BY FirstPost, UserID
)
SELECT  UserID, ConsecutiveDates = MAX(Days)
FROM    Posts2
GROUP BY UserID;

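Example on SQL Fiddle (simple, showing only the maximum number of consecutive days per user)

Further example showing how to get all consecutive periods

EDIT

I don't think the above quite answered the question. The following gives the number of times a user has posted on n or more consecutive days: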

WITH Posts AS
(   SELECT  FirstPost = DATEADD(DAY, 1 - ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY [Date]), [Date]),
            UserID,
            Date
    FROM    (   SELECT  DISTINCT UserID, [Date] = CAST(Date AS [Date])
                FROM    ForumPost
            ) fp
), Posts2 AS
(   SELECT  FirstPost, 
            UserID, 
            Days = COUNT(*), 
            FirstDate = MIN(Date), 
            LastDate = MAX(Date)
    FROM    Posts
    GROUP BY FirstPost, UserID
)
SELECT  UserID, [Times Over N Days] = COUNT(*)
FROM    Posts2
WHERE   Days >= 30
GROUP BY UserID;

Example on SQL Fiddle
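Since the question also asks about a non-SQL approach, here is a rough in-memory sketch of the same counting logic in Python (not LINQ). It assumes the posts are already loaded as `(user_id, datetime)` pairs; the function name `streaks_of_at_least` and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import groupby

def streaks_of_at_least(posts, n):
    """Count, per user, the runs of consecutive posting days lasting >= n days."""
    # Distinct (user, calendar day) pairs, like SELECT DISTINCT with CAST(Date AS date).
    days = sorted({(user, ts.date()) for user, ts in posts})
    counts = Counter()
    for user, user_days in groupby(days, key=lambda ud: ud[0]):
        # date - row_number is constant within a run of consecutive days.
        keyed = ((d - timedelta(days=i), d) for i, (_, d) in enumerate(user_days))
        for _, run in groupby(keyed, key=lambda kd: kd[0]):
            if sum(1 for _ in run) >= n:
                counts[user] += 1
    return dict(counts)

start = datetime(2013, 1, 1, 9, 30)
posts = [(1, start + timedelta(days=i)) for i in range(3)]      # user 1: 3-day run
posts += [(1, start + timedelta(days=i)) for i in range(5, 9)]  # user 1: 4-day run
posts += [(2, start), (2, start + timedelta(hours=2))]          # user 2: two posts, one day
print(streaks_of_at_least(posts, 3))  # {1: 2}
```

As in the query with `WHERE Days >= 30`, users with no qualifying run do not appear in the result at all.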

answered 2013-04-15T12:51:19.487

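I think your particular application makes this pretty simple. If there are 'n' distinct dates within an 'n'-day interval, then those 'n' distinct dates must be consecutive.

Scroll to the bottom for a general solution that requires only common table expressions and switching to PostgreSQL. (Kidding. I implemented it in PostgreSQL because I'm short on time.)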

create table ForumPost (
  ID integer primary key,
  UserID integer not null,
  post_date date not null
);

insert into forumpost values
(1, 1, '2013-01-15'),
(2, 1, '2013-01-16'),
(3, 1, '2013-01-17'),
(4, 1, '2013-01-18'),
(5, 1, '2013-01-19'),
(6, 1, '2013-01-20'),
(7, 1, '2013-01-21'),

(11, 2, '2013-01-15'),
(12, 2, '2013-01-16'),
(13, 2, '2013-01-17'),
(16, 2, '2013-01-17'),
(14, 2, '2013-01-18'),
(15, 2, '2013-01-19'),

(21, 3, '2013-01-17'),
(22, 3, '2013-01-17'),
(23, 3, '2013-01-17'),
(24, 3, '2013-01-17'),
(25, 3, '2013-01-17'),
(26, 3, '2013-01-17'),
(27, 3, '2013-01-17');

Now, let's look at the output of this query. For brevity, I'm looking at 5-day intervals rather than 30-day intervals.

select userid, count(distinct post_date) distinct_dates
from forumpost
where post_date between '2013-01-15' and '2013-01-19'
group by userid;

USERID  DISTINCT_DATES  
1       5
2       5
3       1

For a user to qualify, the number of distinct dates within that 5-day interval has to be 5, right? So we just need to add that logic to a HAVING clause.

select userid, count(distinct post_date) distinct_dates
from forumpost
where post_date between '2013-01-15' and '2013-01-19'
group by userid
having count(distinct post_date) = 5;

USERID  DISTINCT_DATES  
1       5
2       5
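The reasoning behind the HAVING clause (n distinct dates inside an n-day window can only be n consecutive days) can also be checked outside the database. The Python sketch below is an illustration only: the function name `users_posting_every_day` is hypothetical, and the rows repeat the sample data from the INSERT statement above.

```python
from datetime import date, timedelta

def users_posting_every_day(posts, start, n):
    """Users with n distinct posting dates in the n-day window starting at `start`."""
    end = start + timedelta(days=n - 1)  # inclusive upper bound, like BETWEEN
    seen = {}
    for user, day in posts:
        if start <= day <= end:
            seen.setdefault(user, set()).add(day)
    # n distinct dates inside an n-day window must be n consecutive days.
    return sorted(u for u, days in seen.items() if len(days) == n)

# Same rows as the sample table, as (userid, post_date) pairs.
posts = (
    [(1, date(2013, 1, d)) for d in range(15, 22)]
    + [(2, date(2013, 1, d)) for d in (15, 16, 17, 17, 18, 19)]
    + [(3, date(2013, 1, 17))] * 7
)
print(users_posting_every_day(posts, date(2013, 1, 15), 5))  # [1, 2]
```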

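A more general solution

It doesn't really make sense to say that, if you post every day from 2013-01-01 to 2013-01-31, you have posted for 30 consecutive days 2 times. Instead, I'd expect the clock to start over on 2013-01-31. My apologies for implementing this in PostgreSQL; I'll try to implement it in T-SQL later.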

with first_posts as (
  select userid, min(post_date) first_post_date
  from forumpost
  group by userid
), 
period_intervals as (
  select userid, first_post_date period_start, 
         (first_post_date + interval '4' day)::date period_end
  from first_posts
), user_specific_intervals as (
  select 
    userid, 
    (period_start + (n || ' days')::interval)::date as period_start, 
    (period_end + (n || ' days')::interval)::date as period_end 
  from period_intervals, generate_series(0, 30, 5) n
)
select userid, period_start, period_end, 
       (select count(distinct post_date) 
        from forumpost
        where forumpost.post_date between period_start and period_end
          and forumpost.userid = user_specific_intervals.userid) distinct_dates
from user_specific_intervals
order by userid, period_start;
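To make the "clock starts over" idea concrete, here is a small Python sketch of the same tiling: back-to-back n-day periods starting at each user's first post, with the distinct posting dates counted per period. It is an approximation, not a port of the query. The function name `distinct_dates_per_period` is hypothetical, and unlike `generate_series(0, 30, 5)` it stops at the user's last post instead of producing a fixed number of periods.

```python
from datetime import date, timedelta

def distinct_dates_per_period(posts, n):
    """Tile n-day periods from each user's first post; count distinct dates in each."""
    by_user = {}
    for user, day in posts:
        by_user.setdefault(user, set()).add(day)
    rows = []
    for user in sorted(by_user):
        days = by_user[user]
        start, last = min(days), max(days)
        while start <= last:
            end = start + timedelta(days=n - 1)  # inclusive period end
            rows.append((user, start, end, sum(start <= d <= end for d in days)))
            start += timedelta(days=n)  # the clock restarts after each period
    return rows

# Same rows as the sample table, as (userid, post_date) pairs.
posts = (
    [(1, date(2013, 1, d)) for d in range(15, 22)]
    + [(2, date(2013, 1, d)) for d in (15, 16, 17, 17, 18, 19)]
    + [(3, date(2013, 1, 17))] * 7
)
rows = distinct_dates_per_period(posts, 5)
print([(user, count) for user, _, _, count in rows])
# [(1, 5), (1, 2), (2, 5), (3, 1)]
```

A period counts as a full streak only when its distinct-date count equals n, so with this data users 1 and 2 each have one complete 5-day period.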
answered 2013-04-15T13:21:11.017