Problem: find the first 2 users in each category to reach at least 10 items.

Table structure:

CREATE TABLE items(
    id INT AUTO_INCREMENT PRIMARY KEY,
    datetime datetime,
    category INT,
    user INT,
    items_count INT
);

Sample data:

INSERT INTO items (datetime, category, user, items_count) VALUES
('2013-01-01 00:00:00', 1, 1, 10),
('2013-01-01 00:00:01', 1, 2, 1),
('2013-01-01 00:00:02', 1, 3, 10),
('2013-01-01 00:00:03', 1, 2, 9),

('2013-01-01 00:00:00', 2, 4, 10),
('2013-01-01 00:00:01', 2, 1, 10),
('2013-01-01 00:00:01', 2, 5, 10);

Expected result:

category    user
1           1
1           3
2           4
2           5

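Note: as the result shows, I need to be able to give preference to a particular user when several users meet the requirement at the same time (in category 2, users 1 and 5 both reach 10 items at 00:00:01, and user 5 is preferred).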
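To pin down the requirement, here is a minimal sketch in plain Python (outside the database, so it says nothing about the SQL itself). It replays the sample rows in time order, records the moment each (category, user) pair first reaches 10 items, and keeps the first two pairs per category. The function `first_users` and its parameters are hypothetical; `preferred_user` breaks ties on equal datetimes, as in the expected result.

```python
from collections import defaultdict

# Sample rows from the question: (datetime, category, user, items_count).
ROWS = [
    ("2013-01-01 00:00:00", 1, 1, 10),
    ("2013-01-01 00:00:01", 1, 2, 1),
    ("2013-01-01 00:00:02", 1, 3, 10),
    ("2013-01-01 00:00:03", 1, 2, 9),
    ("2013-01-01 00:00:00", 2, 4, 10),
    ("2013-01-01 00:00:01", 2, 1, 10),
    ("2013-01-01 00:00:01", 2, 5, 10),
]


def first_users(rows, min_items=10, max_users=2, preferred_user=None):
    """Return (category, user) pairs for the first users to reach min_items."""
    totals = defaultdict(int)  # running total per (category, user)
    reached = {}               # (category, user) -> datetime the threshold was crossed
    # Walk the rows in time order (ISO strings sort chronologically).
    for dt, category, user, count in sorted(rows, key=lambda r: r[0]):
        key = (category, user)
        if key in reached:
            continue
        totals[key] += count
        if totals[key] >= min_items:
            reached[key] = dt

    by_category = defaultdict(list)
    for (category, user), dt in reached.items():
        # False sorts before True, so preferred_user wins ties on datetime.
        by_category[category].append((dt, user != preferred_user, user))

    result = []
    for category in sorted(by_category):
        for _, _, user in sorted(by_category[category])[:max_users]:
            result.append((category, user))
    return result


print(first_users(ROWS, preferred_user=5))
```

Without a preferred user, category 2 falls back to the lower user id among the tied users 1 and 5.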

SQL Fiddle:

http://sqlfiddle.com/#!2/58e60

Here is what I have tried:

SELECT
  Derived.*,
  IF (@category != Derived.category, @rank := 1, @rank := @rank + 1) AS rank,
  @category := category


FROM(
  SELECT
    category,
    user,
    SUM(items_count) AS items_count,
    MAX(datetime) AS datetime


  FROM items


  GROUP BY
    category,
    user

  HAVING
    SUM(items_count) >= 10
) AS Derived


JOIN(SELECT @rank := 0, @category := 0) AS r


HAVING
  rank <= 2

ORDER BY
  Derived.category,
  Derived.datetime

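But this is wrong. Not only does it ignore the user preference, it also gives wrong results for data such as the following, because MAX(datetime) keeps each user's last row rather than the moment the user reached 10 items: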

('2013-01-01 00:00:00', 1, 1, 10),
('2013-01-01 00:00:01', 1, 2, 1),
('2013-01-01 00:00:02', 1, 3, 10),
('2013-01-01 00:00:03', 1, 2, 9),
('2013-01-01 00:00:10', 1, 3, 1);
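The difference can be reproduced with a small Python sketch (category 1 only; the helper names are hypothetical). Ranking users by their last row, which is what MAX(datetime) keeps, puts user 2 ahead of user 3, while ranking by the moment each running total first reaches 10 gives the expected users 1 and 3. It only shows the ordering implied by MAX(datetime); the variable-based ranking in the attempted query adds its own uncertainty on top.

```python
# Category 1 rows from the second data set: (datetime, user, items_count).
ROWS = [
    ("2013-01-01 00:00:00", 1, 10),
    ("2013-01-01 00:00:01", 2, 1),
    ("2013-01-01 00:00:02", 3, 10),
    ("2013-01-01 00:00:03", 2, 9),
    ("2013-01-01 00:00:10", 3, 1),
]


def last_activity(rows):
    """What MAX(datetime) in the GROUP BY keeps: each user's latest row."""
    latest = {}
    for dt, user, _ in rows:
        latest[user] = max(latest.get(user, dt), dt)
    return latest


def crossing_time(rows, min_items=10):
    """When each user's running total first reaches min_items."""
    totals, crossed = {}, {}
    for dt, user, count in sorted(rows):
        totals[user] = totals.get(user, 0) + count
        if totals[user] >= min_items and user not in crossed:
            crossed[user] = dt
    return crossed


def first_two(times):
    """The two users with the earliest timestamps."""
    return [user for user, _ in sorted(times.items(), key=lambda kv: kv[1])][:2]


print(first_two(last_activity(ROWS)))  # → [1, 2]
print(first_two(crossing_time(ROWS)))  # → [1, 3]
```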

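Additional info: I don't know whether a stored procedure would make a difference here, but unfortunately that is not an option either: the user running this query has only SELECT privileges.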

2 Answers

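To find the users who meet your requirement, you need a cumulative sum of the counts. The following query finds the row on which each user first reaches 10 units; as long as counts are never negative, there is only one such row per user and category: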

select i.*
from (select i.*,
             (select sum(items_count)
              from items i2
              where i2.user = i.user and
                    i2.category = i.category and
                    i2.datetime <= i.datetime
             ) as cumsum
      from items i
     ) i
where cumsum - items_count < 10 and cumsum >= 10
order by datetime;
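To check the cumulative-sum query without a MySQL server, the sketch below runs it unchanged against an in-memory SQLite database through Python's standard `sqlite3` module, loaded with the question's sample rows. SQLite is only a stand-in here (the schema drops MySQL's AUTO_INCREMENT syntax); the query itself uses nothing beyond correlated subqueries, which both engines support. The helper name `threshold_rows` is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE items (
    id          INTEGER PRIMARY KEY,
    datetime    TEXT,
    category    INT,
    user        INT,
    items_count INT
)
"""

# Sample rows from the question: (datetime, category, user, items_count).
SAMPLE = [
    ("2013-01-01 00:00:00", 1, 1, 10),
    ("2013-01-01 00:00:01", 1, 2, 1),
    ("2013-01-01 00:00:02", 1, 3, 10),
    ("2013-01-01 00:00:03", 1, 2, 9),
    ("2013-01-01 00:00:00", 2, 4, 10),
    ("2013-01-01 00:00:01", 2, 1, 10),
    ("2013-01-01 00:00:01", 2, 5, 10),
]

# The first query from this answer, unchanged.
QUERY = """
select i.*
from (select i.*,
             (select sum(items_count)
              from items i2
              where i2.user = i.user and
                    i2.category = i.category and
                    i2.datetime <= i.datetime
             ) as cumsum
      from items i
     ) i
where cumsum - items_count < 10 and cumsum >= 10
order by datetime;
"""


def threshold_rows():
    """Rows (id, datetime, category, user, items_count, cumsum) that cross 10."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO items (datetime, category, user, items_count) VALUES (?, ?, ?, ?)",
        SAMPLE,
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in threshold_rows():
    print(row)
```

Each of the six (category, user) pairs in the sample produces exactly one row, the one on which its running total reaches 10. Ranking and the user preference still have to be applied on top.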

To get the first two, you need a MySQL trick for counting within groups. Here is an example that generally works:

select i.*
from (select i.*, if(@prevc = category, @rn := @rn + 1, @rn := 1) as rn, @prevc := category
      from (select i.*,
                   (select sum(items_count)
                    from items i2
                    where i2.user = i.user and
                          i2.category = i.category and
                          i2.datetime <= i.datetime
                   ) as cumsum
            from items i 
           ) i
           cross join
           (select @rn := 0) const
      where cumsum - items_count < 10 and cumsum >= 10
     ) i
where rn <= 2
order by category, datetime;

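I have reservations about this approach, because nothing in MySQL guarantees that the expression @prevc := category is evaluated after rn. That does seem to be what happens, however, and it appears to work in practice.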
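The evaluation-order caveat is what makes the user-variable trick fragile. Servers with window functions (MySQL 8.0 added them, and 8.0 also deprecates setting user variables inside expressions) can express the same two steps without variables. The sketch below is that alternative technique, not the answer's query: `SUM() OVER` produces the running total, `ROW_NUMBER()` numbers the qualifying users per category, and ties on datetime go to a preferred user. It runs on SQLite 3.25 or newer (via Python's `sqlite3`) as a stand-in for MySQL; the function `top_users_windowed` and the named parameters are hypothetical.

```python
import sqlite3

# Sample rows from the question: (datetime, category, user, items_count).
SAMPLE = [
    ("2013-01-01 00:00:00", 1, 1, 10),
    ("2013-01-01 00:00:01", 1, 2, 1),
    ("2013-01-01 00:00:02", 1, 3, 10),
    ("2013-01-01 00:00:03", 1, 2, 9),
    ("2013-01-01 00:00:00", 2, 4, 10),
    ("2013-01-01 00:00:01", 2, 1, 10),
    ("2013-01-01 00:00:01", 2, 5, 10),
]

QUERY = """
WITH running AS (
    -- Running total per (category, user) in time order; id breaks ties.
    SELECT category, user, datetime, items_count,
           SUM(items_count) OVER (
               PARTITION BY category, user
               ORDER BY datetime, id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cumsum
    FROM items
),
reached AS (
    -- The single row on which each user first reaches the threshold.
    SELECT category, user, datetime
    FROM running
    WHERE cumsum >= :min_items AND cumsum - items_count < :min_items
),
ranked AS (
    -- Number users per category by crossing time; preferred user wins ties.
    SELECT category, user,
           ROW_NUMBER() OVER (
               PARTITION BY category
               ORDER BY datetime,
                        CASE WHEN user = :preferred_user THEN 0 ELSE 1 END,
                        user
           ) AS rn
    FROM reached
)
SELECT category, user
FROM ranked
WHERE rn <= :max_users
ORDER BY category, rn
"""


def top_users_windowed(min_items=10, max_users=2, preferred_user=None):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, datetime TEXT,"
        " category INT, user INT, items_count INT)"
    )
    conn.executemany(
        "INSERT INTO items (datetime, category, user, items_count) VALUES (?, ?, ?, ?)",
        SAMPLE,
    )
    params = {"min_items": min_items, "max_users": max_users,
              "preferred_user": preferred_user}
    rows = conn.execute(QUERY, params).fetchall()
    conn.close()
    return rows


print(top_users_windowed(preferred_user=5))
```

One design difference from the correlated subquery: the ROWS frame adds a user's rows that share a datetime one at a time (ordered by id), whereas `i2.datetime <= i.datetime` counts all of them at once.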

answered 2013-07-19T15:07:21.380

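I tried Gordon's query, but unfortunately it doesn't seem to cope with large tables; after waiting 15 minutes I killed it. The following query, however, works well for me: it gets through a table of ~6M rows in about 8 seconds.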

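#Variable
SET @min_items      = 10,
    @max_users      = 2,
    @preferred_user = 5,

    #Static
    @category       = 0,
    @user           = 0,
    @items_category = 0,
    @items          = 0,
    @row_num        = 1;


--


SELECT
  category,
  user,
  datetime


FROM(
  # Number the qualifying users within each category in the order
  # in which they reached @min_items.
  SELECT
    category,
    user,
    datetime,
    IF (@category = category, @row_num := @row_num + 1, @row_num := 1) AS row_num,
    @category := category


  FROM(
    # Running total per (category, user). It is reset whenever the user
    # or the category changes and stops growing once it reaches
    # @min_items, so only the row on which a user crosses the threshold
    # has items_cumulative >= @min_items (later rows yield NULL).
    SELECT
      category,
      user,
      datetime,
      IF (@user != user OR @items_category != category, @items := 0, NULL),
      IF (@items < @min_items, @items := @items + items_count, NULL) AS items_cumulative,
      @user := user,
      @items_category := category


    FROM items


    ORDER BY
      category,
      user,
      datetime
  ) AS Derived


  WHERE items_cumulative >= @min_items


  ORDER BY
    category,
    datetime,
    # FIELD() returns 1 for @preferred_user and 2 for any other user,
    # so the preferred user wins ties on datetime.
    FIELD(user, @preferred_user, user)
) AS Derived


WHERE row_num <= @max_users;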

answered 2013-07-23T13:07:38.813
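The single-pass idea can be emulated in plain Python to see what each derived table contributes. This is a hypothetical sketch (the function `single_pass` and its parameters are illustrative, not part of the answer): rows are sorted by category, user and datetime, a running total is kept per group and capped at the threshold so only the crossing row survives, and the survivors are then numbered per category with the preferred user winning datetime ties. The `reset_on_category=False` setting mimics resetting the total only when @user changes, which loses a user who is the last one in one category and the first in the next.

```python
# Sample rows from the question: (datetime, category, user, items_count).
SAMPLE = [
    ("2013-01-01 00:00:00", 1, 1, 10),
    ("2013-01-01 00:00:01", 1, 2, 1),
    ("2013-01-01 00:00:02", 1, 3, 10),
    ("2013-01-01 00:00:03", 1, 2, 9),
    ("2013-01-01 00:00:00", 2, 4, 10),
    ("2013-01-01 00:00:01", 2, 1, 10),
    ("2013-01-01 00:00:01", 2, 5, 10),
]

# Hypothetical edge case: once rows are sorted by category, user, datetime,
# user 2 is both the last user of category 1 and the first of category 2.
EDGE = [
    ("2013-01-01 00:00:00", 1, 2, 10),
    ("2013-01-01 00:00:05", 2, 2, 10),
]


def single_pass(rows, min_items=10, max_users=2, preferred_user=None,
                reset_on_category=True):
    # Inner derived table: sorted by category, user, datetime, with a running
    # total that restarts for each new group and stops once it hits min_items.
    crossed = []
    prev_group, items = None, 0
    for dt, category, user, count in sorted(rows, key=lambda r: (r[1], r[2], r[0])):
        group = (category, user) if reset_on_category else user
        if group != prev_group:
            items, prev_group = 0, group
        if items < min_items:
            items += count
            if items >= min_items:
                # Only the crossing row satisfies items_cumulative >= min_items.
                crossed.append((category, dt, 0 if user == preferred_user else 1, user))

    # Outer derived table: order by category, datetime and preference, then
    # number the rows within each category and keep the first max_users.
    result, row_num, prev_category = [], 0, None
    for category, _, _, user in sorted(crossed):
        row_num = row_num + 1 if category == prev_category else 1
        prev_category = category
        if row_num <= max_users:
            result.append((category, user))
    return result


print(single_pass(SAMPLE, preferred_user=5))      # → [(1, 1), (1, 3), (2, 4), (2, 5)]
print(single_pass(EDGE))                          # → [(1, 2), (2, 2)]
print(single_pass(EDGE, reset_on_category=False)) # → [(1, 2)]
```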