
To put this question in context, I am trying to calculate "time in app" from an event log.

Assume the following table:

user_id   event_time
2         2012-05-09 07:03:38
3         2012-05-09 07:03:42
4         2012-05-09 07:03:43
2         2012-05-09 07:03:44
2         2012-05-09 07:03:45
4         2012-05-09 07:03:52
2         2012-05-09 07:06:30

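I want the difference between the highest and lowest event_time within each set of timestamps that lie within 2 minutes of each other (grouped by user). If a timestamp falls outside the 2-minute interval of a set, it should be treated as part of another set.

Desired output: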

user_id  seconds_interval
2        7     (because 07:03:45 - 07:03:38 is 7 seconds)
3        0     (because 07:03:42)
4        9     (because 07:03:52 - 07:03:43 is 9 seconds)
2        0     (because 07:06:30 is outside 2 min interval of 1st user_id=2 set)

Here is what I have tried, although I cannot group by seconds_interval (and even if I could, I am not sure this is the right direction):

SELECT (max(tr.event_time)-min(tr.event_time)) as seconds_interval
FROM some_table tr
INNER JOIN TrackingRaw tr2 ON (tr.event_time BETWEEN 
   tr2.event_time - INTERVAL 2 MINUTE AND tr2.event_time + INTERVAL 2 MINUTE) 
GROUP BY seconds_interval

1 Answer


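I don't think there is a very straightforward way to query the existing table to produce the data you want. However, you could maintain a second table of user sessions (the downside, of course, is that if you later want reports with a different session timeout, you would need to repopulate that table from scratch):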

CREATE TABLE Sessions (
  user_id INT,
  session_start TIMESTAMP,
  session_end   TIMESTAMP,
  PRIMARY KEY (user_id, session_start),
  FOREIGN KEY (user_id, session_start) REFERENCES TrackingRaw(user_id, event_time),
  FOREIGN KEY (user_id, session_end  ) REFERENCES TrackingRaw(user_id, event_time)
);

You can populate and update such a table automatically with the following trigger, which uses INSERT ... SELECT ... ON DUPLICATE KEY UPDATE:

CREATE TRIGGER after_insert_TrackingRaw AFTER INSERT ON TrackingRaw FOR EACH ROW
  INSERT INTO Sessions (user_id, session_start, session_end)
    SELECT NEW.user_id,
           IFNULL(MAX(session_start), NEW.event_time),
           NEW.event_time
    FROM   Sessions
    WHERE  user_id = NEW.user_id
       AND session_end >= NEW.event_time - INTERVAL 2 MINUTE
  ON DUPLICATE KEY UPDATE
    session_start = session_start,
    session_end   = NEW.event_time;
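
To make the trigger's rule easier to follow, here is a minimal Python sketch of the same upsert logic applied to the sample rows from the question. It is only an illustration, not part of the MySQL solution; the function name record_event and the in-memory sessions dict are assumptions made for this example.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=2)

# Maps (user_id, session_start) -> session_end, mirroring the
# primary key of the Sessions table.
sessions = {}

def record_event(user_id, event_time):
    """Mimic the trigger: extend the user's open session or start a new one."""
    # Sessions of this user that ended within the timeout of the new event.
    open_starts = [start for (uid, start), end in sessions.items()
                   if uid == user_id and end >= event_time - TIMEOUT]
    # Same idea as IFNULL(MAX(session_start), NEW.event_time).
    start = max(open_starts) if open_starts else event_time
    # Same idea as INSERT ... ON DUPLICATE KEY UPDATE session_end = NEW.event_time.
    sessions[(user_id, start)] = event_time

rows = [
    (2, "2012-05-09 07:03:38"),
    (3, "2012-05-09 07:03:42"),
    (4, "2012-05-09 07:03:43"),
    (2, "2012-05-09 07:03:44"),
    (2, "2012-05-09 07:03:45"),
    (4, "2012-05-09 07:03:52"),
    (2, "2012-05-09 07:06:30"),
]
for uid, text in rows:
    record_event(uid, datetime.strptime(text, "%Y-%m-%d %H:%M:%S"))

intervals = [(uid, int((end - start).total_seconds()))
             for (uid, start), end in sessions.items()]
print(intervals)  # [(2, 7), (3, 0), (4, 9), (2, 0)]
```

Note that the rule compares each new event with the end of the session so far, so a session keeps growing as long as consecutive events are no more than 2 minutes apart.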

Then, to get the query result you want:

SELECT user_id, TIMESTAMPDIFF(SECOND, session_start, session_end) AS seconds_interval FROM Sessions;
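
A caveat when computing the interval: as far as I know, MySQL evaluates a plain subtraction of two TIMESTAMP values on their numeric YYYYMMDDhhmmss form, so the result only equals the elapsed seconds when both values fall within the same minute. TIMESTAMPDIFF(SECOND, session_start, session_end) avoids this. The Python sketch below just reproduces that arithmetic for a session that crosses a minute boundary; the helper as_mysql_number is hypothetical and the two timestamps are made up for the illustration.

```python
from datetime import datetime

def as_mysql_number(ts):
    """Numeric YYYYMMDDhhmmss form of a timestamp (hypothetical helper)."""
    return int(ts.strftime("%Y%m%d%H%M%S"))

# Made-up session that crosses a minute boundary.
start = datetime(2012, 5, 9, 7, 3, 58)
end = datetime(2012, 5, 9, 7, 4, 5)

numeric_diff = as_mysql_number(end) - as_mysql_number(start)
elapsed = int((end - start).total_seconds())
print(numeric_diff, elapsed)  # 47 7
```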

See it on sqlfiddle.


Update

On further reflection, you could of course build such a Sessions table inside a stored procedure:

CREATE PROCEDURE getSessions(IN secs INT) MODIFIES SQL DATA BEGIN
  -- Local variables must be declared before cursors and handlers.
  DECLARE no_more_rows BOOLEAN DEFAULT FALSE;
  DECLARE v_user_id INT;
  DECLARE v_event_time, v_start TIMESTAMP;
  DECLARE cur CURSOR FOR
    SELECT user_id, event_time FROM TrackingRaw ORDER BY event_time ASC;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET no_more_rows = TRUE;

  DROP   TEMPORARY TABLE IF EXISTS Sessions;
  -- Foreign keys to TrackingRaw are left out: a scratch table does not need them.
  CREATE TEMPORARY TABLE Sessions (
    user_id INT,
    session_start TIMESTAMP,
    session_end   TIMESTAMP,
    PRIMARY KEY(user_id,session_start)
  );

  OPEN cur;
  the_loop: LOOP
    FETCH cur INTO v_user_id, v_event_time;
    IF no_more_rows THEN
      CLOSE cur;
      LEAVE the_loop;
    END IF;

    -- Look up the user's open session first; a TEMPORARY table cannot be
    -- read and written in the same INSERT ... SELECT statement.
    SELECT MAX(session_start) INTO v_start
    FROM   Sessions
    WHERE  user_id = v_user_id
       AND session_end >= v_event_time - INTERVAL secs SECOND;

    INSERT INTO Sessions (user_id, session_start, session_end)
      VALUES (v_user_id, IFNULL(v_start, v_event_time), v_event_time)
    ON DUPLICATE KEY UPDATE
      session_start = session_start, session_end = v_event_time;
  END LOOP the_loop;

  SELECT user_id, TIMESTAMPDIFF(SECOND, session_start, session_end) AS seconds_interval FROM Sessions;
  DROP TEMPORARY TABLE Sessions;
END;;

Then, to get your output:

CALL getSessions(120); -- for a 2 minute (120 second) timeout
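
The advantage of passing the timeout as a parameter is that the same event log can be re-sessionized with a different threshold at any time. A rough Python sketch of that single pass over time-ordered events, under the assumption that events arrive as (user_id, event_time) pairs; the names get_sessions, ts and events are chosen for this example and are not part of the MySQL code.

```python
from datetime import datetime, timedelta

def get_sessions(events, secs):
    """Return (user_id, seconds_interval) per session for a timeout of `secs` seconds."""
    timeout = timedelta(seconds=secs)
    latest = {}   # user_id -> index of that user's most recent session in `result`
    result = []   # rows of [user_id, session_start, session_end]
    for user_id, event_time in sorted(events, key=lambda e: e[1]):
        i = latest.get(user_id)
        if i is not None and result[i][2] >= event_time - timeout:
            result[i][2] = event_time          # extend the open session
        else:
            latest[user_id] = len(result)      # start a new session
            result.append([user_id, event_time, event_time])
    return [(u, int((end - start).total_seconds())) for u, start, end in result]

def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

events = [
    (2, ts("2012-05-09 07:03:38")),
    (3, ts("2012-05-09 07:03:42")),
    (4, ts("2012-05-09 07:03:43")),
    (2, ts("2012-05-09 07:03:44")),
    (2, ts("2012-05-09 07:03:45")),
    (4, ts("2012-05-09 07:03:52")),
    (2, ts("2012-05-09 07:06:30")),
]

print(get_sessions(events, 120))  # [(2, 7), (3, 0), (4, 9), (2, 0)]
print(get_sessions(events, 5))    # [(2, 0), (3, 0), (4, 0), (2, 1), (4, 0), (2, 0)]
```

With a 5-second timeout the same log splits into six shorter sessions, which is the kind of re-reporting the trigger-maintained table cannot do without being rebuilt.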
answered 2012-06-29T08:46:45.520