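I have a table in a database that stores log data by time. There can be a million rows in the database in a single day. The times are not at any fixed interval. The table has several indexes, including one on time. What I want to do is build a query that returns a set of rows, one per time interval. For example, I could run a query that returns 1 row for every 15 minutes over a day. That would return 24*4 = 96 rows. Each row returned is actually the most recent row in the database before the requested interval (since the data in the database will not line up exactly with the requested intervals).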

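I'm not sure how to do this. I can't just query all rows for a particular set of indexes and time range, because that would load over 1 GB of data into memory, which is too slow. Is there an efficient way to do this in SQL? I'm using a MySQL database. I'm open to changing the table indexes, etc.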

TIME

11:58
12:03
12:07
12:09
12:22
12:27
12:33
12:38
12:43
12:49
12:55

If I wanted to query every 15 minutes from 12:00 to 1:00, I would get back:

11:58 (nearest 12:00)
12:09 (nearest 12:15)
12:27 (nearest 12:30)
12:43 (nearest 12:45)
12:55 (nearest 1:00) 

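If it makes things easier, I could also store the time as a number (i.e. milliseconds since 1970). In the query above, that would be an interval of 900000 milliseconds.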
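To pin down the expected result, here is a minimal Python sketch of the lookup being asked for. It assumes the times are available as a sorted list of integers (minutes since midnight here, purely for readability); the names `hhmm`, `latest_at_or_before` and `sample` are made up for this illustration. Each boundary is resolved with a binary search, which is roughly the kind of seek an index on the time column could serve.

```python
from bisect import bisect_right

def hhmm(text):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)

def latest_at_or_before(sorted_times, start, end, step):
    """For each boundary in [start, end], return the last time <= boundary (or None)."""
    result = []
    for boundary in range(start, end + 1, step):
        pos = bisect_right(sorted_times, boundary)  # number of entries <= boundary
        result.append(sorted_times[pos - 1] if pos else None)
    return result

# Sample times from the question.
sample = [hhmm(t) for t in ("11:58 12:03 12:07 12:09 12:22 12:27 "
                            "12:33 12:38 12:43 12:49 12:55").split()]
picked = latest_at_or_before(sample, hhmm("12:00"), hhmm("13:00"), 15)
print(picked)
```

With the sample data this picks 11:58, 12:09, 12:27, 12:43 and 12:55, matching the list above.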

3 Answers

So, I thought of something like this:

SELECT 
  MIN(timeValue)
FROM e
GROUP BY (to_seconds(timeValue) - (to_seconds(timeValue) % (60 * 5)))

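...would do it for you, but this only returns the MIN(timeValue) for the whole table. It does work if the seconds, rounded to 5 minutes, are stored in their own column.

See the SQL Fiddle.

Edit, per Andriy: this works (http://sqlfiddle.com/#!2/bb870/6):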

SELECT MIN(t)
FROM e
GROUP BY to_seconds(t) DIV (60 * 5)

But this gives just one row (http://sqlfiddle.com/#!2/bb870/7):

SELECT MIN(t)
FROM e
GROUP BY to_seconds(t) - (to_seconds(t) % (60 * 5))

Does anyone know why?
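For what it's worth, the two grouping expressions are arithmetically equivalent for non-negative integers: `x DIV n` and `x - (x % n)` put two values in the same group exactly when they fall in the same n-second bucket. The Python sketch below emulates both GROUP BY keys on the question's sample times (the helper `min_per_bucket` is made up for this illustration), so whatever the fiddle is doing, the math itself does not explain it. Note also that bucketing like this returns the first row inside each 5-minute bucket, which is not quite the "most recent row before each boundary" the question asks for.

```python
from collections import defaultdict

def min_per_bucket(seconds, key):
    """Emulate SELECT MIN(t) ... GROUP BY key(t): smallest value per group."""
    groups = defaultdict(list)
    for s in seconds:
        groups[key(s)].append(s)
    return [min(groups[k]) for k in sorted(groups)]

BUCKET = 60 * 5  # five minutes, in seconds

# Sample times from the question (11:58 .. 12:55), as seconds since midnight.
minutes = [718, 723, 727, 729, 742, 747, 753, 758, 763, 769, 775]
seconds = [m * 60 for m in minutes]

by_div = min_per_bucket(seconds, lambda s: s // BUCKET)     # like to_seconds(t) DIV 300
by_mod = min_per_bucket(seconds, lambda s: s - s % BUCKET)  # like x - (x % 300)

print(len(by_div), by_div == by_mod)
```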

answered 2012-05-17T21:01:30.787

I can't think of a good way to do it all in one query. Perhaps someone else can think of a better way, but perhaps you could use something like this:

$startTime = mktime(12, 0);
$endTime = mktime(13, 0);
$queries = array();
for ($i = $startTime; $i <= $endTime; $i += 900)
    $queries[] = "SELECT MAX(timeValue) FROM table1 WHERE timeValue < '". date("G:i", $i) ."'";

$query = implode("\nUNION\n", $queries);

I just realized that this assumes that you are using PHP. If you are not, then just use the resulting query, which will look like:

SELECT MAX(timeValue) FROM table1 WHERE timeValue < '12:00'
UNION
SELECT MAX(timeValue) FROM table1 WHERE timeValue < '12:15'
UNION
SELECT MAX(timeValue) FROM table1 WHERE timeValue < '12:30'
UNION
SELECT MAX(timeValue) FROM table1 WHERE timeValue < '12:45'
UNION
SELECT MAX(timeValue) FROM table1 WHERE timeValue < '13:00'

Not sure if the < comparison will work 100% correctly with these string values, but I definitely think it would be a good idea to switch them to unix timestamps (or ms since 1970, if you need that much granularity). I have found it's always easier to work with integer values for date/time instead of strings.
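As a sketch of the integer-timestamp variant, the Python below builds the same kind of query with bound parameters instead of string concatenation and runs it against an in-memory SQLite database. SQLite is only a stand-in here, and the helper `build_union_query`, the index name and the seconds-since-midnight values are assumptions for the illustration. Two small changes from the query above: each SELECT also returns its boundary, and the parts are joined with UNION ALL, because plain UNION removes duplicate rows and would merge two boundaries that happen to share the same nearest row.

```python
import sqlite3

def build_union_query(boundaries):
    """Return (sql, params): one parameterized SELECT per boundary, joined by UNION ALL."""
    part = ("SELECT ? AS boundary, MAX(timeValue) AS nearest "
            "FROM table1 WHERE timeValue < ?")
    sql = "\nUNION ALL\n".join(part for _ in boundaries)
    params = [value for b in boundaries for value in (b, b)]
    return sql, params

# In-memory SQLite stands in for the real database (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (timeValue INTEGER)")
conn.execute("CREATE INDEX idx_time ON table1 (timeValue)")
minutes = [718, 723, 727, 729, 742, 747, 753, 758, 763, 769, 775]  # 11:58 .. 12:55
conn.executemany("INSERT INTO table1 VALUES (?)", [(m * 60,) for m in minutes])

# Boundaries 12:00 .. 13:00 every 900 seconds, as seconds since midnight.
boundaries = list(range(12 * 3600, 13 * 3600 + 1, 900))
sql, params = build_union_query(boundaries)
rows = sorted(conn.execute(sql, params).fetchall())
for boundary, nearest in rows:
    print(boundary, nearest)
```

Each sub-select is a MAX over an indexed range, so the database only needs one index lookup per boundary rather than a scan of the day's rows.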

answered 2012-05-17T19:59:30.960

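I think this is quite easy to do with functions, and I haven't noticed a big performance hit, although a cursor might perform better depending on the number of rows between the two times. (The code below is T-SQL for SQL Server, so it would need porting to run on MySQL.)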

CREATE TABLE TEST_TIMES (EventTime datetime)
-- skipping INSERTS of your times

CREATE FUNCTION fn_MyTimes ( @StartTime datetime, @EndTime datetime, @Minutes int )
    RETURNS @TimeTable TABLE (TimeValue datetime)
AS BEGIN
    DECLARE @CurrentTime datetime
    SET @CurrentTime = @StartTime
    WHILE @CurrentTime <= @EndTime
    BEGIN
        INSERT INTO @TimeTable VALUES (@CurrentTime)
        SET @CurrentTime = DATEADD(minute, @Minutes, @CurrentTime)
    END
    RETURN
END

CREATE FUNCTION fn_ClosestTime ( @CheckTime datetime )
    RETURNS datetime
AS BEGIN
    DECLARE @LowerTime datetime, @HigherTime datetime

    -- Latest event at or before the check time
    SELECT @LowerTime = MAX(EventTime)
    FROM TEST_TIMES
    WHERE EventTime <= @CheckTime

    -- Earliest event at or after the check time (MIN, not MAX)
    SELECT @HigherTime = MIN(EventTime)
    FROM TEST_TIMES
    WHERE EventTime >= @CheckTime

    IF @LowerTime IS NULL RETURN @HigherTime -- both null?  then null
    IF @HigherTime IS NULL RETURN @LowerTime

    -- On a tie, prefer the earlier row
    IF DATEDIFF(ms, @LowerTime, @CheckTime) <= DATEDIFF(ms, @CheckTime, @HigherTime)
        RETURN @LowerTime
    RETURN @HigherTime
END

SELECT TimeValue, dbo.fn_ClosestTime(TimeValue) as ClosestTime
FROM fn_MyTimes('2012-05-17 12:00', '2012-05-17 13:00', 15)

Results:

TimeValue               ClosestTime
----------------------- -----------------------
2012-05-17 12:00:00.000 2012-05-17 11:58:00.000
2012-05-17 12:15:00.000 2012-05-17 12:09:00.000
2012-05-17 12:30:00.000 2012-05-17 12:27:00.000
2012-05-17 12:45:00.000 2012-05-17 12:43:00.000
2012-05-17 13:00:00.000 2012-05-17 12:55:00.000
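The nearest-row idea (take the latest time at or below the check time and the earliest time at or above it, then keep whichever is closer) can also be sketched outside the database. This is a minimal Python version under the assumption that the times fit in a sorted list of integers (minutes since midnight here); `closest_time` is a made-up name, and ties are resolved toward the earlier row.

```python
from bisect import bisect_left, bisect_right

def closest_time(sorted_times, check):
    """Return the entry nearest to check; on a tie, prefer the earlier one."""
    lo_pos = bisect_right(sorted_times, check)  # number of entries <= check
    hi_pos = bisect_left(sorted_times, check)   # index of first entry >= check
    lower = sorted_times[lo_pos - 1] if lo_pos else None
    higher = sorted_times[hi_pos] if hi_pos < len(sorted_times) else None
    if lower is None:
        return higher  # also None when the list is empty
    if higher is None:
        return lower
    return lower if check - lower <= higher - check else higher

# 11:58 .. 12:55 as minutes since midnight.
times = [718, 723, 727, 729, 742, 747, 753, 758, 763, 769, 775]
# Boundaries 12:00, 12:15, 12:30, 12:45, 13:00.
closest = [closest_time(times, b) for b in range(720, 781, 15)]
print(closest)
```

Unlike a strict "most recent row before the boundary" lookup, this can return a row after the boundary when that row is strictly closer.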
answered 2012-05-17T21:47:48.030