我有以下一系列表中定义的 3 个进程 A、B 和 C:
http://sqlfiddle.com/#!2/48f54
CREATE TABLE processA
(date_time datetime, valueA int);
INSERT INTO processA
(date_time, valueA)
VALUES
('2013-1-8 22:10:00', 100),
('2013-1-8 22:15:00', 100),
('2013-1-8 22:30:00', 100),
('2013-1-8 22:35:00', 100),
('2013-1-8 22:40:00', 100),
('2013-1-8 22:45:00', 100),
('2013-1-8 22:50:00', 100),
('2013-1-8 23:05:00', 100),
('2013-1-8 23:10:00', 100),
('2013-1-8 23:20:00', 100),
('2013-1-8 23:25:00', 100),
('2013-1-8 23:35:00', 100),
('2013-1-8 23:40:00', 100),
('2013-1-9 00:05:00', 100),
('2013-1-9 00:10:00', 100);
CREATE TABLE processB
(date_time datetime, valueB decimal(4,2));
INSERT INTO processB
(date_time, valueB)
VALUES
('2013-1-08 21:46:00', 3),
('2013-1-08 22:11:00', 4),
('2013-1-08 22:31:00', 5),
('2013-1-08 22:36:00', 6),
('2013-1-08 22:41:00', 7),
('2013-1-08 23:06:00', 8),
('2013-1-08 23:20:00', 2),
('2013-1-08 23:46:00', 3),
('2013-1-09 00:34:00', 9);
CREATE TABLE processC
(date_time datetime, status varchar(4));
INSERT INTO processC
VALUES
('2013-1-08 18:00:00', 'yes'),
('2013-1-08 19:00:00', 'yes'),
('2013-1-08 20:00:00', 'yes'),
('2013-1-08 21:00:00', 'yes'),
('2013-1-08 22:00:00', 'yes'),
('2013-1-08 23:00:00', 'no'),
('2013-1-08 00:00:00', 'no'),
('2013-1-08 01:00:00', 'no');
正如您所看到的,每个进程的读数发生的时间是不同的。
ProcessA,如果它发生,则每隔 5 分钟执行一次
ProcessB,读数出现在不可预测的时间,但通常在一小时内出现多次
ProcessC 将始终具有每小时值(是或否)。
首先,我想转换 processB 以便每隔 5 分钟读取一次,以便数据与 processA 对齐,然后我可以在 5 分钟间隔标记处对两个表进行简单连接。对于转换,每 5 分钟的数据应设置为[-30,30) 分钟窗口内可用的最近的processB 观察值。如果值是等距的,则取平均值。如果在 30 分钟窗口中没有可用,则将其设置为 null。
一旦我有了这个,我可以使用 ProcessC 在 %Y%m%d%H 上进行简单的连接,使用类似下面的方法来获得一个所有数据在 5 分钟间隔标记处对齐的最终表:
date_format(date_time, '%Y%m%d%H') = date_format(date_time, '%Y%m%d%H')
如果有人有任何指示/指导,我将不胜感激。我很感激。
样本输出:
'2013-1-8 22:10:00', 100, 4, yes <--- closer to 22:11 than 21:46
'2013-1-8 22:15:00', 100, 4, yes <--- closer to 22:11 than 21:31
'2013-1-8 22:30:00', 100, 5, yes <--- closer to 22:31 than 22:11
'2013-1-8 22:35:00', 100, 6, yes <--- closer to 22:36 than 22:31
'2013-1-8 22:40:00', 100, 7, yes <--- closer to 22:41 than 22:36
'2013-1-8 22:45:00', 100, 7, yes <--- closer to 22:41 than 23:06
'2013-1-8 22:50:00', 100, 7, yes <--- closer to 22:41 than 23:06
'2013-1-8 23:05:00', 100, 8, yes <--- closer to 23:06 than 23:06
'2013-1-8 23:10:00', 100, 8, no <--- closer to 23:06 than 23:20
'2013-1-8 23:20:00', 100, 2, no <--- closer to 23:20 than 23:10
'2013-1-8 23:25:00', 100, 2, no <--- closer to 23:20 than 23:10
'2013-1-8 23:35:00', 100, 3, no <--- closer to 23:46 than 23:20
'2013-1-9 00:05:00', 100, 3, no <--- closer to 23:46 than 00:34
'2013-1-9 00:10:00', 100, 6, no <--- takes the avg of 3 and 9