sql - PostgreSQL query with overlapping dates, multiple arrays

Question

EDIT: I have edited this question to make it easier to understand. Apologies for any confusion.

I have a temporary table with the columns

zone_name, nodeid, nodelabel, nodegainedservice, nodelostservice
Zone1, 3, Windows-SRV1, "2012-11-27 13:10:30+08", "2012-11-27 13:00:40+08"
Zone1, 5, Windows-SRV2, "2012-12-20 13:10:30+08", "2012-12-18 13:00:40+08"
....
....

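There are many zones and many nodes, and the same node can gain and lose service multiple times.

nodegainedservice means the node came up; nodelostservice means the node went down.

How can I write a query to get each zone's availability over a period of time?

For example, Zone1 has Windows-SRV1 and Windows-SRV2. Find how many times, and for how long, Zone1 was down. The servers are replicas: a zone is down only while all of its servers are down at the same time, and it is up again as soon as any one of them comes back.

Please use the following sample data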

zonename nodeid  nodelabel  noderegainedservice  nodelostservice
Zone1  27  Windows-SRV1  2013-02-21 10:04:56+08  2013-02-21 09:48:48+08
Zone1  27  Windows-SRV1  2013-02-21 10:14:01+08  2013-02-21 10:09:27+08
Zone1  27  Windows-SRV1  2013-02-22 10:26:29+08  2013-02-22 10:24:20+08
Zone1  27  Windows-SRV1  2013-02-22 11:27:24+08  2013-02-22 11:25:15+08
Zone1  27  Windows-SRV1  2013-02-28 16:24:59+08  2013-02-28 15:52:59+08
Zone1  27  Windows-SRV1  2013-02-28 16:56:19+08  2013-02-28 16:40:18+08
Zone1  39  Windows-SRV2  2013-02-21 13:15:53+08  2013-02-21 12:26:04+08
Zone1  39  Windows-SRV2  2013-02-23 13:23:10+08  2013-02-22 10:21:14+08
Zone1  39  Windows-SRV2  2013-02-24 13:35:23+08  2013-02-23 13:33:32+08
Zone1  39  Windows-SRV2  2013-02-26 15:17:25+08  2013-02-25 14:25:51+08
Zone1  39  Windows-SRV2  2013-02-28 18:49:56+08  2013-02-28 15:43:01+08
Zone1  13  Windows-SRV3  2013-02-22 17:23:59+08  2013-02-22 10:19:13+08
Zone1  13  Windows-SRV3  2013-02-28 16:54:27+08  2013-02-28 16:13:48+08

The zone_outages output should look like this, for example:

zonename duration from_time to_time

zone1 00:02:09 2013-02-22 10:24:20+08 2013-02-22 10:26:29+08 
zone1 00:02:09 2013-02-22 11:25:15+08 2013-02-22 11:27:24+08    
zone1 00:11:11 2013-02-28 16:13:48+08 2013-02-28 16:24:59+08 
zone1 00:14:09 2013-02-28 16:40:18+08 2013-02-28 16:54:27+08

Note: there may be entries like this

Zone2  24  Windows-SRV12  \n  \n

In that case Windows-SRV12 in Zone2 never went down, so Zone2's availability would be 100%.

score 2 · Accepted Answer

Have you considered PG 9.2's range types instead of two separate timestamp fields?

http://www.postgresql.org/docs/9.2/static/rangetypes.html

Something like:

CREATE TABLE availability (
    zone_name varchar, nodeid int, nodelabel varchar, during tsrange
);

INSERT INTO availability
VALUES ('zone1', 3, 'srv1', '[2013-01-01 14:30, 2013-01-01 15:30)');

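Unless I'm mistaken, you would then be able to work with unions, intersections and the like, which should make your work simpler. There may also be aggregate functions I'm not familiar with that are suited to this.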
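
As a rough illustration of the union and intersection operators mentioned above, here is a minimal sketch assuming PostgreSQL 9.2 or later. The two sample ranges and the column aliases are made up for the example.

```sql
-- Two hypothetical one-hour ranges that overlap by 30 minutes.
select
    a && b as ranges_overlap,  -- true when the ranges share at least one point
    a * b  as intersection,    -- the part common to both ranges
    a + b  as range_union      -- errors out if the ranges are disjoint and not adjacent
from (
    select
        tsrange('2013-01-01 14:30'::timestamp, '2013-01-01 15:30'::timestamp) as a,
        tsrange('2013-01-01 15:00'::timestamp, '2013-01-01 16:00'::timestamp) as b
) as pair;
```

Note that the + operator only works on ranges that touch or overlap, so it cannot merge an arbitrary set of ranges on its own.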

If needed, also look into WITH statements and window functions for more complex queries:

http://www.postgresql.org/docs/9.2/static/tutorial-window.html

http://www.postgresql.org/docs/9.2/static/functions-window.html

Some testing shows that sum() does not work on the tsrange type.

That being said, here is the SQL schema used in the queries that follow:

drop table if exists nodes;

create table nodes (
    zone int not null,
    node int not null,
    uptime tsrange
);

-- this requires the btree_gist extension:
-- alter table nodes add exclude using gist (uptime with &&, zone with =, node with =);

Data (slightly different from your sample):

insert into nodes values
    (1, 1,  '[2013-02-20 00:00:00, 2013-02-21 09:40:00)'),
    (1, 1,  '[2013-02-21 09:48:48, 2013-02-21 10:04:56)'),
    (1, 1,  '[2013-02-21 10:09:27, 2013-02-21 10:14:01)'),
    (1, 1,  '[2013-02-22 10:24:20, 2013-02-22 10:26:29)'),
    (1, 1,  '[2013-02-22 11:25:15, 2013-02-22 11:27:24)'),
    (1, 1,  '[2013-02-28 15:52:59, 2013-02-28 16:24:59)'),
    (1, 1,  '[2013-02-28 16:40:18, 2013-02-28 16:56:19)'),
    (1, 1,  '[2013-02-28 17:00:00, infinity)'),
    (1, 2,  '[2013-02-20 00:00:01, 2013-02-21 12:15:00)'),
    (1, 2,  '[2013-02-21 12:26:04, 2013-02-21 13:15:53)'),
    (1, 2,  '[2013-02-22 10:21:14, 2013-02-23 13:23:10)'),
    (1, 2,  '[2013-02-23 13:33:32, 2013-02-24 13:35:23)'),
    (1, 2,  '[2013-02-25 14:25:51, 2013-02-26 15:17:25)'),
    (1, 2,  '[2013-02-28 15:43:01, 2013-02-28 18:49:56)'),
    (2, 3,  '[2013-02-20 00:00:01, 2013-02-22 09:01:00)'),
    (2, 3,  '[2013-02-22 10:19:13, 2013-02-22 17:23:59)'),
    (2, 3,  '[2013-02-28 16:13:48, 2013-02-28 16:54:27)');

The raw data in order (for clarity):

select *
from nodes
order by zone, uptime, node;

Yields:

 zone | node |                    uptime                     
------+------+-----------------------------------------------
    1 |    1 | ["2013-02-20 00:00:00","2013-02-21 09:40:00")
    1 |    2 | ["2013-02-20 00:00:01","2013-02-21 12:15:00")
    1 |    1 | ["2013-02-21 09:48:48","2013-02-21 10:04:56")
    1 |    1 | ["2013-02-21 10:09:27","2013-02-21 10:14:01")
    1 |    2 | ["2013-02-21 12:26:04","2013-02-21 13:15:53")
    1 |    2 | ["2013-02-22 10:21:14","2013-02-23 13:23:10")
    1 |    1 | ["2013-02-22 10:24:20","2013-02-22 10:26:29")
    1 |    1 | ["2013-02-22 11:25:15","2013-02-22 11:27:24")
    1 |    2 | ["2013-02-23 13:33:32","2013-02-24 13:35:23")
    1 |    2 | ["2013-02-25 14:25:51","2013-02-26 15:17:25")
    1 |    2 | ["2013-02-28 15:43:01","2013-02-28 18:49:56")
    1 |    1 | ["2013-02-28 15:52:59","2013-02-28 16:24:59")
    1 |    1 | ["2013-02-28 16:40:18","2013-02-28 16:56:19")
    1 |    1 | ["2013-02-28 17:00:00",infinity)
    2 |    3 | ["2013-02-20 00:00:01","2013-02-22 09:01:00")
    2 |    3 | ["2013-02-22 10:19:13","2013-02-22 17:23:59")
    2 |    3 | ["2013-02-28 16:13:48","2013-02-28 16:54:27")
(17 rows)

Nodes available at 2013-02-21 09:20:00:

with upnodes as (
select zone, node, uptime
from nodes
where '2013-02-21 09:20:00'::timestamp <@ uptime
)
select *
from upnodes
order by zone, uptime, node;

Yields:

 zone | node |                    uptime                     
------+------+-----------------------------------------------
    1 |    1 | ["2013-02-20 00:00:00","2013-02-21 09:40:00")
    1 |    2 | ["2013-02-20 00:00:01","2013-02-21 12:15:00")
    2 |    3 | ["2013-02-20 00:00:01","2013-02-22 09:01:00")
(3 rows)

Nodes available from 2013-02-21 00:00:00 inclusive to 2013-02-24 00:00:00 exclusive:

with upnodes as (
select zone, node, uptime
from nodes
where '[2013-02-21 00:00:00, 2013-02-24 00:00:00)'::tsrange && uptime
)
select * from upnodes
order by zone, uptime, node;

Yields:

 zone | node |                    uptime                     
------+------+-----------------------------------------------
    1 |    1 | ["2013-02-20 00:00:00","2013-02-21 09:40:00")
    1 |    2 | ["2013-02-20 00:00:01","2013-02-21 12:15:00")
    1 |    1 | ["2013-02-21 09:48:48","2013-02-21 10:04:56")
    1 |    1 | ["2013-02-21 10:09:27","2013-02-21 10:14:01")
    1 |    2 | ["2013-02-21 12:26:04","2013-02-21 13:15:53")
    1 |    2 | ["2013-02-22 10:21:14","2013-02-23 13:23:10")
    1 |    1 | ["2013-02-22 10:24:20","2013-02-22 10:26:29")
    1 |    1 | ["2013-02-22 11:25:15","2013-02-22 11:27:24")
    1 |    2 | ["2013-02-23 13:33:32","2013-02-24 13:35:23")
    2 |    3 | ["2013-02-20 00:00:01","2013-02-22 09:01:00")
    2 |    3 | ["2013-02-22 10:19:13","2013-02-22 17:23:59")
(11 rows)

Zones available from 2013-02-21 00:00:00 inclusive to 2013-02-24 00:00:00 exclusive:

with upnodes as (
select zone, node, uptime
from nodes
where '[2013-02-21 00:00:00, 2013-02-24 00:00:00)'::tsrange && uptime
),
upzones_max as (
select u1.zone, tsrange(lower(u1.uptime), max(upper(u2.uptime))) as uptime
from upnodes as u1
join upnodes as u2 on u2.zone = u1.zone and u2.uptime && u1.uptime
group by u1.zone, lower(u1.uptime)
),
upzones as (
select u1.zone, tsrange(min(lower(u2.uptime)), upper(u1.uptime)) as uptime
from upzones_max as u1
join upzones_max as u2 on u2.zone = u1.zone and u2.uptime && u1.uptime
group by u1.zone, upper(u1.uptime)
)
select zone, uptime, upper(uptime) - lower(uptime) as duration
from upzones
order by zone, uptime;

Yields:

 zone |                    uptime                     |    duration     
------+-----------------------------------------------+-----------------
    1 | ["2013-02-20 00:00:00","2013-02-21 12:15:00") | 1 day 12:15:00
    1 | ["2013-02-21 12:26:04","2013-02-21 13:15:53") | 00:49:49
    1 | ["2013-02-22 10:21:14","2013-02-23 13:23:10") | 1 day 03:01:56
    1 | ["2013-02-23 13:33:32","2013-02-24 13:35:23") | 1 day 00:01:51
    2 | ["2013-02-20 00:00:01","2013-02-22 09:01:00") | 2 days 09:00:59
    2 | ["2013-02-22 10:19:13","2013-02-22 17:23:59") | 07:04:46
(6 rows)

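There may be a better way to write the latter query if you write (or find) a custom aggregate function that sums overlapping range types. The non-trivial issue I ran into was isolating an adequate group by clause; I ended up settling on two nested group by clauses.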
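
For readers on a newer server, the merging step can be sketched with a built-in aggregate. This is not part of the original answer: it assumes PostgreSQL 14 or later, where range_agg and multiranges are available, and it reuses the nodes table defined above.

```sql
-- Sketch only; assumes PostgreSQL 14+ and the nodes table from this answer.
select zone,
       merged as uptime,
       upper(merged) - lower(merged) as duration
from (
    select zone,
           -- range_agg unions overlapping or adjacent ranges into a multirange;
           -- unnest then returns one row per merged range
           unnest(range_agg(uptime)) as merged
    from nodes
    where '[2013-02-21 00:00:00, 2013-02-24 00:00:00)'::tsrange && uptime
    group by zone
) as z
order by zone, merged;
```

Because the aggregate does the merging, a single group by zone is enough and the two self-joins are no longer needed.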

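The queries could also be rewritten to fit your current schema, either by replacing the uptime field with an expression such as tsrange(start_date, end_date), or by writing a view that does so.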
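
A minimal sketch of such a view follows. The table name temptable and the view name node_downtime are assumptions for illustration; the column names come from the question. Since the question stores outages (lost, then regained), the resulting ranges describe downtime rather than uptime, and tstzrange is used because the sample timestamps carry a time zone offset.

```sql
-- Hypothetical view; assumes the question's columns live in a table named temptable.
create view node_downtime as
select zone_name,
       nodeid,
       nodelabel,
       -- an outage starts when service is lost and ends when it is regained;
       -- a NULL nodegainedservice leaves the upper bound open (still down)
       tstzrange(nodelostservice, nodegainedservice) as downtime
from temptable
where nodelostservice is not null;  -- skip nodes that never went down
```

The filter matters: tstzrange(NULL, NULL) is an unbounded range, which would wrongly mark a node that never failed as permanently down.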

score 0 · Accepted Answer

DROP table if exists temptable;
CREATE TABLE temptable
(
  zone_name character varying(255),
  nodeid integer,
  nodelabel character varying(255),
  nodegainedservice timestamp with time zone,
  nodelostservice timestamp with time zone
);
INSERT INTO tempTable (zone_name, nodeid, nodelabel, nodegainedservice, nodelostservice) VALUES 
('Zone1',   27, 'Windows-SRV1', '2013-02-21 10:04:56+08',   '2013-02-21 09:48:48+08'),
('Zone1',   27, 'Windows-SRV1', '2013-02-21 10:14:01+08',   '2013-02-21 10:09:27+08'),
('Zone1',   27, 'Windows-SRV1', '2013-02-22 10:26:29+08',   '2013-02-22 10:24:20+08'),
('Zone1',   27, 'Windows-SRV1', '2013-02-22 11:27:24+08',   '2013-02-22 11:25:15+08'),
('Zone1',   27, 'Windows-SRV1', '2013-02-28 16:24:59+08',   '2013-02-28 15:52:59+08'),
('Zone1',   27, 'Windows-SRV1', '2013-02-28 16:56:19+08',   '2013-02-28 16:40:18+08'),
('Zone1',   39, 'Windows-SRV2', '2013-02-21 13:15:53+08',   '2013-02-21 12:26:04+08'),
('Zone1',   39, 'Windows-SRV2', '2013-02-23 13:23:10+08',   '2013-02-22 10:21:14+08'),
('Zone1',   39, 'Windows-SRV2', '2013-02-24 13:35:23+08',   '2013-02-23 13:33:32+08'),
('Zone1',   39, 'Windows-SRV2', '2013-02-26 15:17:25+08',   '2013-02-25 14:25:51+08'),
('Zone1',   39, 'Windows-SRV2', '2013-02-28 18:49:56+08',   '2013-02-28 15:43:01+08'),
('Zone2',   13, 'Windows-SRV3', '2013-02-22 17:23:59+08',   '2013-02-22 10:19:13+08'),
('Zone2',   13, 'Windows-SRV3', '2013-02-28 16:54:27+08',   '2013-02-28 16:13:48+08'),
('Zone2',   14, 'Windows-SRV4', '2013-02-22 11:02:56+08',   '2013-02-22 10:01:48+08');

with downodes as (
select zone_name, nodeid, nodelostservice, nodegainedservice
from temptable
WHERE (nodelostservice, nodegainedservice) OVERLAPS ('2013-02-20 00:00:00+08'::timestamptz, '2013-03-01 00:00:00+08'::timestamptz)
),
donezones_max as(
select downodes1.zone_name, downodes1.nodeid, downodes1.nodelostservice, min(downodes2.nodegainedservice) as nodegainedservice
from downodes as downodes1
join downodes as downodes2 on downodes2.zone_name = downodes1.zone_name and ((downodes2.nodelostservice, downodes2.nodegainedservice) OVERLAPS (downodes1.nodelostservice, downodes1.nodegainedservice))
group by downodes1.zone_name, downodes1.nodeid, downodes1.nodelostservice
),
downzones as(
select downodes1.zone_name, downodes1.nodeid, max(downodes2.nodelostservice) as nodelostservice, downodes1.nodegainedservice
from donezones_max as downodes1
join donezones_max as downodes2 on downodes2.zone_name = downodes1.zone_name and  ((downodes2.nodelostservice, downodes2.nodegainedservice) OVERLAPS (downodes1.nodelostservice, downodes1.nodegainedservice))
group by downodes1.zone_name, downodes1.nodeid, downodes1.nodegainedservice 
),
zone_outages as(
SELECT 
    zone_name,
    nodelostservice,
    nodegainedservice,
    nodegainedservice - nodelostservice AS duration,
    CAST('1' AS INTEGER) as outage_counter
FROM downzones GROUP BY zone_name, nodelostservice, nodegainedservice HAVING COUNT(*) > 1 ORDER BY zone_name, nodelostservice)
select 
    zone_name,
    EXTRACT(epoch from (SUM(duration) / (greatest(1, SUM(outage_counter))))) AS average_duration_seconds,
    SUM(outage_counter) AS outage_count
FROM zone_outages GROUP BY zone_name ORDER BY zone_name;
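
An alternative way to list the individual zone outages is to count, at every event, how many nodes are down. What follows is a sketch rather than this answer's method. It assumes the temptable definition above, that a node's own outages never overlap each other, and that every node of a zone has at least one row in the table. The CTE and column names are made up for the example, and an outage still in progress (no nodegainedservice yet) is not reported.

```sql
-- Sketch only: a zone is down while its count of down nodes equals its node count.
with node_count as (
    -- distinct nodes per zone, including nodes whose timestamps are NULL
    select zone_name, count(distinct nodeid) as total_nodes
    from temptable
    group by zone_name
),
events as (
    -- +1 when a node goes down, -1 when it comes back up
    select zone_name, nodelostservice as event_time, 1 as delta
    from temptable
    where nodelostservice is not null
    union all
    select zone_name, nodegainedservice, -1
    from temptable
    where nodelostservice is not null and nodegainedservice is not null
),
running as (
    select zone_name,
           event_time,
           -- nodes down from this event until the next one
           sum(delta) over w as down_nodes,
           lead(event_time) over w as next_time
    from events
    window w as (partition by zone_name order by event_time, delta)
)
select r.zone_name,
       r.next_time - r.event_time as duration,
       r.event_time as from_time,
       r.next_time  as to_time
from running as r
join node_count as n using (zone_name)
where r.down_nodes = n.total_nodes
  and r.next_time > r.event_time
order by r.zone_name, r.event_time;
```

Ordering by delta within the same instant processes recoveries before failures, so a node coming up at the exact moment another goes down does not register as a zero-length outage.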
