sql - SQL Query Based on Complex Business Logic

Question

I have a table with the following structure:

id, timestamp, deviceId, datatype, measure

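The measure column holds the value for the given datatype. For example, when processing starts, datatype is 19 and measure is 1. When processing completes, a row with datatype 19 and measure 0 is written, and a new row is inserted with the same timestamp, datatype 54, and some value as its measure. In other words, on completion the system fires a trigger that updates this table. Sample data is below: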

1001, 2013-01-02 09:20:00, 501, 19, 1
1005, 2013-01-02 10:00:00, 501, 19, 0
1006, 2013-01-02 10:00:00, 501, 54, 65

1005 and 1006 have the same timestamp, and the timestamp of 1001 is always earlier than that of 1005.

1011, 2013-01-02 09:20:00, 601, 19, 1
1015, 2013-01-02 10:00:00, 601, 19, 0
1016, 2013-01-02 10:00:00, 601, 54, 105

1015 and 1016 have the same timestamp, and the timestamp of 1011 is always earlier than that of 1015.

1021, 2013-01-02 09:20:00, 701, 19, 1
1022, 2013-01-02 10:00:00, 701, 19, 0
1023, 2013-01-02 10:00:00, 701, 54, 81

1022 and 1023 have the same timestamp, and the timestamp of 1021 is always earlier than that of 1022.

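The same process can run on multiple devices at the same time.

The requirement is to find the start and end time of each completed transaction, for example: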

1006, 2013-01-02 09:20:00, 2013-01-02 10:20:00, 501, 65
1016, 2013-01-02 09:20:00, 2013-01-02 10:20:00, 601, 105
1023, 2013-01-02 09:20:00, 2013-01-02 10:20:00, 701, 81

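I am writing SQL queries again after a gap of about 5 years and am completely stuck. Any pointers or suggestions would be greatly appreciated.

Thanks in advance.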

score 2 · Accepted Answer

SQL Fiddle

CREATE TABLE t
        (id int, ts timestamp, deviceId int, datatype int, measure int)
;

INSERT INTO t
        (id, ts, deviceId, datatype, measure)
VALUES
        (1001, '2013-01-02 09:20:00', 501, 19, 1),
        (1005, '2013-01-02 10:00:00', 501, 19, 0),
        (1006, '2013-01-02 10:00:00', 501, 54, 65),
        (1007, '2013-01-02 10:20:00', 501, 19, 1),
        (1008, '2013-01-02 11:00:00', 501, 19, 0),
        (1009, '2013-01-02 11:00:00', 501, 54, 65),
        (1011, '2013-01-02 09:20:00', 601, 19, 1),
        (1015, '2013-01-02 10:00:00', 601, 19, 0),
        (1016, '2013-01-02 10:00:00', 601, 54, 105),
        (1021, '2013-01-02 09:20:00', 701, 19, 1),
        (1022, '2013-01-02 10:00:00', 701, 19, 0),
        (1023, '2013-01-02 10:00:00', 701, 54, 81)
;

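with parted as (
    -- p numbers the (start, result) pairs per device: rows 1-2 get 0, rows 3-4 get 1, ...
    select floor((rn - 1) / 2.0) p, *
    from (
        select
            row_number() over (partition by deviceId order by ts, datatype) rn,
            id, ts, deviceId, dataType, measure
        from t
        -- drop the "finished" markers so start (19/1) and result (54) rows alternate
        where not(datatype = 19 and measure = 0)
    ) s
)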
select
    p1.id, p0.ts "start", p1.ts "end", p1.deviceId, p1.measure
from
    parted p0
    inner join
    parted p1 on
        p0.deviceId = p1.deviceId
        and p0.p = p1.p
        and p0.datatype = 19 and p1.datatype = 54
order by p1.id
;
  id  |        start        |         end         | deviceid | measure 
------+---------------------+---------------------+----------+---------
 1006 | 2013-01-02 09:20:00 | 2013-01-02 10:00:00 |      501 |      65
 1009 | 2013-01-02 10:20:00 | 2013-01-02 11:00:00 |      501 |      65
 1016 | 2013-01-02 09:20:00 | 2013-01-02 10:00:00 |      601 |     105
 1023 | 2013-01-02 09:20:00 | 2013-01-02 10:00:00 |      701 |      81

score 0 · Accepted Answer

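My logic is a simple aggregation. The aggregation key, however, is the "next" record with datatype 54 for the same device id.

To get that next record, I use a correlated subquery in the select list: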

select next54 as id, MIN(timestamp) as starttime, MAX(timestamp) as endtime, MAX(deviceId) as deviceId,
       MAX(case when id = next54 then measure end) as measure
from (select t.*,
             (select MIN(t2.id) from t t2 where t2.id >= t.id and t2.datatype = 54 and t2.deviceId = t.deviceId) as next54
      from t
     ) t
group by next54

The rest is just the aggregation.
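To see the correlated-subquery grouping run end to end, here is a minimal sketch that loads a few of the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. The column names ts and deviceId, the helper name completed_transactions, and the choice of SQLite are assumptions of this sketch, not part of the original answer; other databases may need small syntax changes.

```python
import sqlite3

# Sample rows from the question: (id, ts, deviceId, datatype, measure).
ROWS = [
    (1001, "2013-01-02 09:20:00", 501, 19, 1),
    (1005, "2013-01-02 10:00:00", 501, 19, 0),
    (1006, "2013-01-02 10:00:00", 501, 54, 65),
    (1011, "2013-01-02 09:20:00", 601, 19, 1),
    (1015, "2013-01-02 10:00:00", 601, 19, 0),
    (1016, "2013-01-02 10:00:00", 601, 54, 105),
]

# next54 = id of the first datatype-54 row at or after the current row, same device.
QUERY = """
select next54 as id, min(ts) as starttime, max(ts) as endtime,
       max(deviceId) as deviceId,
       max(case when id = next54 then measure end) as measure
from (select t.*,
             (select min(t2.id) from t t2
              where t2.id >= t.id and t2.datatype = 54
                and t2.deviceId = t.deviceId) as next54
      from t) s
group by next54
order by next54
"""

def completed_transactions(rows):
    """Hypothetical helper: load rows into SQLite and run the grouping query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (id int, ts text, deviceId int, datatype int, measure int)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    result = conn.fetchall() if False else conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in completed_transactions(ROWS):
    print(row)
```

One thing to keep in mind: a start row with no later datatype-54 row gets a NULL next54, so an unfinished transaction would show up as a group whose id is NULL. Filtering on next54 is not null would drop it.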

Since I'm personally not a big fan of correlated subqueries, you can also write this with window functions (sometimes called analytic functions in Oracle):

select next54 as id, MIN(timestamp) as starttime, MAX(timestamp) as endtime, MAX(deviceId) as deviceId,
       MAX(case when id = next54 then measure end) as measure
from (select t.*,
             min(id54) over (partition by deviceId order by id desc) as next54
       from (select t.*,
                    (case when datatype = 54 then id end) as id54
             from t
            ) t
     ) t
group by next54

The min function with an order by clause computes a "cumulative" minimum. The result should be the same as with the correlated subquery.
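If the cumulative minimum is hard to picture, the following plain-Python sketch emulates what the windowed min does: within each device, walk the rows by descending id and carry the smallest datatype-54 id seen so far. The function name next54_by_running_min is made up for this illustration, and the rows are (id, deviceId, datatype) triples taken from the sample data in this thread.

```python
from itertools import groupby

# (id, deviceId, datatype) triples from the thread's sample data.
ROWS = [
    (1001, 501, 19), (1005, 501, 19), (1006, 501, 54),
    (1007, 501, 19), (1008, 501, 19), (1009, 501, 54),
    (1011, 601, 19), (1015, 601, 19), (1016, 601, 54),
]

def next54_by_running_min(rows):
    """Map each row id to the smallest datatype-54 id at or after it on its device."""
    result = {}
    # Equivalent of: partition by deviceId order by id desc
    ordered = sorted(rows, key=lambda r: (r[1], -r[0]))
    for _, partition in groupby(ordered, key=lambda r: r[1]):
        running_min = None  # SQL min() skips NULLs; None plays that role here
        for row_id, _, datatype in partition:
            id54 = row_id if datatype == 54 else None
            if id54 is not None and (running_min is None or id54 < running_min):
                running_min = id54
            result[row_id] = running_min
    return result

print(next54_by_running_min(ROWS))
```

Because the walk goes from high ids to low ids, the running minimum drops each time an earlier datatype-54 row is reached, which is why every row ends up tagged with the nearest result row that follows it.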

score 0 · Accepted Answer

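Perhaps I am greatly oversimplifying the problem, but I can't see why, for each record with datatype 54, you can't just look up the previous record for that device with datatype 19 and measure 1: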

SELECT  result.ID, 
        result.DeviceID, 
        MAX(start.Timestamp) StartTime, 
        result.Timestamp EndTime, 
        result.Measure
FROM    T result
        INNER JOIN T start
            ON start.DeviceID = result.DeviceID
            AND start.Timestamp < result.Timestamp
            AND start.DataType = 19
            AND start.Measure = 1
WHERE   result.DataType = 54
GROUP BY result.ID, result.DeviceID, result.Timestamp, result.Measure

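The only real difference is that instead of starting at the beginning and working toward the result, I start at the result and work backward to the beginning. This will fail if processes run concurrently for the same device (i.e. one transaction starts before the previous one has ended).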
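A small sketch of that caveat, assuming SQLite via Python's sqlite3 module and a column named ts instead of Timestamp. The sequential rows come from the first answer's sample data; the overlapping rows (ids 2001 to 2006) are invented purely to show the failure and do not appear in the question.

```python
import sqlite3

# Same idea as the query above: latest start row before each result row.
QUERY = """
select r.id, r.deviceId, max(s.ts) as starttime, r.ts as endtime, r.measure
from t r
join t s on s.deviceId = r.deviceId and s.ts < r.ts
        and s.datatype = 19 and s.measure = 1
where r.datatype = 54
group by r.id, r.deviceId, r.ts, r.measure
order by r.id
"""

def latest_start_before_result(rows):
    """Hypothetical helper: load rows into SQLite and run the look-back query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (id int, ts text, deviceId int, datatype int, measure int)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
    out = conn.execute(QUERY).fetchall()
    conn.close()
    return out

# Back-to-back transactions on device 501.
sequential = [
    (1001, "2013-01-02 09:20:00", 501, 19, 1),
    (1005, "2013-01-02 10:00:00", 501, 19, 0),
    (1006, "2013-01-02 10:00:00", 501, 54, 65),
    (1007, "2013-01-02 10:20:00", 501, 19, 1),
    (1008, "2013-01-02 11:00:00", 501, 19, 0),
    (1009, "2013-01-02 11:00:00", 501, 54, 65),
]

# Invented overlap: a second start (id 2002) arrives before the first run ends.
overlapping = [
    (2001, "2013-01-02 09:20:00", 501, 19, 1),
    (2002, "2013-01-02 09:40:00", 501, 19, 1),
    (2003, "2013-01-02 10:00:00", 501, 19, 0),
    (2004, "2013-01-02 10:00:00", 501, 54, 65),
    (2005, "2013-01-02 10:30:00", 501, 19, 0),
    (2006, "2013-01-02 10:30:00", 501, 54, 70),
]

print(latest_start_before_result(sequential))
print(latest_start_before_result(overlapping))
```

With the sequential rows each result is paired with its own start. With the overlapping rows both results pick up the 09:40 start, and the 09:20 start is never reported.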
