sql - SQL query: multiple challenges

Question

I'm not an SQL expert, and I'm struggling with the following problem:

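I have inherited a large table (about 100 million rows) of timestamped events that represent the phase transitions of mostly short-lived phenomena. Unfortunately, the events were recorded in a somewhat odd way. The table looks like this: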

phen_ID   record_time  producer_id   consumer_id  state   ...

000123    10198789                               start
          10298776     000123        000112      hjhkk
000124    10477886                               start
          10577876     000124        000123      iuiii
000124    10876555                               end

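Each phenomenon (phen_ID) has a start event and, in theory, an end event, although the end may not have happened yet and is therefore not recorded. In between, each phenomenon can pass through several states. Unfortunately, for some states the ID is recorded in the producer or consumer field instead of phen_ID. In addition, neither the number of states nor the time between states is fixed.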

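First, I need an SQL statement that shows, for each phen_ID, the start time and the time of the last recorded event (which may be the end state or one of the intermediate states).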

Considering just a single phen_ID, I managed to put together the following SQL:

WITH myconstants (var1) as (
   values ('000123')
)

select min(l.record_time), max(l.record_time) from 
   (select distinct *  from public.phen_table JOIN myconstants ON var1 IN (phen_id, producer_id, consumer_id)
 ) as l

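Because the start state always has the lowest record time for a given phenomenon, the statement above correctly returns the recorded time range as a single row, regardless of what the end state is.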

Obviously, here I have to supply the phen_ID by hand.

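How can I make this work so that I get one row with the start time and the maximum record time for every unique phen_ID? I played around trying to fit in something like select distinct phen_id ... but couldn't automatically "feed" the IDs into the query above. Or am I completely off base here?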

Also, to clarify: with the table above, the ideal output would look like this:

ID         min-time      max-time
000123     10198789      10577876   (min-time is start, max-time is state iuiii)
000124     10477886      10876555   (min-time is start, max-time is end state)
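One way to "feed" every ID into the query, sketched here under the assumption that blank cells in the table are NULL, is to replace the hard-coded myconstants CTE with the distinct non-null phen_id values (i.e. every phenomenon that has a start row) and group by that ID. The snippet uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it can be run against the sample rows. Table and column names come from the question; the function name min_max_per_phen is hypothetical.

```python
import sqlite3

# Sample rows from the question; blank cells are assumed to be NULL.
ROWS = [
    ("000123", 10198789, None, None, "start"),
    (None, 10298776, "000123", "000112", "hjhkk"),
    ("000124", 10477886, None, None, "start"),
    (None, 10577876, "000124", "000123", "iuiii"),
    ("000124", 10876555, None, None, "end"),
]

# The question's query with the single constant replaced by every ID that
# appears in phen_id, grouped so each ID yields one (min, max) row.
QUERY = """
WITH ids (var1) AS (
    SELECT DISTINCT phen_id FROM phen_table WHERE phen_id IS NOT NULL
)
SELECT ids.var1 AS id,
       MIN(p.record_time) AS min_time,
       MAX(p.record_time) AS max_time
FROM ids
JOIN phen_table p ON ids.var1 IN (p.phen_id, p.producer_id, p.consumer_id)
GROUP BY ids.var1
ORDER BY ids.var1
"""


def min_max_per_phen(rows):
    """Load rows into an in-memory table and return (id, min, max) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE phen_table (phen_id TEXT, record_time INTEGER,"
            " producer_id TEXT, consumer_id TEXT, state TEXT)"
        )
        conn.executemany("INSERT INTO phen_table VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in min_max_per_phen(ROWS):
        print(row)
```

On the sample data this returns exactly the two rows of the desired output. 000112, which only ever appears as a consumer, is left out because it never occurs in phen_id. On about 100 million rows, a join with IN across three columns may be slow; the set-based answers below avoid that join.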

score 1 · Accepted Answer

I think you're on the right track. Try this and see if it's what you're looking for:

select
    min(record_time)
    ,max(record_time)
    ,coalesce(phen_id, producer_id, consumer_id) as phen_id  -- first non-null of the three ID columns
from public.phen_table
group by coalesce(phen_id, producer_id, consumer_id)

score 1 · Accepted Answer

union all might be an option:

select phen_id, 
    min(record_time) as min_record_time, 
    max(record_time) as max_record_time
from (
    select phen_id, record_time from phen_table
    union all select producer_id, record_time from phen_table
    union all select consumer_id, record_time from phen_table
) t
where phen_id is not null
group by phen_id

On the other hand, if you want precedence (only the first non-null ID counts), you can use coalesce():

select coalesce(phen_id, producer_id, consumer_id) as phen_id, 
    min(record_time) as min_record_time, 
    max(record_time) as max_record_time
from phen_table
group by coalesce(phen_id, producer_id, consumer_id)

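The logic of the two queries is not exactly the same. If a row has more than one of the three columns not null, with different values, the first query takes all the non-null values into account, while the second only considers the "first" non-null value.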
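To make the difference concrete, this sketch runs both queries from this answer against the question's sample rows, again using Python's sqlite3 module as a stand-in for PostgreSQL and assuming blank cells are NULL (run_query is a hypothetical helper name). The iuiii row has producer_id 000124 and consumer_id 000123, so coalesce() credits it only to 000124.

```python
import sqlite3

# Sample rows from the question; blank cells are assumed to be NULL.
ROWS = [
    ("000123", 10198789, None, None, "start"),
    (None, 10298776, "000123", "000112", "hjhkk"),
    ("000124", 10477886, None, None, "start"),
    (None, 10577876, "000124", "000123", "iuiii"),
    ("000124", 10876555, None, None, "end"),
]

# union all: every non-null ID in a row gets that row's record_time.
UNION_ALL_QUERY = """
SELECT phen_id, MIN(record_time), MAX(record_time)
FROM (
    SELECT phen_id, record_time FROM phen_table
    UNION ALL SELECT producer_id, record_time FROM phen_table
    UNION ALL SELECT consumer_id, record_time FROM phen_table
) t
WHERE phen_id IS NOT NULL
GROUP BY phen_id
ORDER BY phen_id
"""

# coalesce(): only the first non-null ID in each row gets its record_time.
COALESCE_QUERY = """
SELECT COALESCE(phen_id, producer_id, consumer_id) AS phen_id,
       MIN(record_time), MAX(record_time)
FROM phen_table
GROUP BY COALESCE(phen_id, producer_id, consumer_id)
ORDER BY 1
"""


def run_query(query, rows=ROWS):
    """Load rows into an in-memory table and return the query's result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE phen_table (phen_id TEXT, record_time INTEGER,"
            " producer_id TEXT, consumer_id TEXT, state TEXT)"
        )
        conn.executemany("INSERT INTO phen_table VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_query(UNION_ALL_QUERY))
    print(run_query(COALESCE_QUERY))
```

With coalesce(), 000123 ends at 10298776 instead of the 10577876 the question expects. union all gives the expected times for 000123 and 000124 but also returns a row for 000112, which only appears as a consumer.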

Edit

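In Postgres, which you eventually tagged, the union all solution can be expressed more efficiently with a lateral join, which reads phen_table once instead of three times: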

select x.phen_id, 
    min(p.record_time) as min_record_time, 
    max(p.record_time) as max_record_time
from phen_table p
cross join lateral (values (phen_id), (producer_id), (consumer_id)) as x(phen_id)
where x.phen_id is not null
group by x.phen_id
