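You should at least do the basic filtering and counting in the SQL query. With a simple predicate, Oracle can decide whether to use an index or partitioning, which may reduce the database processing time. Sending fewer rows will also significantly reduce the network overhead.
For example: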
select trunc(the_time, 'MI') the_minute, count(*) the_count
from test1
where the_time between timestamp '2021-01-25 10:00:00' and timestamp '2021-01-25 11:59:59'
group by trunc(the_time, 'MI')
order by the_minute desc;
(The trickiest part of these queries will probably be the off-by-one issues. Do you really want "between 10:00 and 12:00", or do you want "between 10:00 and 11:59:59"?)
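To illustrate the boundary question, here is a small Python sketch comparing three ways of writing the upper bound. The sample timestamps are made up, and the fractional-second case only matters if the_time can store fractions of a second, which is an assumption about the table.

```python
from datetime import datetime

start = datetime(2021, 1, 25, 10, 0, 0)
last_second = datetime(2021, 1, 25, 11, 59, 59)
noon = datetime(2021, 1, 25, 12, 0, 0)

# Hypothetical sample timestamps near the upper boundary.
samples = [
    datetime(2021, 1, 25, 11, 59, 59),
    datetime(2021, 1, 25, 11, 59, 59, 500000),  # 11:59:59.5
    datetime(2021, 1, 25, 12, 0, 0),
]

# Mirrors "between 10:00 and 11:59:59": both ends included.
closed_to_last_second = [t for t in samples if start <= t <= last_second]

# Mirrors "between 10:00 and 12:00": both ends included, so noon itself matches.
closed_to_noon = [t for t in samples if start <= t <= noon]

# Half-open range: ">= 10:00 and < 12:00".
half_open = [t for t in samples if start <= t < noon]

print(len(closed_to_last_second), len(closed_to_noon), len(half_open))
```

The first form drops the 11:59:59.5 row, the second picks up a row that belongs to the next hour, and the half-open form avoids both problems.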
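Optionally, you can perform the entire calculation in SQL. I would wager that the SQL version will be slightly faster, again because of the network overhead. But sending one result row versus 120 aggregate rows probably won't make a noticeable difference unless this query is executed frequently.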
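To make the row-count argument concrete, here is a minimal Python sketch that compares fetching raw rows with fetching per-minute counts. It uses the standard-library sqlite3 module as a stand-in for Oracle, which is an assumption made purely so the example runs anywhere. SQLite has no trunc(the_time, 'MI'), so strftime is substituted, and the contents of test1 are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test1 (the_time text)")

# Hypothetical data: one row every 10 seconds from 10:00:00 to 11:59:50.
rows = []
for total_seconds in range(0, 2 * 60 * 60, 10):
    hour = 10 + total_seconds // 3600
    minute = (total_seconds % 3600) // 60
    second = total_seconds % 60
    rows.append((f"2021-01-25 {hour:02d}:{minute:02d}:{second:02d}",))
conn.executemany("insert into test1 values (?)", rows)

lower, upper = "2021-01-25 10:00:00", "2021-01-25 11:59:59"

# Option 1: pull every matching row to the client.
raw_rows = conn.execute(
    "select the_time from test1 where the_time between ? and ?",
    (lower, upper),
).fetchall()

# Option 2: let the database filter and count, one row per minute.
per_minute = conn.execute(
    """
    select strftime('%Y-%m-%d %H:%M', the_time) the_minute, count(*) the_count
    from test1
    where the_time between ? and ?
    group by strftime('%Y-%m-%d %H:%M', the_time)
    order by the_minute desc
    """,
    (lower, upper),
).fetchall()

print(len(raw_rows), "raw rows versus", len(per_minute), "aggregate rows")
```

With this made-up data the aggregate query returns one sixth as many rows, and the gap grows with the number of rows per minute.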
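At this point, the question veers into the more subjective question of where to put the "business logic". I bet most programmers would prefer your Python solution to my query. But one minor advantage of doing all the work in SQL is keeping all of the weird date logic in one place. If you process the results in multiple steps, there are more chances for an off-by-one error.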
--Time slots with the smallest number of rows.
--(There will be lots of ties because the data is so boring.)
with dates as
(
    --Enter literals or bind variables here:
    select
        cast(timestamp '2021-01-25 10:00:00' as date) begin_date,
        cast(timestamp '2021-01-25 11:59:59' as date) end_date,
        30 timeslot
    from dual
)
--Choose the rows with the smallest counts.
select begin_time, end_time, total_count
from
(
    --Rank the time slots per count.
    select begin_time, end_time, total_count,
        dense_rank() over (order by total_count) smallest_when_1
    from
    (
        --Counts per timeslot.
        select begin_time, end_time, sum(the_count) total_count
        from
        (
            --Counts per minute.
            select trunc(the_time, 'MI') the_minute, count(*) the_count
            from test1
            where the_time between (select begin_date from dates) and (select end_date from dates)
            group by trunc(the_time, 'MI')
            order by the_minute desc
        ) counts
        join
        (
            --Time ranges.
            select
                begin_date + ((level-1)/24/60) begin_time,
                begin_date + ((level-1)/24/60) + (timeslot/24/60) end_time
            from dates
            connect by level <=
            (
                --The number of different time ranges.
                select (end_date - begin_date) * 24 * 60 - timeslot + 1
                from dates
            )
        ) time_ranges
            on the_minute between begin_time and end_time
        group by begin_time, end_time
    )
)
where smallest_when_1 = 1
order by begin_time;
You can run a db<>fiddle here.
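For comparison, here is a rough Python sketch of the client-side alternative: take the per-minute counts and slide a fixed-width window over them. This is not your actual Python code, which I am only guessing at. The function name smallest_windows and the sample counts are hypothetical. It also deliberately uses half-open windows and treats missing minutes as zero, whereas the join in the query above uses "between begin_time and end_time", which includes both ends, so the two will not give identical totals.

```python
from datetime import datetime, timedelta


def smallest_windows(minute_counts, begin, end, timeslot_minutes):
    """Find the windows with the smallest total count.

    minute_counts maps minute-truncated datetimes to row counts.
    Each window is half-open: [window_begin, window_begin + timeslot).
    Returns (smallest_total, [(window_begin, window_end), ...]).
    """
    step = timedelta(minutes=1)
    width = timedelta(minutes=timeslot_minutes)
    totals = []
    window_begin = begin
    while window_begin + width <= end:
        total = sum(
            minute_counts.get(window_begin + i * step, 0)  # missing minute = 0 rows
            for i in range(timeslot_minutes)
        )
        totals.append((window_begin, window_begin + width, total))
        window_begin += step
    if not totals:
        return None, []
    smallest = min(total for _, _, total in totals)
    return smallest, [(b, e) for b, e, total in totals if total == smallest]


# Hypothetical per-minute counts: 6 rows per minute, except a quiet
# half hour from 11:00 to 11:29 with 2 rows per minute.
begin = datetime(2021, 1, 25, 10, 0)
end = datetime(2021, 1, 25, 12, 0)  # exclusive upper bound
counts = {}
for offset in range(120):
    minute = begin + timedelta(minutes=offset)
    counts[minute] = 2 if 60 <= offset < 90 else 6

smallest, windows = smallest_windows(counts, begin, end, 30)
print(smallest, windows)
```

Note how the boundary decisions (exclusive end, half-open windows, zero for missing minutes) now live in the client code rather than in the query, which is the "date logic in more than one place" trade-off mentioned above.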