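I'm trying to process a click stream for different links in real time. Every click is recorded in the database. For most links the number of clicks per minute is more or less constant (e.g. < 50). However, a few of them get 1000-2000 clicks/minute, but only for a short period of time.
I want to detect when I start seeing such a high-traffic click stream, because I want to defer and batch the database updates for those streams instead of performing them in real time.
I've been experimenting with a few approaches, but without good results. To me this looks like a standard math problem or queue-management problem.
Any suggestions?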
When you insert each click, also calculate the number of clicks that link has received over the past minute and store that as well. Then you can simply query for events where the rate is high enough.
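For example, in PostgreSQL-style SQL (`:link_id` and `:event_info` stand for the procedure's parameters):

```sql
-- record_click(link_id, event_info), run as one transaction: log the click,
-- then log that link's click count over the past minute (including this one).
insert into click_log (event_time, link_id, event_info)
values (now(), :link_id, :event_info);
insert into click_rates (event_time, link_id, rate)
select now(), :link_id, count(*) from click_log
where link_id = :link_id and event_time > now() - interval '1 minute';
```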
If you don't want to do this when inserting the click, you can compute the rate later, for example with a view that joins the click table to itself. The drawback is that this may mean chewing through a huge data set, rather than just the ~50 recent rows counted at each click.
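```sql
-- One row per distinct (link, timestamp): clicks on that link in the minute up to
-- that timestamp. Deduplicating first stops identical timestamps multiplying the count.
create view click_rates as
select t.link_id, t.event_time, count(*) as rate
from (select distinct link_id, event_time from click_events) t
join click_events e
  on e.link_id = t.link_id
 and e.event_time between t.event_time - interval '1 minute' and t.event_time
group by t.link_id, t.event_time;
```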
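A small, runnable sketch of the after-the-fact self-join view, again using Python's `sqlite3` module in memory. Assumptions: timestamps are integer seconds (so the one-minute window is `event_time - 60`), and the `click_events` layout, the `peak_rates` helper and the sample data are hypothetical. The data deliberately gives link 1 five clicks in the same second, the case where grouping by timestamp without `DISTINCT` would over-count (5 × 5 = 25 instead of 5).

```python
import sqlite3

# Self-join view adapted to SQLite: BETWEEN is inclusive at both ends.
SCHEMA = """
CREATE TABLE click_events (link_id INTEGER, event_time INTEGER);
CREATE VIEW click_rates AS
SELECT t.link_id, t.event_time, count(*) AS rate
FROM (SELECT DISTINCT link_id, event_time FROM click_events) AS t
JOIN click_events AS e
  ON e.link_id = t.link_id
 AND e.event_time BETWEEN t.event_time - 60 AND t.event_time
GROUP BY t.link_id, t.event_time;
"""


def peak_rates(conn: sqlite3.Connection) -> dict[int, int]:
    """Highest one-minute click count seen for each link, computed after the fact."""
    rows = conn.execute(
        "SELECT link_id, max(rate) FROM click_rates GROUP BY link_id ORDER BY link_id"
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical data: link 1 gets 5 clicks in the same second,
# link 2 gets one click every 30 seconds for 5 minutes.
conn.executemany(
    "INSERT INTO click_events VALUES (?, ?)",
    [(1, 100)] * 5 + [(2, t) for t in range(0, 300, 30)],
)
print(peak_rates(conn))  # → {1: 5, 2: 3}
```

Because the view recomputes every window on each query, it is best suited to occasional analysis rather than the per-click detection the question asks for.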