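I have a table (let's call it data) that holds a set of object IDs, values, and dates. I want to identify the objects whose values show a positive trend over the last X minutes (say, one hour).
Sample data: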
entity_id | value | date
1234 | 15 | 2014-01-02 11:30:00
5689 | 21 | 2014-01-02 11:31:00
1234 | 16 | 2014-01-02 11:31:00
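I looked at similar questions, but unfortunately did not find anything that helped...
You inspired me to implement linear regression in SQL Server. It can be adapted for MySQL/Oracle/whatever without much trouble. A least-squares fit is the mathematically sound way to determine the trend for each entity_id within the hour, and the query selects only the entities with a positive trend.
It implements the formula for computing B1hat listed here: https://en.wikipedia.org/wiki/Regression_analysis#Linear_regression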
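As a reading aid, the B1hat (slope) estimate for simple linear regression can be written out as follows. Here x_i is assumed to be the number of seconds since the start of the window and y_i the value, with the bars denoting per-entity means; this matches the column aliases x, y, xbar, and ybar in the query.

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

A positive slope means the value tends to rise over the window. The denominator is zero when an entity has only one distinct timestamp, in which case the slope is undefined.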
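-- Sample data: entity 1 rises, entity 2 falls, entity 3 stays flat.
create table #temp
(
    entity_id int,
    value int,
    [date] datetime
)

insert into #temp (entity_id, value, [date])
values
    (1,10,'20140102 07:00:00 AM'),
    (1,20,'20140102 07:15:00 AM'),
    (1,30,'20140102 07:30:00 AM'),
    (2,50,'20140102 07:00:00 AM'),
    (2,20,'20140102 07:47:00 AM'),
    (3,40,'20140102 07:00:00 AM'),
    (3,40,'20140102 07:52:00 AM')

-- Beta is the least-squares slope of value (y) against seconds since the window start (x).
select entity_id,
       sum((x-xbar)*(y-ybar)) / nullif(sum((x-xbar)*(x-xbar)), 0) as Beta
from
(
    select entity_id,
           -- cast to float first: AVG over an int column returns a truncated int in SQL Server
           avg(cast(value as float)) over(partition by entity_id) as ybar,
           value as y,
           avg(cast(datediff(second,'20140102 07:00:00 AM',[date]) as float)) over(partition by entity_id) as xbar,
           datediff(second,'20140102 07:00:00 AM',[date]) as x
    from #temp
    where [date]>='20140102 07:00:00 AM' and [date]<'20140102 08:00:00 AM'
) as Calcs
group by entity_id
-- nullif turns a zero denominator (one row, or all rows at the same timestamp) into NULL,
-- so those entities are filtered out instead of raising a divide-by-zero error
having sum((x-xbar)*(y-ybar)) / nullif(sum((x-xbar)*(x-xbar)), 0) > 0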
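To sanity-check what the query should return, here is a minimal sketch that recomputes the same slope outside the database for the seven sample rows. Python is used purely for illustration, and the names ROWS, WINDOW_START, and slopes are made up for this example; they are not part of the original answer. All sample rows already fall inside the one-hour window, so the sketch does not filter by date.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows copied from the insert statement: (entity_id, value, timestamp).
ROWS = [
    (1, 10, "2014-01-02 07:00:00"),
    (1, 20, "2014-01-02 07:15:00"),
    (1, 30, "2014-01-02 07:30:00"),
    (2, 50, "2014-01-02 07:00:00"),
    (2, 20, "2014-01-02 07:47:00"),
    (3, 40, "2014-01-02 07:00:00"),
    (3, 40, "2014-01-02 07:52:00"),
]
# Assumed window start, matching the literal used in the query.
WINDOW_START = datetime(2014, 1, 2, 7, 0, 0)


def slopes(rows, window_start):
    """Return {entity_id: least-squares slope of value vs. seconds since window_start}."""
    points = defaultdict(list)
    for entity_id, value, stamp in rows:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        x = (t - window_start).total_seconds()
        points[entity_id].append((x, float(value)))

    result = {}
    for entity_id, pts in points.items():
        xbar = sum(x for x, _ in pts) / len(pts)
        ybar = sum(y for _, y in pts) / len(pts)
        sxx = sum((x - xbar) ** 2 for x, _ in pts)
        if sxx == 0:
            continue  # only one distinct timestamp: slope is undefined
        sxy = sum((x - xbar) * (y - ybar) for x, y in pts)
        result[entity_id] = sxy / sxx
    return result


betas = slopes(ROWS, WINDOW_START)
positive = sorted(e for e, b in betas.items() if b > 0)
print(positive)  # prints [1]
```

Entity 1 gains 10 units every 900 seconds, so its slope is 1/90 per second. Entity 2 has a negative slope and entity 3 a slope of zero, so only entity 1 passes the positive-trend filter.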
If anyone needs this in MySQL, here is the code that worked for me.
datapoint | plays | status_time
1234 | 15 | 2014-01-02 11:30:00
5689 | 21 | 2014-01-02 11:31:00
1234 | 16 | 2014-01-02 11:31:00
-- Requires MySQL 8.0 or later for the window functions (avg(...) over ...).
select datapoint,
       1.0*sum((x-xbar)*(y-ybar))/sum((x-xbar)*(x-xbar)) as Beta
from
(
    select datapoint,
           avg(plays) over(partition by datapoint) as ybar,
           plays as y,
           -- x = seconds elapsed since the window start; TIMESTAMPDIFF returns status_time minus the start,
           -- so x grows with time and a rising value yields a positive Beta
           avg(TIMESTAMPDIFF(SECOND, '2021-03-22 21:00:00', status_time)) over(partition by datapoint) as xbar,
           TIMESTAMPDIFF(SECOND, '2021-03-22 21:00:00', status_time) as x
    from aggregate_datapoints
    where status_time between '2021-03-22 21:00:00' and '2021-03-22 22:00:00'
      and type = 'topContent'
) as calcs
group by datapoint
having 1.0*sum((x-xbar)*(y-ybar))/sum((x-xbar)*(x-xbar)) > 0