r - R结构中断点时间序列模型的模型比较

Question

我想测试时间序列是否包含结构变化。

使用这个模拟示例创建一个系列，在 30 和 80 次观察后有两次休息。

set.seed(42)
sim_data = data.frame(outcome = c(rnorm(30, 10, 1), rnorm(50, 20, 2), rnorm(20, 45, 1)))
sim_ts = ts(data = sim_data, start = c(2010, 1), frequency = 12)
plot(sim_ts)

我使用strucchange R 包来确定断点的数量（如果有的话）并对这些进行建模：

library("strucchange")
break_points = breakpoints(sim_ts ~ 1) #2 breakpoints at 30 and 80
break_factor = breakfactor(break_points, breaks = 2)
break_model = lm(sim_ts ~ break_factor - 1)

...并将具有 2 个结构变化点的拟合模型放在原始时间序列之上：

lines(fitted(break_points, breaks = 2), col = 4)

我感兴趣的是：如何测试具有结构变化的模型是否比简单的线性模型更适合？

simple_lm = lm(sim_ts ~ time(sim_ts))
abline(simple_lm, col='red') #to add the linear line to the plot

模型比较只是：anova(simple_lm, break_model)？

我不需要先对平稳性进行初始测试吗？或者这是否包含在模型比较中？

score 1 · Accepted Answer

预测文献中评估良好拟合的“正常”方法是使用损失函数 (MSE)。因为您没有进行预测，所以最简单的方法可能就是比较 R²。（如果你只关心一个合适的人）

方差分析方法需要假设观察者的独立性，所以我不确定是否存在可能的陷阱。即使它似乎在这里工作。

score 1 · Accepted Answer

在用于模型比较的许多标准/统计数据中，AIC 和 BIC 是最受欢迎的。以下是 wiki 页面（ https://en.wikipedia.org/wiki/Akaike_information_criterion ）的摘录：

Akaike 信息准则 (AIC) 是预测误差的估计量，因此是给定数据集的统计模型的相对质量。给定数据模型的集合，AIC 估计每个模型相对于其他每个模型的质量。因此，AIC 提供了一种模型选择方法。

在 R 中，通用函数AIC()可用于计算 AIC。（请参阅 wiki 页面中关于计算具有不同功能的 AIC 的软件不可靠性的注释，您的代码并非如此，因为这两个模型都是通过拟合的lm）。AIC 越小，模型越好。〜2.0或类似的差异通常用作确定实际意义的阈值。您的简单模型的 AIC 值远大于中断模型（663.6993 与 380.8516 的差异 >> 2.0）；有压倒性的证据表明，中断模型是更受欢迎的。这并不奇怪，因为许多结构变化检测包通过优化 AIC 或 BIC 精确地寻找最可能的中断数量和位置。

set.seed(42)
sim_data = data.frame(outcome = c(rnorm(30, 10, 1), rnorm(50, 20, 2), rnorm(20, 45, 1)))
sim_ts = ts(data = sim_data, start = c(2010, 1), frequency = 12)
plot(sim_ts)
 

library("strucchange")
break_points = breakpoints(sim_ts ~ 1) #2 breakpoints at 30 and 80
break_factor = breakfactor(break_points, breaks = 2)
break_model = lm(sim_ts ~ break_factor - 1)
simple_lm = lm(sim_ts ~ time(sim_ts))

AIC(simple_lm)   #663.6993
AIC(break_model) #380.8516

或者，经常使用基于贝叶斯的标准。为了自我推销，我演示了使用后验模型可能性来比较使用Rbeast我开发的 R 包的替代模型：

set.seed(42)
sim_data = data.frame(outcome = c(rnorm(30, 10, 1), rnorm(50, 20, 2), rnorm(20, 45, 1)))

library(Rbeast)
break_model   =  beast( sim_data$outcome, season='none')
simple_model  =  beast( sim_data$outcome, season='none', tcp.minmax=c(0,0) )

plot(break_model)
plot(simple_model)

# Due to the Bayesian nature, the exact log-lik numbers will vary to some extent across runs.
break_model$marg_lik  # posterior log-lik: -2.467201
simple_model$marg_lik # posterior log-lik: -137.2492

代码的几点解释：sim_data$outcome是正则时间序列。Rbeast同时进行时间序列分解（如果存在周期性/季节性成分）和断点/变化点检测。您的样本数据没有季节性成分，因此season='none'已设置。断点模型识别两个断点，以及断点随时间的估计概率（峰值概率对应于两个断点）。

为了拟合没有断点的模型，我们通过将断点的最小数量和最大数量都设置为零来强制执行强先验tcp.minmax=c(0,0)：也就是说，不允许更改点，因此拟合了全局线性模型。

每个模型规格的后验边际对数似然在输出break_model$marg_lik和中给出simple_model$marg_lik。显然，marg lik 越大，模型越好。与 AIC 或 BIC 类似，1~2.0 的差异通常表明模型之间具有实际意义。带断点的模型比简单的线性模型要好得多。

r - R结构中断点时间序列模型的模型比较

2 回答 2

Related

Reference