按照 Dirk 的建议dynlm
,我不太清楚如何预测,但搜索它导致我通过https://stats.stackexchange.com/questions/6758/1-step-ahead-predictions-with-dynlmdyn
打包-r-包
然后经过几个小时的实验,我想出了以下函数来处理预测。路上有很多“问题”,例如,您似乎无法进行时间序列,并且 predict 的结果被一大堆类似的东西rbind
所抵消,所以我觉得这个答案与仅命名 a 相比显着增加start
包,虽然我赞成德克的回答。
因此,一个可行的解决方案是:
predictDyn 方法:
# pass in training data, test data,
# it will step through one by one
# need to give dependent var name, so that it can make this into a timeseries
predictDyn <- function( model, train, test, dependentvarname ) {
Ntrain <- nrow(train)
Ntest <- nrow(test)
# can't rbind ts's apparently, so convert to numeric first
train[,dependentvarname] <- as.numeric(train[,dependentvarname])
test[,dependentvarname] <- as.numeric(test[,dependentvarname])
testtraindata <- rbind( train, test )
testtraindata[,dependentvarname] <- ts( as.numeric( testtraindata[,dependentvarname] ) )
for( i in 1:Ntest ) {
result <- predict(model,newdata=testtraindata,subset=1:(Ntrain+i-1))
testtraindata[Ntrain+i,dependentvarname] <- result[Ntrain + i + 1 - start(result)][1]
}
return( testtraindata[(Ntrain+1):(Ntrain + Ntest),] )
}
示例用法:
library("dyn")
# size of training and test data
N <- 6
predictN <- 10
# create training data, which we can get exact fit on, so we can check the results easily
traindata <- c(1,2)
for( i in 3:N ) { traindata[i] <- 0.5 + 1.3 * traindata[i-2] + 1.7 * traindata[i-1] }
train <- data.frame( y = ts( traindata ), foo = 1)
# create testing data, bunch of NAs
test <- data.frame( y = ts( rep(NA,predictN) ), foo = 1)
# fit a model
model <- dyn$lm( y ~ lag(y,-1) + lag(y,-2), train )
# look at the model, it's a perfect fit. Nice!
print(model)
test <- predictDyn( model, train, test, "y" )
print(test)
# nice plot
plot(test$y, type='l')
输出:
> model
Call:
lm(formula = dyn(y ~ lag(y, -1) + lag(y, -2)), data = train)
Coefficients:
(Intercept) lag(y, -1) lag(y, -2)
0.5 1.7 1.3
> test
y foo
7 143.2054 1
8 325.6810 1
9 740.3247 1
10 1682.4373 1
11 3823.0656 1
12 8686.8801 1
13 19738.1816 1
14 44848.3528 1
15 101902.3358 1
16 231537.3296 1
编辑:嗯,这虽然超级慢。即使我将数据限制在数据集中subset
的固定几行,每次预测也需要大约 24 毫秒,或者,对于我的任务,0.024*7*24*8*20*10/60/60
= 1.792 hours
:-O