I am trying to build a classifier based on random forests in R.
Code to reproduce:
library(quantmod)
library(randomForest)
getSymbols('^GSPC', from="2002-01-01")
GSPC <- GSPC[,1:5] # remove adjusted close
GSPC$wkret <- lag(GSPC$GSPC.Close,-5)/GSPC$GSPC.Close # build weekly future return
GSPC$wkret <- GSPC$wkret * 100 -100 # build index
cutoff <- floor(dim(GSPC)[1]/4) # select the row at 25%
cutoffbreak <- sort(abs(as.data.frame(GSPC$wkret)[,1]),decreasing=T)[cutoff] # get the top 25% return in absolute terms
y <- cut(GSPC$wkret, breaks=c('-100',-cutoffbreak,cutoffbreak ,'100'),labels=c('down','','up')) # build factors
randomForest(GSPC[1:100],y[1:100]) # select first 100 to exclude NA's, dimension problems.
This works:
y[1:100]
[1] down down down
[22] up up down down up up up up
=== snip ===
> is.factor(y)
[1] TRUE
> x[1:100]
open high low close volume
2002-01-02 1148.08 1154.67 1136.23 1154.67 1171000000
2002-01-03 1154.67 1165.27 1154.01 1165.27 1398900000
2002-01-04 1165.27 1176.55 1163.42 1172.51 1513000000
2002-01-07 1172.51 1176.97 1163.55 1164.89 1308300000
=== snip ===
> class(x)
[1] "xts" "zoo"
This works (but of course makes no sense):
lm(y[1:100] ~ .,data=x[1:100])
But building a random forest gives:
> rf <- randomForest(y[1:100] ~ .,data=x[1:100])
Error in randomForest.default(m, y, ...) : subscript out of bounds
> traceback()
4: randomForest.default(m, y, ...)
3: randomForest(m, y, ...)
2: randomForest.formula(y[1:100] ~ ., data = x[1:100])
1: randomForest(y[1:100] ~ ., data = x[1:100])
Googling suggests this is a dimension problem, but I can't figure out why or how.
R version:
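To narrow down whether this really is a dimension problem, here is a minimal sketch of the sanity checks one could run on the predictors and the response before calling randomForest. It uses only base R and synthetic stand-in data: the column values, the `flat` label, and the ±0.5 thresholds are assumptions for illustration, not the GSPC series from the code above, and the checks are not claimed to identify the actual cause of the error.

```r
# Synthetic stand-in for the predictors (hypothetical values, not real quotes).
x <- data.frame(open = 1:100, high = 1:100 + 1, low = 1:100 - 1)

# Synthetic stand-in for the weekly return, cut into three classes.
ret <- seq(-2, 2, length.out = 100)
y <- cut(ret, breaks = c(-100, -0.5, 0.5, 100),
         labels = c("down", "flat", "up"))

# Check 1: the predictors must have one row per response value.
same_size <- nrow(x) == length(y)

# Check 2: neither the predictors nor the response should contain NA.
has_na <- any(is.na(x)) || any(is.na(y))

# Check 3: every factor level should have a non-empty name
# and at least one observation in the subset being used.
level_names_ok <- all(nzchar(levels(y)))
level_counts <- table(y)

print(same_size)
print(has_na)
print(level_names_ok)
print(level_counts)

# droplevels() removes levels that do not occur in the subset.
y_clean <- droplevels(y)
```

Running the same three checks on the real `x[1:100]` and `y[1:100]` (after converting the xts object to a plain data frame, if needed) should show whether the sizes, missing values, or factor levels differ from what randomForest expects.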
> R.version
_
platform i686-pc-linux-gnu
arch i686
os linux-gnu
system i686, linux-gnu
status
major 2
minor 15.1
year 2012
month 06
day 22
svn rev 59600
language R
version.string R version 2.15.1 (2012-06-22)
nickname Roasted Marshmallows
Library versions:
randomForest version: "2.15.1"
quantmod version: "2.15.1"