这是我的代码:
set.seed(1)
#Boruta on the HouseVotes84 data from mlbench
library(mlbench) #has HouseVotes84 data
library(h2o) #has rf
#spin up h2o
myh20 <- h2o.init(nthreads = -1)
#read in data, throw some away
data(HouseVotes84)
hvo <- na.omit(HouseVotes84)
#move from R to h2o
mydata <- as.h2o(x=hvo,
destination_frame= "mydata")
#RF columns (input vs. output)
idxy <- 1
idxx <- 2:ncol(hvo)
#split data
splits <- h2o.splitFrame(mydata,
c(0.8,0.1))
train <- h2o.assign(splits[[1]], key="train")
valid <- h2o.assign(splits[[2]], key="valid")
# make random forest
my_imp.rf<- h2o.randomForest(y=idxy,x=idxx,
training_frame = train,
validation_frame = valid,
model_id = "my_imp.rf",
ntrees=200)
# find importance
my_varimp <- h2o.varimp(my_imp.rf)
my_varimp
我得到的输出是“变量重要性”。
经典的衡量标准是“准确度平均下降”和“基尼系数平均下降”。
我的结果是:
> my_varimp
Variable Importances:
variable relative_importance scaled_importance percentage
1 V4 3255.193604 1.000000 0.410574
2 V5 1131.646484 0.347643 0.142733
3 V3 921.106567 0.282965 0.116178
4 V12 759.443176 0.233302 0.095788
5 V14 492.264954 0.151224 0.062089
6 V8 342.811554 0.105312 0.043238
7 V11 205.392654 0.063097 0.025906
8 V9 191.110046 0.058709 0.024105
9 V7 169.117676 0.051953 0.021331
10 V15 135.097076 0.041502 0.017040
11 V13 114.906586 0.035299 0.014493
12 V2 51.939777 0.015956 0.006551
13 V10 46.716656 0.014351 0.005892
14 V6 44.336708 0.013620 0.005592
15 V16 34.779987 0.010684 0.004387
16 V1 32.528778 0.009993 0.004103
由此,我对“Vote #4”又名 V4 的相对重要性约为 3255.2。
问题: 这是什么单位?那是怎么推导出来的?
我尝试查看文档,但没有找到答案。我尝试了帮助文档。我尝试使用 Flow 查看参数以查看其中是否有任何指示。在他们中,我没有找到“基尼”或“降低准确性”。我应该去哪里看?