I am using a random forest model from the caret R package for supervised classification. The variable importance scores are:
> imp
rf variable importance
variables are sorted by average importance across the classes
only 20 most important variables shown (out of 1072)
S1 S2 S3
4431803 65.255 100.00 81.10
875118 98.548 83.17 76.34
DPH5298 76.253 64.65 65.52
L09963 73.734 55.34 62.27
L06919 68.265 36.08 67.35
L01951 29.271 44.96 65.14
SG01650 64.247 62.11 60.36
191797 62.054 51.16 56.09
L01455 21.829 49.09 59.42
DPH1252 47.619 59.38 36.41
SG00716 55.383 52.48 27.83
979261 37.371 54.99 29.40
543491 45.184 53.74 53.49
L00086 53.671 26.54 49.57
SG00379 35.353 23.06 53.66
4430843 52.587 53.65 47.06
L00680 4.569 46.35 53.49
L02770 26.357 42.34 52.95
995149 32.154 48.58 51.63
L00313 32.313 7.67 50.93
However, when I sort the features manually by their mean importance score across the three classes, the ordering is different:
> xx=rowMeans(imp$importance)
> head(sort(xx, decreasing=T), n=10)
875118 4431803 DPH5298 L09963 SG01650 L06919 191797 4430843 543491 DPH1252
86.01983 82.11727 68.80837 63.78019 62.23624 57.23003 56.43629 51.09914 50.80446 47.80281
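To narrow down where the two orderings diverge, here is a minimal sketch that ranks the first six rows of the printed table by two different per-row summaries. The values are copied from the output above; `imp_top` and `rank_by` are hypothetical names introduced only for this illustration and are not part of caret.

```r
# First six rows copied by hand from the printed importance table above.
imp_top <- data.frame(
  S1 = c(65.255, 98.548, 76.253, 73.734, 68.265, 29.271),
  S2 = c(100.00, 83.17, 64.65, 55.34, 36.08, 44.96),
  S3 = c(81.10, 76.34, 65.52, 62.27, 67.35, 65.14),
  row.names = c("4431803", "875118", "DPH5298", "L09963", "L06919", "L01951")
)

# Hypothetical helper: variable names ordered by a per-row summary, largest first.
rank_by <- function(scores, summary_fun) {
  s <- apply(scores, 1, summary_fun)  # one summary value per variable
  names(sort(s, decreasing = TRUE))
}

by_mean <- rank_by(imp_top, mean)  # ordering by average importance
by_max  <- rank_by(imp_top, max)   # ordering by largest class-wise importance

print(by_mean)
print(by_max)

# Compare each ordering with the order in which the rows were printed.
print(identical(by_mean, rownames(imp_top)))
print(identical(by_max, rownames(imp_top)))
```

On this excerpt, the ordering by row maximum reproduces the printed order, while the ordering by row mean swaps the first two variables. That describes only these six rows; it does not by itself show what the print method does internally.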
Is this a bug, or am I missing something?