r - Why does MARS (earth package) generate so many predictors?

Question

I am working on MARS models using the earth package in R. My dataset (CE.Rda) consists of one dependent variable (D9_RTO_avg) and 10 potential predictors (NDVI_l1, NDVI_f0, NDVI_f1, NDVI_f2, NDVI_f3, LST_l1, LST_f0, LST_f1, LST_f2, LST_f3). Here is the head of my dataset:

   D9_RTO_avg NDVI_l1 NDVI_f0 NDVI_f1 NDVI_f2 NDVI_f3 LST_l1 LST_f0 LST_f1 LST_f2 LST_f3
2   1.866667  0.3082  0.3290  0.4785  0.4330  0.5844  38.25  30.87     31  21.23  17.92
3   2.000000  0.2164  0.2119  0.2334  0.2539  0.4686   35.7   29.7  28.35  21.67  17.71
4   1.200000  0.2324  0.2503  0.2640  0.2697  0.4726  40.13   33.3  28.95  22.81  16.29
5   1.600000  0.1865  0.2070  0.2104  0.2164  0.3911  43.26  35.79  30.22  23.07  17.88
6   1.800000  0.2757  0.3123  0.3462  0.3778  0.5482  43.99  36.06  30.26  21.36  17.93
7   2.700000  0.2265  0.2654  0.3174  0.2741  0.3590  41.61   35.4  27.51  23.55  18.88

After creating my earth model as follows

library(earth)
mymodel.mod <- earth(D9_RTO_avg ~ ., data=CE, nk=10)

I print a summary of the resulting model by typing

print(summary(mymodel.mod, digits=2, style="pmax"))

and I get the following output

D9_RTO_avg =
4.1
+   38 * LST_f128.68                        
+  6.3 * LST_f216.41                        
-  2.9 * pmax(0,        0.66 -     NDVI_l1) 
-  2.3 * pmax(0,     NDVI_f3 -        0.23) 

Selected 5 of 7 terms, and 4 of 13169 predictors
Termination condition: Reached nk 10
Importance: LST_f128.68, NDVI_l1, NDVI_f3, LST_f216.41, NDVI_f0-unused,   NDVI_f1-unused, NDVI_f2-unused, ...
Number of terms at each degree of interaction: 1 4 (additive model)
GCV 2    RSS 4046    GRSq 0.29    RSq 0.29

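My question is: why does earth report 13169 predictors when there are actually only 10? It looks as if MARS is treating the individual observed values of the candidate predictors as predictors in their own right. How can I prevent MARS from doing this?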

Thanks for your help.
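A possible diagnostic, offered as a sketch rather than a confirmed answer: term names such as LST_f128.68 read like the column name LST_f1 followed by the value 28.68. That is what one would expect if the LST columns were stored as factors (or character strings) and each distinct value had been expanded into its own indicator variable. The unpadded LST values in the printed head (35.7, 31) are consistent with this, but it remains an assumption about CE. The example below uses a small hypothetical data frame, `CE_toy`, built from a few of the values shown above, and base R's `model.matrix` as a stand-in to illustrate the expansion. It does not call earth itself.

```r
# Hypothetical stand-in for CE: LST_f1 is deliberately stored as a factor,
# which can happen on import when a numeric column contains a stray text entry.
CE_toy <- data.frame(
  D9_RTO_avg = c(1.866667, 2.0, 1.2, 1.6, 1.8, 2.7),
  NDVI_l1    = c(0.3082, 0.2164, 0.2324, 0.1865, 0.2757, 0.2265),
  LST_f1     = factor(c("31", "28.35", "28.95", "30.22", "30.26", "27.51"))
)

# Step 1: inspect the column classes. Any predictor reported as
# "factor" or "character" is a candidate for unwanted expansion.
col_classes <- sapply(CE_toy, class)
print(col_classes)

# Step 2: count design-matrix columns (excluding the intercept) while
# LST_f1 is still a factor. Each factor level beyond the first adds a column.
n_before <- ncol(model.matrix(D9_RTO_avg ~ ., data = CE_toy)) - 1

# Step 3: convert factor/character columns to numeric. Going through
# as.character() matters: as.numeric() on a factor returns level codes.
# Entries that are not valid numbers become NA (with a warning).
is_text <- sapply(CE_toy, function(x) is.factor(x) || is.character(x))
CE_toy[is_text] <- lapply(CE_toy[is_text],
                          function(x) as.numeric(as.character(x)))

# Step 4: recount. Each predictor now contributes a single column.
n_after <- ncol(model.matrix(D9_RTO_avg ~ ., data = CE_toy)) - 1
print(c(before = n_before, after = n_after))
```

If the same check on the real CE shows factor or character columns, converting them in the same way and then rerunning earth(D9_RTO_avg ~ ., data=CE, nk=10) should show whether the predictor count drops back to 10. It is worth looking at any NA values produced by the conversion, since they point to the entries that caused the column to be read as text.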
