
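I am fairly new to R and am teaching myself some machine learning techniques. At the moment I am working on hyperparameter tuning, and to understand the problem better I am trying to do more of the work by hand than would strictly be necessary. So I use a tibble with list-columns, where each row contains a cross-validation fold of the training set and certain hyperparameter values for the random forest algorithm. The whole grid contains all unique combinations of these within specified ranges. The models should be built by iterating the ranger function over all rows (i.e. fold/parameter combinations) and then saved into a list-column. To do this I use the map family of functions from the purrr package.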

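The problem is that this approach only works when I use map2 to map the data and a single parameter (mtry) to the ranger function. I know that pmap has to be used when mapping more than two elements to a function. However, unlike the two-element case described above, this does not work for me with the data and two parameters (mtry and min.node.size) as elements. pmap fails to pass the third element (min.node.size) as an argument to ranger, and I get the following error: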

"Error in ranger(Species ~ ., data = .x, mtry = .y, min.node.size = .z) : object '.z' not found"

Here is my code using the iris dataset:

### used packages
library(tidyverse)
library(ranger)
library(rsample)

### data preparation
set.seed(123)

initial_split_data <- initial_split(iris, prop = 0.8)

training <- training(initial_split_data)
testing <- testing(initial_split_data)

cv_split <- vfold_cv(training, v = 3)

cv_data <- cv_split %>% 
  mutate(train = map(.x = splits, .f = ~training(.x)),
         validate = map(.x = splits, .f = ~testing(.x)),
         validate_species = map(.x = validate, .f = ~.x$Species))

### modeling
## two elements being mapped works:
random_forest_model_mtry <- cv_data %>% 
  crossing(mtry = seq(2,4,1)) %>% 
  mutate(model = map2(.x = train, .y = mtry, 
                                    .f = ~ranger(Species ~., data = .x, mtry = .y)))


## three elements being mapped does not work:
random_forest_model_mtry_minnode <- cv_data %>% 
  crossing(mtry = seq(2,4,1),
           min.node.size = seq(1,5,1)) %>% 
  mutate(model = pmap(list(.x = train, .y = mtry, .z = min.node.size), 
                                    .f = ~ranger(Species ~., data = .x, mtry = .y, min.node.size = .z)))

It would be very helpful if someone could show me how to use pmap correctly in this case so that the random forest models are fitted.

Kind regards


1 Answer


From the ?pmap help page:

 .f: A function, formula, or vector (not necessarily atomic).

     If a *function*, it is used as is.

     If a *formula*, e.g. ‘~ .x + 2’, it is converted to a
     function. There are three ways to refer to the arguments:

       • For a single argument function, use ‘.’

       • For a two argument function, use ‘.x’ and ‘.y’

       • For more arguments, use ‘..1’, ‘..2’, ‘..3’ etc

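For more than two arguments, you need to replace .x, .y, etc. with ..1, ..2, etc. The function created from the formula only defines .x, .y and . as argument names, so the .z in your formula is never bound to anything, which is why ranger() reports that object '.z' cannot be found.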

random_forest_model_mtry_minnode <- cv_data %>% 
    crossing(mtry = seq(2,4,1),min.node.size = seq(1,5,1)) %>% 
    mutate(model = pmap(list(train, mtry, min.node.size), 
                        .f = ~ranger(Species ~., data = ..1, 
                                     mtry = ..2, min.node.size = ..3)))
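The same mechanism can be reproduced without ranger. The following is a minimal sketch that assumes only the purrr package; the vectors x1, x2 and x3 are hypothetical toy values standing in for the train, mtry and min.node.size columns, and simple addition stands in for the model fit.

```r
library(purrr)

# Hypothetical toy values standing in for train, mtry and min.node.size
x1 <- c(1, 2)
x2 <- c(10, 20)
x3 <- c(100, 200)

# The formula lambda only has .x, .y and . as named arguments; the element
# named .z is swallowed by `...`, so evaluating .z fails ("object '.z' not found").
broken <- tryCatch(
  pmap(list(.x = x1, .y = x2, .z = x3), ~ .x + .y + .z),
  error = function(e) e  # keep the error condition instead of stopping
)

# Positional placeholders: ..1, ..2, ..3 are the 1st, 2nd and 3rd list elements
fixed <- pmap(list(x1, x2, x3), ~ ..1 + ..2 + ..3)
unlist(fixed)
#> [1] 111 222
```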

Note that the elements of the argument list (list(train, mtry, min.node.size) in your case) can be unnamed. What matters is their order, because that is what ..1, ..2, etc. refer to.
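If you prefer readable names over ..1, ..2, ..3, an alternative (not part of the original answer) is to name the list elements and pass a regular function whose argument names match them; pmap() then supplies the arguments by name rather than by position, so their order no longer matters. The sketch below assumes only purrr; the grid values and the arithmetic inside the function are hypothetical placeholders for the crossing() output and the ranger() call.

```r
library(purrr)

# Hypothetical toy grid standing in for the crossing() output
grid <- list(
  data = list(1:3, 4:6, 7:9),
  mtry = c(2, 3, 4),
  min.node.size = c(1, 1, 5)
)

# Named list + regular function: elements are matched to arguments by name,
# so the argument order below deliberately differs from the list order.
res <- pmap(grid, function(min.node.size, mtry, data) {
  sum(data) + mtry * min.node.size  # placeholder for a ranger() call
})
unlist(res)
#> [1]  8 18 44

# In the question's pipeline this would read (untested sketch):
# mutate(model = pmap(list(data = train, mtry = mtry, min.node.size = min.node.size),
#                     function(data, mtry, min.node.size)
#                       ranger(Species ~ ., data = data, mtry = mtry,
#                              min.node.size = min.node.size)))
```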

answered 2019-05-15T22:15:41.817