r - How to subset tasks by an indicator column and batch train/predict in mlr3?

Question

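Background

I am using the mlr3 package in R for modeling and prediction. I am working with one large data set that contains both the test set and the training set. An indicator column (in the code: test_or_train) marks which set each row belongs to.

Goal

Batch train all learners on the training rows, i.e. the rows marked 'train' in the test_or_train column.
Batch predict the rows marked 'test' in the test_or_train column with the respective trained learner.

Code

A placeholder data set with a test/train indicator column. (In the real data the train-test split is not artificial.)
Two tasks (in the real code the tasks differ, and there are more of them).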

library(readr)
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(reprex)
library(caret)

# Data
urlfile = 'https://raw.githubusercontent.com/shudras/office_data/master/office_data.csv'
data = read_csv(url(urlfile))[-1]

## Create artificial partition to test and train sets
art_part = createDataPartition(data$imdb_rating, list=FALSE)
train = data[art_part,]
test = data[-art_part,]

## Add test-train indicators
train$test_or_train = 'train'
test$test_or_train = 'test'

## Data set that I want to work / am working with
data = rbind(test, train)

# Create two tasks (Here the tasks are the same but in my data set they differ.)
task1 = 
  TaskRegr$new(
    id = 'office1', 
    backend = data, 
    target = 'imdb_rating'
  )
task2 = 
  TaskRegr$new(
    id = 'office2', 
    backend = data, 
    target = 'imdb_rating'
  )


# Model specification 
graph = 
  po('scale') %>>% 
  lrn('regr.cv_glmnet', 
      id = 'rp', 
      alpha = 1, 
      family = 'gaussian'
  ) 

# Learner creation
learner = GraphLearner$new(graph)

# Goal 
## 1. Batch train all learners with the rows marked 'train' in the test_or_train column of the data set
## 2. Batch predict the rows marked 'test' in the test_or_train column with the respective trained learner

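Created on 2020-06-22 by the reprex package (v0.3.0)

Notes

I tried using benchmark_grid with row_ids to train the learners on the training rows only, but that did not work. Working with an indicator column is also much easier than working with row indices: with a test/train indicator column, a single rule can define the split for every task, whereas row indices only work when all tasks contain the same rows.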

benchmark_grid(
    tasks = list(task1, task2), 
    learners = learner, 
    row_ids = train_rows # Not an argument and not favorable to work with indices
)
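The rule-based split described in these notes can be sketched in base R. This is only an illustration: `split_by_indicator` and the `toy` data frame are hypothetical names that do not appear in the original code, and the helper assumes the indicator column holds exactly the strings 'train' and 'test'.

```r
# Hypothetical helper (not part of mlr3): turn an indicator column
# into the row positions of the training and test rows.
split_by_indicator = function(data, column = "test_or_train") {
  stopifnot(column %in% names(data))
  flag = data[[column]]
  list(
    train = which(flag == "train"),
    test  = which(flag == "test")
  )
}

# Toy data, made up for illustration only.
toy = data.frame(
  y = c(1.0, 2.0, 3.0, 4.0, 5.0),
  test_or_train = c("train", "test", "train", "train", "test")
)

ids = split_by_indicator(toy)
ids$train  # positions of the rows marked 'train'
ids$test   # positions of the rows marked 'test'
```

Because the rule only looks at the column, the same helper can be applied to every task's data, even when the tasks do not share the same rows.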

score 6 · Accepted Answer

You can use benchmark with a custom design.

The following should do the job (note that I instantiate a custom Resampling separately for each Task):

library(data.table)
design = data.table(
  task = list(task1, task2),
  learner = list(learner)
)

library(mlr3misc)
design$resampling = map(design$task, function(x) {
  # get train/test split
  split = x$data()[["test_or_train"]]
  # remove train-test split column from the task
  x$select(setdiff(x$feature_names, "test_or_train"))
  # instantiate a custom resampling with the given split
  rsmp("custom")$instantiate(x,
    train_sets = list(which(split == "train")),
    test_sets = list(which(split == "test"))
  )
})

benchmark(design)

Could you clarify what you mean by batch-processing, or does this already answer your question?
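To look at what such a benchmark returns, a self-contained sketch along the same lines is shown here. It makes several assumptions that are not in the original post: `mtcars` with a made-up test_or_train column stands in for the real data, `regr.featureless` replaces the glmnet pipeline so that only mlr3 and data.table are needed, and the row ids are taken from `task$row_ids` rather than from `which()`.

```r
library(mlr3)
library(data.table)

# Toy data (assumption): mtcars plus an artificial indicator column.
toy = as.data.table(mtcars)
toy[, test_or_train := rep(c("train", "train", "train", "test"), length.out = .N)]

# Two tasks on the same data, mirroring task1/task2 from the question.
make_task = function(id) TaskRegr$new(id = id, backend = toy, target = "mpg")
tasks = list(make_task("toy1"), make_task("toy2"))

# One custom resampling per task, built from the indicator column.
resamplings = lapply(tasks, function(task) {
  split = task$data(cols = "test_or_train")[["test_or_train"]]
  # drop the indicator so it is not used as a feature
  task$select(setdiff(task$feature_names, "test_or_train"))
  rsmp("custom")$instantiate(task,
    train_sets = list(task$row_ids[split == "train"]),
    test_sets  = list(task$row_ids[split == "test"])
  )
})

design = data.table(
  task = tasks,
  learner = list(lrn("regr.featureless")),
  resampling = resamplings
)

# store_models = TRUE keeps the trained learners in the result.
bmr = benchmark(design, store_models = TRUE)

# One row per task with the test-set RMSE.
scores = bmr$aggregate(msr("regr.rmse"))

# Predictions for the 'test' rows of the first task.
preds = as.data.table(bmr$resample_result(1)$prediction())
```

With `store_models = TRUE`, the trained learner for the first task should be available through `bmr$resample_result(1)$learners[[1]]`.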
