r - Using purrr or furrr to filter and pass a character vector to additional functions

Question

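I have some very inefficient code that I'm hoping someone can help me with. I don't have a good reprex, but I have created an example of the current code/workflow I am using.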

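Here is what I am trying to do, concisely:

1. Filter the dataset into groups
2. Pass the filtered datasets to 3 separate functions (feature selection, validation, and application)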

The groups (filters) are consistent from feature selection through validation to application.
The workflow is:

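1. Feature selection takes the filtered data and returns a character vector of features for each group
2. Validation takes 2 arguments: the data filtered by group, and the character vector from step 1 (feature selection) that corresponds to that group. It returns a df for each group, from which only the columns prediction and linear_weight are selected. The results are then row-bound
3. Application takes the same 2 arguments as step 2 (validation). It returns a df for each group, from which I select the features that exist in every group (from step 1) along with prediction and linear_weight. The results are then row-bound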
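
The column selection in the application step (keep only the features present in every group, plus the two output columns) could be sketched roughly as follows. The feature vectors and per-group results here are made-up placeholders, since the real select_features and applicate_data are not shown; only the intersect-then-select pattern is the point.

```r
library(dplyr)
library(purrr)

# Hypothetical feature vectors, as feature selection might return them per group
features_by_group <- list(
  group_1 = c("pop", "gdpPercap"),
  group_2 = c("pop", "gdpPercap", "lifeExp"),
  group_3 = c("gdpPercap", "pop")
)

# Features that appear in every group
common_features <- reduce(features_by_group, base::intersect)

# Hypothetical per-group application results (values are invented)
applications <- list(
  group_1 = data.frame(pop = 1, gdpPercap = 2, prediction = 0.1, linear_weight = 0.5),
  group_2 = data.frame(pop = 3, gdpPercap = 4, lifeExp = 70, prediction = 0.2, linear_weight = 0.6),
  group_3 = data.frame(pop = 5, gdpPercap = 6, prediction = 0.3, linear_weight = 0.7)
)

# Keep the shared features plus the two output columns, then row-bind;
# .id records which group each row came from
total_applications <- applications %>%
  map(~ select(.x, all_of(common_features), prediction, linear_weight)) %>%
  bind_rows(.id = "group")
```

Using all_of() makes the selection fail loudly if a group is missing one of the shared features, which seems preferable to silently dropping a column.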

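I have no doubt that some version of purrr could make my code more efficient and significantly improve the run time. One idea I had was to save the results of the feature selection into a df, with the features in a column, and then pass that column to the validate_data and applicate_data functions.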
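
The idea of keeping the features in a column of a df might look something like the sketch below, which uses a tibble with list-columns. The toy data and the bodies of select_features and validate_data are hypothetical stand-ins (the real functions are not shown); only the map()/map2() wiring is meant to carry over.

```r
library(dplyr)
library(purrr)

# Toy stand-in for the gapminder data (values are invented)
toy <- tibble::tibble(
  country   = c("Algeria", "Benin", "United States", "Italy", "France"),
  pop       = c(10, 20, 30, 40, 50),
  gdpPercap = c(1, 2, 3, 4, 5),
  lifeExp   = c(60, 61, 62, 63, 64)
)

# Hypothetical stand-ins for the real functions
select_features <- function(d) c("pop", "gdpPercap")
validate_data <- function(d, features) {
  mutate(d, prediction = rowSums(d[features]), linear_weight = 1)
}

# One entry per group, so the filter is written only once
group_countries <- list(
  group_1 = c("Algeria", "Benin"),
  group_2 = "United States",
  group_3 = c("Italy", "France")
)
group_data <- map(group_countries, ~ filter(toy, country %in% .x))

# Each row holds a group's data, its features, and its validation result
pipeline <- tibble::tibble(group = names(group_data), data = group_data) %>%
  mutate(
    features   = map(data, select_features),
    validation = map2(data, features, validate_data)
  )

# Keep only the two output columns from each group, then row-bind
all_validations <- pipeline$validation %>%
  map(~ select(.x, prediction, linear_weight)) %>%
  bind_rows()
```

An application column could be added to the same mutate() call with another map2(), so each group's data and features stay paired on one row.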

Sorry for not having something fully reproducible. Hopefully this example gives a good idea of what I am trying to achieve.

library(gapminder)
library(dplyr)  # provides %>% and filter()

data <- gapminder_unfiltered

# Filter data
group_1_data <- gapminder_unfiltered %>% 
  filter(country %in% c("Algeria", "Benin"))

group_2_data <- gapminder_unfiltered %>% 
  filter(country == "United States")

group_3_data <- gapminder_unfiltered %>% 
  filter(country %in% c("Italy", "France"))


# Feature Selection
group_1_features <- select_features(group_1_data)
group_2_features <- select_features(group_2_data)
group_3_features <- select_features(group_3_data)

# Example of group_1_features output
c("pop", "gdpPercap")

# Validation
group_1_validation <- validate_data(group_1_data, group_1_features)
group_2_validation <- validate_data(group_2_data, group_2_features)
group_3_validation <- validate_data(group_3_data, group_3_features)

# Row bind Validations selecting only created columns of "prediction" & "linear_weight"
all_validations

# Application: Same Inputs as Validation
group_1_application <- applicate_data(group_1_data, group_1_features)
group_2_application <- applicate_data(group_2_data, group_2_features)
group_3_application <- applicate_data(group_3_data, group_3_features)

# Row bind applications.  Select columns/features that exist in every group based on the feature selection.  Also select columns "prediction" & "linear_weight"
total_applications

score 0 · Accepted Answer

purrr::map works here because the three data frames can be stored in a list; you can then reduce the results to bind them all together.

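library(dplyr)

# Collect the three filtered data frames in one list
groups_data <- list(group_1_data, group_2_data, group_3_data)

# Mock feature selection: randomly picks one of two feature sets and returns
# the data together with its features, so both travel as one list element
select_features <- function(d) {
  if (sample(c(0, 1), 1) == 0) {
    features <- c("pop", "gdpPercap")
  } else {
    features <- c("pop", "gdpPercap", "lifeExp")
  }
  list(d, features)
}

features_list <- purrr::map(groups_data, select_features)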

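# Mock validation: takes one list(data, features) element and adds
# random prediction and linear_weight columns
validate_data <- function(feat_list) {
  d <- feat_list[[1]]
  mutate(d, prediction = rnorm(nrow(d)), linear_weight = runif(nrow(d)))
}

val_list <- purrr::map(features_list, validate_data)

# Row-bind the validations, keeping only prediction and linear_weight
Reduce(function(x, y) {
  rbind(x, select(y, prediction, linear_weight))
}, val_list[2:3], init = select(val_list[[1]], prediction, linear_weight))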

# Mock application: same shape as the validation mock
applicate_data <- function(feat_list) {
  d <- feat_list[[1]]
  mutate(d, prediction = rnorm(nrow(d)), linear_weight = runif(nrow(d)))
}

appl_list <- purrr::map(features_list, applicate_data)

# Row-bind the applications, keeping only prediction and linear_weight
Reduce(function(x, y) {
  rbind(x, select(y, prediction, linear_weight))
}, appl_list[2:3], init = select(appl_list[[1]], prediction, linear_weight))
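
As a possible variation on the code above, the data and the features can stay in two parallel lists and be paired with purrr::map2(), which matches the two-argument signatures described in the question; dplyr::bind_rows() then replaces the Reduce()/rbind() step. This is only a sketch: the toy data and the function bodies are hypothetical stand-ins, not the asker's real functions.

```r
library(dplyr)
library(purrr)

# Toy per-group data frames (values are invented)
groups_data <- list(
  data.frame(country = c("Algeria", "Benin"), pop = c(10, 20), gdpPercap = c(1, 2)),
  data.frame(country = "United States", pop = 30, gdpPercap = 3),
  data.frame(country = c("Italy", "France"), pop = c(40, 50), gdpPercap = c(4, 5))
)

# Hypothetical stand-ins with the two-argument shape from the question
select_features <- function(d) c("pop", "gdpPercap")
validate_data <- function(d, features) {
  mutate(d, prediction = rowMeans(d[features]), linear_weight = 1 / length(features))
}
applicate_data <- function(d, features) {
  mutate(d, prediction = rowSums(d[features]), linear_weight = 1 / length(features))
}

# Step 1: one feature vector per group
features_list <- map(groups_data, select_features)

# Helper: run f on each (data, features) pair, keep the two output columns, row-bind
run_step <- function(f) {
  map2(groups_data, features_list, f) %>%
    map(~ select(.x, prediction, linear_weight)) %>%
    bind_rows()
}

all_validations    <- run_step(validate_data)
total_applications <- run_step(applicate_data)
```

If the real functions are slow, furrr::future_map() and furrr::future_map2() can be swapped in for map() and map2() after choosing a backend with future::plan(), for example future::plan(future::multisession). Whether that helps depends on how expensive each call is relative to the cost of shipping data to the workers.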
