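I am trying to implement the example given here: https://cran.r-project.org/web/packages/multidplyr/vignettes/multidplyr.html
However, when I partition the data using either Method 1 or Method 2, I get the error below. I tried reinstalling the Rcpp package, but it still does not work.
Error in qs::qsave(values, path, preset = "fast", check_hash = FALSE, : function 'Rcpp_precious_remove' not provided by package 'Rcpp'
Here is the code sample: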
library(multidplyr)
library(dplyr, warn.conflicts = FALSE)
library(nycflights13)
# Create a cluster with two workers
cluster <- new_cluster(2)
# Method 1: partition() is not working (still need to investigate why); use the direct method instead
# flights1 <- flights %>% group_by(dest) %>% partition(cluster)
# Method 2: to show how that might work, I'll first split flights up by month and save as csv files:
path <- tempfile()
dir.create(path)
flights %>%
  group_by(month) %>%
  group_walk(~ vroom::vroom_write(.x, sprintf("%s/month-%02i.csv", path, .y$month)))
# Now we find all the files in the directory, and divide them up so that each worker gets (approximately) the same number of pieces:
files <- dir(path, full.names = TRUE)
cluster_assign_partition(cluster, files = files)
# Then we read in the files on each worker and use party_df() to create a partitioned dataframe:
cluster_send(cluster, flights2 <- vroom::vroom(files))
flights2 <- party_df(cluster, "flights2")
# dplyr verbs
# Note: flights1 only exists if Method 1 above is uncommented; with Method 2, substitute flights2.
df <- flights1 %>%
  summarise(dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  collect()
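
One thing that may be worth checking is whether the installed Rcpp is older than the one qs was built against. The sketch below only reports the installed versions of the packages named in the error and compares Rcpp against a minimum version. The 1.0.7 threshold is an assumption on my part (I believe `Rcpp_precious_remove` first appeared around that release), and the variable names are purely illustrative.

```r
# Report the installed versions of the packages involved in the error.
# packageVersion() reads package metadata without loading the package,
# so this works even if loading qs itself currently fails.
pkgs <- c("Rcpp", "qs", "multidplyr")
installed <- vapply(pkgs, function(p) {
  tryCatch(
    as.character(utils::packageVersion(p)),
    error = function(e) NA_character_  # package not installed
  )
}, character(1))
print(installed)

# Assumption: 'Rcpp_precious_remove' requires Rcpp >= 1.0.7.
min_rcpp <- "1.0.7"
rcpp_ok <- !is.na(installed[["Rcpp"]]) &&
  utils::compareVersion(installed[["Rcpp"]], min_rcpp) >= 0

if (!rcpp_ok) {
  message("Rcpp is missing or older than ", min_rcpp,
          "; updating Rcpp in a fresh R session may help.")
}
```

If Rcpp turns out to be older than the threshold, updating it and restarting R before calling `new_cluster()` again would be the first thing to try, since the workers are separate R processes that load packages from the same library.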