I am trying to implement the example given here: https://cran.r-project.org/web/packages/multidplyr/vignettes/multidplyr.html

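However, when I partition the data using either Method 1 or Method 2, I get the following error. I tried reinstalling the Rcpp package, but it still does not work.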

Error in qs::qsave(values, path, preset = "fast", check_hash = FALSE, : function 'Rcpp_precious_remove' not provided by package 'Rcpp'
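One thing that may be worth checking is which Rcpp version the session actually loads. The sketch below assumes, to the best of my understanding, that `Rcpp_precious_remove` first shipped in Rcpp 1.0.7, so a package compiled against a newer Rcpp could fail against an older installed copy. The helper `version_ok` and the `"1.0.7"` default are illustrative, not part of Rcpp or multidplyr.

```r
# Hypothetical helper (not part of Rcpp or multidplyr): TRUE when an
# installed version string is at least the required minimum.
# The default minimum of "1.0.7" is an assumption, not a documented requirement.
version_ok <- function(installed, minimum = "1.0.7") {
  package_version(installed) >= package_version(minimum)
}

# Report the Rcpp version visible to the current session, if Rcpp is installed.
if (requireNamespace("Rcpp", quietly = TRUE)) {
  rcpp_version <- as.character(utils::packageVersion("Rcpp"))
  message("Rcpp ", rcpp_version, " meets assumed minimum: ", version_ok(rcpp_version))
} else {
  message("Rcpp is not installed in this library path")
}
```

If the check reports FALSE, the reinstall may have gone to a different library path than the one the session (or the cluster workers) reads from.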

Here is the code example:

library(multidplyr)
library(dplyr, warn.conflicts = FALSE)
library(nycflights13)

# Create a cluster with two workers
cluster <- new_cluster(2)

# Method 1: partitioning the data directly is not working (still need to investigate why), so it is commented out and the direct method is used instead
# flights1 <- flights %>% group_by(dest) %>% partition(cluster)

# Method 2: to show how that might work, I'll first split flights up by month and save as csv files:
path <- tempfile()
dir.create(path)

flights %>% 
  group_by(month) %>% 
  group_walk(~ vroom::vroom_write(.x, sprintf("%s/month-%02i.csv", path, .y$month)))

# Now we find all the files in the directory, and divide them up so that each worker gets (approximately) the same number of pieces:

files <- dir(path, full.names = TRUE)
cluster_assign_partition(cluster, files = files)


# Then we read in the files on each worker and use party_df() to create a partitioned dataframe:

cluster_send(cluster, flights2 <- vroom::vroom(files))

flights2 <- party_df(cluster, "flights2")


# dplyr verbs (note: flights1 exists only if Method 1 above is uncommented)

df <- flights1 %>%
  summarise(dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  collect()