r - "Group by" the same columns across multiple files and create new columns in each file

Question

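I have about 20-30 dbf files that I have imported into R. I can't combine them into a single data frame/table because the total file size is about 2 GB. In each file, I want to create a new column "avg_spends" by grouping on age, and then create several more columns in each file.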

When the files are combined into a single data table, I use the following dplyr command:

file_combo <- dbf_file %>% group_by(ctg, age) %>% mutate(avg_spends = mean(total_spend))

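This is only the first step. In the same way, I have to create further new columns based on columns that already exist or were created earlier. How can I do this while splitting the data by the first column (files: file1, file2, etc.)?

I also need separate output for each file.

Here is a sample of the data I have: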

files ||   age || ctg || total_spend
==================================
file1 ||    45 ||   1 ||    1026
file1 ||    26 ||   2 ||    1574
file1 ||    45 ||   1 ||    64
file1 ||    32 ||   1 ||    1610
file2 ||    41 ||   1 ||    884
file2 ||    22 ||   1 ||    530
file2 ||    41 ||   2 ||    451
file2 ||    22 ||   1 ||    520
file3 ||    21 ||   2 ||    727
file3 ||    34 ||   1 ||    562
file3 ||    43 ||   2 ||    452
file3 ||    23 ||   1 ||    851

score 0 · Accepted Answer

You can do this by storing all the files in a list and applying the operation to the whole list with lapply(), as follows:

library(dplyr)  # provides %>%, group_by() and mutate()

# Sample data frames standing in for the imported dbf files
file1 <- data.frame(age = c(45,26,45,32), ctg = c(1,2,1,1), total_spend = c(1026, 1574, 64, 1610))
file2 <- data.frame(age = c(41,22,41,22), ctg = c(1,1,2,1), total_spend = c(884, 530, 451, 520))
file3 <- data.frame(age = c(21,34,43,23), ctg = c(2,1,2,1), total_spend = c(727, 562, 452, 851))

files <- list(file1, file2, file3)

# Add the per-group mean to every data frame; result is a list with one data frame per file
result <- lapply(files, function(x) x %>% group_by(ctg, age) %>% mutate(avg_spends = mean(total_spend)))
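
The question also asks about splitting on the `files` column and writing a separate output per file. One possible way to do that is sketched below, assuming the data sits in a single data frame shaped like the sample. The names `combined`, `per_file`, `add_columns`, `spend_vs_avg` and `out_dir` are made up for illustration, and writing CSV files to a temporary directory is an assumption, not something the question specifies.

```r
library(dplyr)

# Combined table shaped like the sample in the question
combined <- data.frame(
  files = rep(c("file1", "file2", "file3"), each = 4),
  age = c(45, 26, 45, 32, 41, 22, 41, 22, 21, 34, 43, 23),
  ctg = c(1, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1),
  total_spend = c(1026, 1574, 64, 1610, 884, 530, 451, 520, 727, 562, 452, 851)
)

# split() returns a named list with one data frame per value of `files`
per_file <- split(combined, combined$files)

# Hypothetical helper: all derived columns for one file go here
add_columns <- function(x) {
  x %>%
    group_by(ctg, age) %>%
    mutate(
      avg_spends = mean(total_spend),
      # illustrative follow-up column built from the one just created
      spend_vs_avg = total_spend - avg_spends
    ) %>%
    ungroup()
}

result <- lapply(per_file, add_columns)

# Write one CSV per file, named after the list element
# (the output location is an assumption; adjust as needed)
out_dir <- tempdir()
for (name in names(result)) {
  write.csv(result[[name]], file.path(out_dir, paste0(name, ".csv")), row.names = FALSE)
}
```

Because the list is named, each output can be traced back to its source file. If the files cannot all be held in memory at once, the same helper could be applied inside a loop that reads, processes and writes one file at a time.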
