r - Using summarise and across in the dplyr package while distinguishing numeric and non-numeric columns

Question

I want to use dplyr to perform some operations on a dataset that looks like this:

data <- data.frame(day = c(rep(1, 15), rep(2, 15)), nweek = rep(rep(1:5, 3),2), 
                   firm = rep(sapply(letters[1:3], function(x) rep(x, 5)), 2), 
                   quant = rnorm(30), price = runif(30) )

Each observation is at the day, week, and firm level (a week has only 2 days).

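I want to summarise the data (grouped by firm) by (1) taking the average across the days of the week for the numeric variables (i.e. quant and price), and (2) taking the first entry for the non-numeric variables. In this case the only non-numeric variable is firm, but in my real dataset I have several variables that are not numeric (Date and character), and they may change within a week (nweek), so I just want the entry from the first day of the week for all non-numeric variables.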

I tried using summarise with across, but I get an error:

> data %>% group_by(firm, nweek) %>% dplyr::summarise(across(which(sapply(data, is.numeric)), ~ mean(.x, na.rm = TRUE)),
+                           across(which(sapply(data, !(is.numeric))), ~ head(.x, 1))
+ )
Error: Problem with `summarise()` input `..2`.
x invalid argument type
ℹ Input `..2` is `across(which(sapply(data, !(is.numeric))), ~head(.x, 1))`.
Run `rlang::last_error()` to see where the error occurred.

Any help?

score 1 · Accepted Answer

I don't know what your expected output should look like, but something like this may achieve what you want:

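library(dplyr)

data %>%
  group_by(firm, nweek) %>% 
  summarise(
    # numeric columns: average over the days within each firm/week
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
    # all other (non-grouping) columns: keep the first entry
    across(!where(is.numeric), ~ head(.x, 1))
)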
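
To see how this behaves with extra non-numeric columns, as in the real dataset described in the question, here is a minimal sketch. The name `data2`, the `date` and `label` columns, and the `set.seed()` call are assumptions added for illustration; they are not part of the original data. `arrange(day)` is added so that the "first entry" is the one from the first day of the week, and `.groups = "drop"` simply returns an ungrouped result.

```r
library(dplyr)

set.seed(1)  # assumption: only here to make the sketch reproducible

data2 <- data.frame(
  day   = c(rep(1, 15), rep(2, 15)),
  nweek = rep(rep(1:5, 3), 2),
  firm  = rep(rep(letters[1:3], each = 5), 2),
  quant = rnorm(30),
  price = runif(30),
  # hypothetical non-numeric columns that change within a week
  date  = as.Date("2021-01-01") + rep(0:1, each = 15),
  label = rep(c("first", "second"), each = 15)
)

res <- data2 %>%
  arrange(day) %>%                      # make sure day 1 comes first
  group_by(firm, nweek) %>%
  summarise(
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
    across(!where(is.numeric), ~ head(.x, 1)),
    .groups = "drop"
  )

res
```

Because `is.numeric()` is `FALSE` for both `Date` and character vectors, `date` and `label` fall into the second `across()` call and keep their day-1 values, while `day`, `quant` and `price` are averaged.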

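As a side note, instead of using which(sapply(...)), take a look at the where helper, which is meant for conditionally selecting variables inside across; see this post.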
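
The error in the question appears to come from `!(is.numeric)`, which applies `!` to the function object itself rather than to its result. The sketch below reproduces that on the question's data and shows a few ways to negate the predicate instead; the variable names `err`, `non_num`, `a` and `b` are made up for this illustration.

```r
library(dplyr)

data <- data.frame(day = c(rep(1, 15), rep(2, 15)), nweek = rep(rep(1:5, 3), 2),
                   firm = rep(sapply(letters[1:3], function(x) rep(x, 5)), 2),
                   quant = rnorm(30), price = runif(30))

# `!` cannot be applied to a function, so this raises an error
# (the same "invalid argument type" reported in the question)
err <- tryCatch(!(is.numeric), error = function(e) conditionMessage(e))

# Base R: negate the result of the predicate, not the predicate itself
non_num <- names(data)[!sapply(data, is.numeric)]
non_num
# "firm"

# tidyselect: negate the selection, or negate inside a lambda
a <- select(data, !where(is.numeric))
b <- select(data, where(~ !is.numeric(.x)))
identical(names(a), names(b))
# TRUE
```

Either tidyselect form can be used directly inside `across()`, which avoids computing column positions by hand.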

Output

# A tibble: 15 x 5
# Groups:   firm [3]
   firm  nweek   day   quant price
   <chr> <int> <dbl>   <dbl> <dbl>
 1 a         1   1.5 -0.336  0.903
 2 a         2   1.5  0.0837 0.579
 3 a         3   1.5  0.0541 0.425
 4 a         4   1.5  1.21   0.555
 5 a         5   1.5  0.462  0.806
 6 b         1   1.5  0.0493 0.346
 7 b         2   1.5  0.635  0.596
 8 b         3   1.5  0.406  0.583
 9 b         4   1.5 -0.707  0.205
10 b         5   1.5  0.157  0.816
11 c         1   1.5  0.728  0.271
12 c         2   1.5  0.117  0.775
13 c         3   1.5 -1.05   0.234
14 c         4   1.5 -1.35   0.290
15 c         5   1.5  0.771  0.310
