I'm trying to replicate this SO question, but using the updated syntax built on the across() function and moving away from the deprecated summarise_all() and funs().
Starting data
I have a database extract with one row per event type, like this:
library(tidyverse)
library(zoo)
df_start <- tibble(
  shipment = c(rep("A", 4), rep("B", 4)),
  stop = rep(c(1, 1, 2, 2), 2),
  arrive_pickup = as.POSIXct(c("2021-01-01 07:00:00 UTC", NA, NA, NA, "2021-06-05 12:10:00 UTC", NA, NA, NA)),
  depart_pickup = as.POSIXct(c(NA, "2021-01-01 08:40:00 UTC", NA, NA, NA, "2021-06-05 16:58:00 UTC", NA, NA)),
  arrive_delivery = as.POSIXct(c(NA, NA, "2021-01-05 10:00:00 UTC", NA, NA, NA, "2021-06-08 10:58:00 UTC", NA)),
  depart_delivery = as.POSIXct(c(NA, NA, NA, "2021-01-05 11:30:00 UTC", NA, NA, NA, "2021-06-08 13:50:00 UTC"))
)
> df_start
# A tibble: 8 x 6
shipment stop arrive_pickup depart_pickup arrive_delivery depart_delivery
<chr> <dbl> <dttm> <dttm> <dttm> <dttm>
1 A 1 2021-01-01 07:00:00 NA NA NA
2 A 1 NA 2021-01-01 08:40:00 NA NA
3 A 2 NA NA 2021-01-05 10:00:00 NA
4 A 2 NA NA NA 2021-01-05 11:30:00
5 B 1 2021-06-05 12:10:00 NA NA NA
6 B 1 NA 2021-06-05 16:58:00 NA NA
7 B 2 NA NA 2021-06-08 10:58:00 NA
8 B 2 NA NA NA 2021-06-08 13:50:00
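Desired results
...and I'd like to collapse the rows by grouping by shipment and stop, or even just by shipment (I'm not sure whether leaving NA values in the final data frame affects the answer, but I'd like to be able to solve it either way).
df_finish1 # One desired result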
# A tibble: 4 x 6
shipment stop arrive_pickup depart_pickup arrive_delivery depart_delivery
<chr> <dbl> <dttm> <dttm> <dttm> <dttm>
1 A 1 2021-01-01 07:00:00 2021-01-01 08:40:00 NA NA
2 A 2 NA NA 2021-01-05 10:00:00 2021-01-05 11:30:00
3 B 1 2021-06-05 12:10:00 2021-06-05 16:58:00 NA NA
4 B 2 NA NA 2021-06-08 10:58:00 2021-06-08 13:50:00
df_finish2 # Second/alternative desired result
# A tibble: 2 x 5
shipment arrive_pickup depart_pickup arrive_delivery depart_delivery
<chr> <dttm> <dttm> <dttm> <dttm>
1 A 2021-01-01 07:00:00 2021-01-01 08:40:00 2021-01-05 10:00:00 2021-01-05 11:30:00
2 B 2021-06-05 12:10:00 2021-06-05 16:58:00 2021-06-08 10:58:00 2021-06-08 13:50:00
What I've researched and tried
Based on this SO question, the following does work:
df_1 <- df_start %>%
  group_by(shipment, stop) %>% # Two groupings
  summarise_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE))) %>%
  filter(row_number() == n())
> df_1
# A tibble: 4 x 6
# Groups: shipment, stop [4]
shipment stop arrive_pickup depart_pickup arrive_delivery depart_delivery
<chr> <dbl> <dttm> <dttm> <dttm> <dttm>
1 A 1 2021-01-01 07:00:00 2021-01-01 08:40:00 NA NA
2 A 2 NA NA 2021-01-05 10:00:00 2021-01-05 11:30:00
3 B 1 2021-06-05 12:10:00 2021-06-05 16:58:00 NA NA
4 B 2 NA NA 2021-06-08 10:58:00 2021-06-08 13:50:00
df_2 <- df_start %>%
  group_by(shipment) %>% # Single grouping
  summarise_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE))) %>%
  filter(row_number() == n())
> df_2
# A tibble: 2 x 6
# Groups: shipment [2]
shipment stop arrive_pickup depart_pickup arrive_delivery depart_delivery
<chr> <dbl> <dttm> <dttm> <dttm> <dttm>
1 A 2 2021-01-01 07:00:00 2021-01-01 08:40:00 2021-01-05 10:00:00 2021-01-05 11:30:00
2 B 2 2021-06-05 12:10:00 2021-06-05 16:58:00 2021-06-08 10:58:00 2021-06-08 13:50:00
However, I see that summarise_all() and funs() have been deprecated and won't be carried forward, so I tried to work out how to use across() correctly, without success:
df_3 <- df_start %>%
group_by(shipment) %>%
summarise(across(everything()), na.locf(., na.rm = FALSE, fromLast = FALSE))
> df_3 <- df_start %>%
+ group_by(shipment) %>%
+ summarise(across(everything()), na.locf(., na.rm = FALSE, fromLast = FALSE))
Error: Problem with `summarise()` input `..2`.
x Input `..2` must be size 4 or 1, not 8.
i An earlier column had size 4.
i Input `..2` is `na.locf(., na.rm = FALSE, fromLast = FALSE)`.
i The error occurred in group 1: shipment = "A".
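I've read through vignette("colwise"), which describes the differences and suggests that I just swap in the syntax shown above, but clearly I'm not doing it right. Any help?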
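For reference, the error message points at the likely cause: the parenthesis after everything() closes across() early, so na.locf(., ...) reaches summarise() as a separate second argument, and the pipe's `.` there is the whole 8-row data frame rather than a single column. Below is a minimal sketch of the across() form, assuming the goal is the last non-NA value per group. The data frame df_small is a hypothetical trimmed-down stand-in for df_start (one shipment, two time columns, explicit time zone), and the lambda with last() and `.groups = "drop"` are illustrative choices, not something taken from the vignette.

```r
library(dplyr)
library(zoo)

# Hypothetical trimmed-down stand-in for df_start (assumption: UTC times)
df_small <- tibble(
  shipment = rep("A", 4),
  stop = c(1, 1, 2, 2),
  arrive_pickup = as.POSIXct(c("2021-01-01 07:00:00", NA, NA, NA), tz = "UTC"),
  depart_pickup = as.POSIXct(c(NA, "2021-01-01 08:40:00", NA, NA), tz = "UTC")
)

df_3 <- df_small %>%
  group_by(shipment, stop) %>%
  summarise(
    # The function goes INSIDE across(), as its second argument.
    # na.locf() carries the last observation forward within each group;
    # last() then keeps one value per column so each group yields one row.
    across(everything(), ~ last(na.locf(.x, na.rm = FALSE))),
    .groups = "drop"
  )

print(df_3)
```

Grouping by shipment alone should work the same way; in that case stop is no longer a grouping column, so everything() picks it up and it is collapsed like the time columns.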