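I thought I had found the answer to my question here, but when I used a larger dataset I got different results. I suspect the difference comes from how the na.locf line behaves.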
Basically, I am converting code that previously used mutate_at into code that uses mutate(across()).
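In the first case below, the data is filled correctly because df_initial is still grouped by index_name. In the second case, I assume I get a different answer because I had to ungroup for mutate(across()) to work.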
So here is an example with a larger dataset that illustrates the problem.
Reproducible example:
library(dplyr) # mutate(), mutate_at(), across(), filter(), %>%
library(zoo)   # na.locf()
df_initial <-
structure(list(Date = structure(c(18681, 18681, 18681, 18681,
18682, 18682, 18682, 18682, 18683, 18683, 18683, 18683, 18684,
18684, 18684, 18684, 18685, 18685, 18685, 18685, 18686, 18686,
18686, 18686), class = "Date"), index_name = c("INDU Index",
"SPX Index", "TPX Index", "MEXBOL Index", "INDU Index", "SPX Index",
"TPX Index", "MEXBOL Index", "INDU Index", "SPX Index", "TPX Index",
"MEXBOL Index", "INDU Index", "SPX Index", "TPX Index", "MEXBOL Index",
"INDU Index", "SPX Index", "TPX Index", "MEXBOL Index", "INDU Index",
"SPX Index", "TPX Index", "MEXBOL Index"), index_level = c(31537.35,
3881.37, NA, 45268.33, 31961.86, 3925.43, 1903.07, 45151.38,
31402.01, 3829.34, 1926.23, 44310.27, 30932.37, 3811.15, 1864.49,
44592.91, NA, NA, NA, NA, NA, NA, NA, NA), totalReturn_daily = c(0.0497,
0.1277, 0, 0.7158, 1.3461, 1.1364, -1.8201, -0.1151, -1.7181,
-2.4339, 1.2411, -1.8629, -1.4628, -0.4636, -3.2052, 0.6379,
0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -24L), groups = structure(list(
index_name = c("INDU Index", "MEXBOL Index", "SPX Index",
"TPX Index"), .rows = structure(list(c(1L, 5L, 9L, 13L, 17L,
21L), c(4L, 8L, 12L, 16L, 20L, 24L), c(2L, 6L, 10L, 14L,
18L, 22L), c(3L, 7L, 11L, 15L, 19L, 23L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
The first approach below gives the correct values, but the second does not. So I am trying to get the same answer from approach #2 that I get from approach #1.
# Approach 1: Expected output received here:
df_initial %>%
mutate_at(vars(-index_name, -totalReturn_daily),
~ na.locf(., na.rm = FALSE)) %>%
filter(index_name == "TPX Index")
# Output
Date index_name index_level totalReturn_daily
<date> <chr> <dbl> <dbl>
1 2021-02-23 TPX Index NA 0
2 2021-02-24 TPX Index 1903. -1.82
3 2021-02-25 TPX Index 1926. 1.24
4 2021-02-26 TPX Index 1864. -3.21
5 2021-02-27 TPX Index 1864. 0
6 2021-02-28 TPX Index 1864. 0
# Approach 2: Did not receive expected output here
df_initial %>%
ungroup() %>%
mutate(across(
.cols = -c(index_name, totalReturn_daily),
.fns = ~ na.locf(., na.rm = FALSE)
)) %>%
filter(index_name == "TPX Index")
# Output
Date index_name index_level totalReturn_daily
<date> <chr> <dbl> <dbl>
1 2021-02-23 TPX Index 3881. 0
2 2021-02-24 TPX Index 1903. -1.82
3 2021-02-25 TPX Index 1926. 1.24
4 2021-02-26 TPX Index 1864. -3.21
5 2021-02-27 TPX Index 44593. 0
6 2021-02-28 TPX Index 44593. 0
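To isolate the effect of grouping, here is a minimal sketch on a made-up dataset. The tibble `toy` and its values are hypothetical and are not taken from df_initial. It relies on the documented behaviour that across() leaves grouping columns out of its selection, so the grouped call does not need to name index_name in .cols at all:

```r
library(dplyr)
library(zoo)

# Hypothetical toy data: two series interleaved row by row, like df_initial.
toy <- tibble(
  index_name  = rep(c("A", "B"), times = 3),
  index_level = c(NA, 10, 1, NA, NA, NA)
)

# Ungrouped: na.locf() sees one long column, so values leak between series.
ungrouped <- toy %>%
  mutate(across(.cols = -index_name,
                .fns = ~ na.locf(.x, na.rm = FALSE)))

# Grouped: na.locf() runs once per index_name, so each series is filled
# only from its own earlier rows. The grouping column is skipped by across().
grouped <- toy %>%
  group_by(index_name) %>%
  mutate(across(.cols = everything(),
                .fns = ~ na.locf(.x, na.rm = FALSE))) %>%
  ungroup()

ungrouped$index_level  # NA 10 1 1 1 1
grouped$index_level    # NA 10 1 10 1 10
```

If the same reasoning carries over to df_initial, keeping the grouping and writing `.cols = -totalReturn_daily` in approach #2 might reproduce the output of approach #1, though I have only checked this on the toy data.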
Thanks!