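I thought I had found the answer to my question here, but when I use a larger dataset I get different results. I suspect the difference comes from how the na.locf line behaves.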

Basically, I am converting code that previously used mutate_at so that it uses mutate(across()) instead.

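In the first case below, the data is filled correctly because df_initial is still grouped by index_name. In the second case, I assume I get a different answer because I had to ungroup for mutate(across()) to work.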
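
A minimal sketch of that suspicion, using a small made-up data frame (the name toy and its values are hypothetical, not part of my real data). Without grouping, na.locf fills an NA from whatever row precedes it, even when that row belongs to a different index. With grouping, each index is filled only from its own earlier rows.

```r
library(dplyr)
library(zoo)

# Hypothetical toy data: two indices interleaved row by row, as in df_initial
toy <- tibble(
  index_name  = c("A", "B", "A", "B"),
  index_level = c(1, NA, NA, 20)
)

# Ungrouped: the NA in row 2 (index B) is filled from row 1 (index A)
ungrouped <- toy %>%
  mutate(index_level = na.locf(index_level, na.rm = FALSE))

# Grouped: the leading NA of index B has no earlier B value, so it stays NA
grouped <- toy %>%
  group_by(index_name) %>%
  mutate(index_level = na.locf(index_level, na.rm = FALSE)) %>%
  ungroup()

ungrouped$index_level
grouped$index_level
```

That matches what I see in the real data: the ungrouped version pulls SPX and MEXBOL levels into the TPX rows.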

So here is an example with a larger dataset that illustrates the problem.

Reproducible example:

df_initial <-
  structure(list(
    Date = structure(c(18681, 18681, 18681, 18681,
      18682, 18682, 18682, 18682, 18683, 18683, 18683, 18683, 18684,
      18684, 18684, 18684, 18685, 18685, 18685, 18685, 18686, 18686,
      18686, 18686), class = "Date"),
    index_name = c("INDU Index",
      "SPX Index", "TPX Index", "MEXBOL Index", "INDU Index", "SPX Index",
      "TPX Index", "MEXBOL Index", "INDU Index", "SPX Index", "TPX Index",
      "MEXBOL Index", "INDU Index", "SPX Index", "TPX Index", "MEXBOL Index",
      "INDU Index", "SPX Index", "TPX Index", "MEXBOL Index", "INDU Index",
      "SPX Index", "TPX Index", "MEXBOL Index"),
    index_level = c(31537.35,
      3881.37, NA, 45268.33, 31961.86, 3925.43, 1903.07, 45151.38,
      31402.01, 3829.34, 1926.23, 44310.27, 30932.37, 3811.15, 1864.49,
      44592.91, NA, NA, NA, NA, NA, NA, NA, NA),
    totalReturn_daily = c(0.0497,
      0.1277, 0, 0.7158, 1.3461, 1.1364, -1.8201, -0.1151, -1.7181,
      -2.4339, 1.2411, -1.8629, -1.4628, -0.4636, -3.2052, 0.6379,
      0, 0, 0, 0, 0, 0, 0, 0)),
    row.names = c(NA, -24L),
    groups = structure(list(
      index_name = c("INDU Index", "MEXBOL Index", "SPX Index",
        "TPX Index"),
      .rows = structure(list(c(1L, 5L, 9L, 13L, 17L,
        21L), c(4L, 8L, 12L, 16L, 20L, 24L), c(2L, 6L, 10L, 14L,
        18L, 22L), c(3L, 7L, 11L, 15L, 19L, 23L)),
        ptype = integer(0),
        class = c("vctrs_list_of", "vctrs_vctr", "list"))),
      row.names = c(NA, -4L),
      class = c("tbl_df", "tbl", "data.frame"),
      .drop = TRUE),
    class = c("grouped_df", "tbl_df", "tbl", "data.frame"))

The first approach below gives the correct values, but the second does not. I am trying to get the same answer from Approach 2 that I get from Approach 1.

# Approach 1: Expected output received here:
library(dplyr)  # %>%, mutate_at, mutate, across, filter
library(zoo)    # na.locf

df_initial %>%
  mutate_at(vars(-index_name, -totalReturn_daily),
            ~ na.locf(., na.rm = FALSE)) %>%
  filter(index_name == "TPX Index")

# Output
  Date       index_name index_level totalReturn_daily
  <date>     <chr>            <dbl>             <dbl>
1 2021-02-23 TPX Index          NA               0   
2 2021-02-24 TPX Index        1903.             -1.82
3 2021-02-25 TPX Index        1926.              1.24
4 2021-02-26 TPX Index        1864.             -3.21
5 2021-02-27 TPX Index        1864.              0   
6 2021-02-28 TPX Index        1864.              0  

# Approach 2: Did not receive expected output here
df_initial %>%
  ungroup() %>%
  mutate(across(
    .cols = -c(index_name, totalReturn_daily),
    .fns  = ~ na.locf(., na.rm = FALSE)
  )) %>%
  filter(index_name == "TPX Index")

# Output
  Date       index_name index_level totalReturn_daily
  <date>     <chr>            <dbl>             <dbl>
1 2021-02-23 TPX Index        3881.              0   
2 2021-02-24 TPX Index        1903.             -1.82
3 2021-02-25 TPX Index        1926.              1.24
4 2021-02-26 TPX Index        1864.             -3.21
5 2021-02-27 TPX Index       44593.              0   
6 2021-02-28 TPX Index       44593.              0  

Thanks!

1 Answer

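Both approaches give similar results for me as long as the data stays grouped by index_name. Can you try the code below? In the across() version, index_name is left out of .cols and ungroup() is called only after the mutate.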

library(dplyr)
library(zoo)

# mutate_at version: fill forward within each index
df_initial %>%
  group_by(index_name) %>%
  mutate_at(vars(-index_name, -totalReturn_daily),
            ~ na.locf(., na.rm = FALSE)) %>%
  dplyr::filter(index_name == "TPX Index")


# across() version: keep the grouping, drop index_name from .cols,
# and ungroup only after the fill
df_initial %>%
  group_by(index_name) %>%
  mutate(across(
    .cols = -c(totalReturn_daily),
    .fns  = ~ na.locf(., na.rm = FALSE)
  )) %>%
  ungroup() %>%
  dplyr::filter(index_name == "TPX Index")
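
As far as I know, across() never selects grouping columns, which is why index_name does not need to be excluded in .cols and why naming it there on grouped data fails. Here is a minimal sketch of the across() version on a small made-up data frame (the name toy and its values are hypothetical, only there to keep the example short):

```r
library(dplyr)
library(zoo)

# Hypothetical toy data, grouped by index_name like df_initial
toy <- tibble(
  Date              = as.Date("2021-02-23") + c(0, 0, 1, 1),
  index_name        = c("A", "B", "A", "B"),
  index_level       = c(1, NA, NA, 20),
  totalReturn_daily = c(0.1, 0, 0.2, 0)
) %>%
  group_by(index_name)

# index_name is a grouping column, so across() does not see it;
# only totalReturn_daily has to be excluded explicitly
filled <- toy %>%
  mutate(across(
    .cols = -totalReturn_daily,
    .fns  = ~ na.locf(., na.rm = FALSE)
  )) %>%
  ungroup()

filled$index_level
```

The leading NA for index B stays NA because there is no earlier B value to carry forward, which is the same behaviour as the first TPX row in your expected output.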
answered 2021-03-04T00:43:02.210