I want to translate the following R code from tidytable to collapse (Advanced and Fast Data Transformation in R).

tidytable code

library(tidytable)
library(collapse)
Out1 <- 
  wlddev %>% 
  mutate_rowwise.(New1 = sum(c_across.(PCGDP:GINI), na.rm = TRUE))
Out1 %>% 
  select.(New1)
# A tidytable: 13,176 x 1
    New1
   <dbl>
 1  32.4
 2  33.0
 3  33.5
 4  34.0
 5  34.5
 6  34.9
 7  35.4
 8  35.9
 9  36.4
10  36.9
# ... with 13,166 more rows

collapse code

library(collapse)
Out2 <- 
  wlddev %>% 
  ftransform(New1 = fsum(across(PCGDP:GINI), na.rm = TRUE))

  Error in `context_peek()`:
  ! `across()` must only be used inside dplyr verbs.
  Run `rlang::last_error()` to see where the error occurred.

Any hints, please.

3 Answers

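I wonder why you need to come up with something so complicated. Base R has functions like rowSums, and the kit package provides parallel statistical functions such as psum.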

library(collapse)
library(magrittr)
library(kit, include.only = "psum")  
library(microbenchmark)
  
microbenchmark(
A = wlddev %>%
  ftransform(New1 = rowSums(qM(slt(., PCGDP:GINI)), na.rm = TRUE)),
B = wlddev %>%
  ftransform(New1 = psum(slt(., PCGDP:GINI), na.rm = TRUE)), 
C = wlddev %>%
  ftransform(New1 = psum(PCGDP, LIFEEX, GINI, na.rm = TRUE))
)

#> Unit: microseconds
#>  expr   min      lq      mean   median       uq      max neval
#>     A 68.88 97.8875 194.24037 102.2335 113.8775 4646.366   100
#>     B 25.83 30.1350  35.43548  34.9115  38.6630   56.416   100
#>     C 22.55 25.8095  29.99396  30.5860  32.9025   53.792   100

Created on 2022-02-05 by the reprex package (v2.0.1)
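As a sanity check that is not part of the original answer, the sketch below compares variant A against a plain per-row sum(..., na.rm = TRUE) computed with base R's apply. The names X, via_rowsums and via_apply are illustrative only, and the sketch assumes a collapse version whose wlddev has PCGDP, LIFEEX and GINI as adjacent columns, as in the question.

```r
library(collapse)

# Numeric matrix of the columns being summed (PCGDP through GINI)
X <- qM(slt(wlddev, PCGDP:GINI))

# Variant A from the benchmark, written without the pipe
via_rowsums <- ftransform(wlddev, New1 = rowSums(X, na.rm = TRUE))

# Reference: explicit per-row sum, dropping NAs (slow, for checking only)
via_apply <- apply(X, 1, sum, na.rm = TRUE)

# TRUE if the two approaches agree
isTRUE(all.equal(unname(via_rowsums$New1), unname(via_apply)))
```

Because rowSums with na.rm = TRUE returns 0 for a row that is entirely NA, no NA replacement step should be needed with this approach.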

Answered 2022-02-05T00:36:07.127

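According to ?fsum, fsum from collapse sums column-wise:

fsum is a generic function that computes the (column-wise) sum of all values in x, (optionally) grouped by g and/or weighted by w (e.g. to calculate survey totals).

The tidytable code works rowwise, so one option is to select (slt) the columns of interest, transpose them with t, convert the result to a tibble/data.frame, apply fsum, and store the result as a new column.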

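library(collapse)
library(magrittr)  # provides %>% if no other attached package exports it
library(tibble)    # provides as_tibble()
Out2 <- wlddev %>%
    slt(PCGDP:GINI) %>%   # select the columns of interest
    t %>%                 # transpose: rows become columns
    as_tibble %>%
    fsum(.) %>%           # column-wise sum = row-wise sum of the original
    ftransform(wlddev, New1 = .) 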

sum returns 0 when all elements are NA and na.rm = TRUE, whereas fsum uses na.rm = TRUE by default and returns NA when all elements are NA.

> fsum(c(NA, NA))
[1] NA
> sum(c(NA, NA), na.rm = TRUE)
[1] 0

Therefore, if we change the NA values in the second result to 0, the output is the same as the OP's "Out1".

> Out2$New1[is.na(Out2$New1)] <- 0
> all.equal(Out1, Out2, check.attributes = FALSE)
[1] TRUE
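The transpose trick and the NA caveat can be checked on a small example. The following is a minimal sketch on a hypothetical toy matrix m (not taken from wlddev); row_base and row_fsum are illustrative names.

```r
library(collapse)

# Hypothetical toy data: 3 rows, 2 columns, last row entirely NA
m <- cbind(a = c(1, 2, NA),
           b = c(10, NA, NA))

# Row sums via base R: an all-NA row sums to 0 when na.rm = TRUE
row_base <- rowSums(m, na.rm = TRUE)

# Row sums via the transpose trick: fsum() sums the columns of t(m),
# i.e. the rows of m, and leaves an all-NA row as NA
row_fsum <- unname(fsum(t(m)))

# Replacing the NA with 0 makes the two results agree
row_fsum[is.na(row_fsum)] <- 0
isTRUE(all.equal(unname(row_base), row_fsum))
```

Only the all-NA row differs between the two approaches before the replacement; rows with at least one non-missing value give the same sum either way.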
Answered 2022-02-03T19:02:12.600

Building on @akrun's answer, I came up with a faster solution.

Out3 <- 
  wlddev %>%
  slt(PCGDP:GINI) %>%
  qDT() %>% 
  t %>%
  fsum(.) %>% 
  ftransform(.data = wlddev, New1 = .) %>%
  qDT() %>% 
  replace_NA(X = ., value = 0, cols = "New1")

Speed comparison

library(microbenchmark)

microbenchmark(
  Out1 = 
    wlddev %>% 
    mutate_rowwise.(New1 = sum(c_across.(PCGDP:GINI), na.rm = TRUE))
, Out2 =
    wlddev %>%
    slt(PCGDP:GINI) %>%
    t %>%
    as_tibble %>%
    fsum(.) %>% 
    ftransform(wlddev, New1 = .)
, Out3 = 
    wlddev %>%
    slt(PCGDP:GINI) %>%
    qDT() %>% 
    t %>%
    fsum(.) %>% 
    ftransform(.data = wlddev, New1 = .) %>%
    qDT() %>% 
    replace_NA(X = ., value = 0, cols = "New1")
)

Unit: microseconds
 expr     min       lq      mean   median       uq      max neval
 Out1 72618.0 78268.75 81296.992 79888.50 81671.10 162397.8   100
 Out2 33549.7 35520.75 37763.537 37728.25 39021.90  55001.3   100
 Out3   241.2   310.85   360.225   357.40   387.35    780.1   100
Answered 2022-02-04T19:45:12.573