r - Hash each row of a tibble

Question

I am using the newly released dplyr 1.0.0 together with the digest package to generate a hash of each row in a tibble.

I know there are other ways to do this, but I would like to use the improvements to rowwise() in dplyr 1.0.0.

See the example below. Does anyone know why it fails? I would expect to be able to digest a row whose entries are of different types.

library(dplyr)
library(digest)

df <- tibble(
    student_id = letters[1:4],
    student_id2 = letters[9:12],
    test1 = 10:13, 
    test2 = 20:23, 
    test3 = 30:33, 
    test4 = 40:43
)

df
#> # A tibble: 4 x 6
#>   student_id student_id2 test1 test2 test3 test4
#>   <chr>      <chr>       <int> <int> <int> <int>
#> 1 a          i              10    20    30    40
#> 2 b          j              11    21    31    41
#> 3 c          k              12    22    32    42
#> 4 d          l              13    23    33    43

dd <- df %>%
    rowwise(student_id) %>%
    mutate(hash = digest(c_across(everything()))) %>%
    ungroup
#> Error: Problem with `mutate()` input `hash`.
#> ✖ Can't combine `student_id2` <character> and `test1` <integer>.
#> ℹ Input `hash` is `digest(c_across(everything()))`.
#> ℹ The error occured in row 1.

### but digest should not care too much about the type of the input

Created on 2020-06-04 by the reprex package (v0.3.0)

score 5 · Accepted Answer

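The problem is the differing column types: as the error message says, c_across() will not combine a character column with an integer column. One option is to convert all columns to a single type first, and then apply rowwise.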

library(dplyr)
library(digest)
df %>%
    mutate(across(everything(), as.character)) %>% 
    rowwise() %>%
    mutate(hash = digest(c_across(everything()))) 
# A tibble: 4 x 7
# Rowwise: 
#  student_id student_id2 test1 test2 test3 test4 hash                            
#  <chr>      <chr>       <chr> <chr> <chr> <chr> <chr>                           
#1 a          i           10    20    30    40    2638067de6dcfb3d58b83a83e0cd3089
#2 b          j           11    21    31    41    21162fc0c528a6550b53c87ca0c2805e
#3 c          k           12    22    32    42    8d7539eacff61efbd567b6100227523b
#4 d          l           13    23    33    43    9739997605aa39620ce50e96f1ff4f70
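
The error comes from c_across() rather than from digest(). c_across() combines values with vctrs::vec_c(), which follows stricter type rules than base c(). Here is a minimal sketch of the difference, assuming the vctrs package (installed as a dplyr dependency) is available; the variable names are illustrative only:

```r
library(vctrs)

# Base c() silently coerces the integer to character.
base_result <- c("a", 10L)

# vec_c() refuses to mix character and integer and signals an error,
# which is captured here instead of stopping the script.
strict_result <- tryCatch(
  vec_c("a", 10L),
  error = function(e) e
)

print(class(base_result))
print(inherits(strict_result, "error"))
```

This is why converting every column to character beforehand lets c_across() succeed.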

Another option is to unite the columns into a single column, and then run digest on that column.

library(tidyr)
df %>% 
   unite(new, everything(), remove = FALSE) %>% 
   rowwise() %>%
   mutate(hash = digest(new)) %>%
   select(-new)
# A tibble: 4 x 7
# Rowwise: 
#  student_id student_id2 test1 test2 test3 test4 hash                            
#  <chr>      <chr>       <int> <int> <int> <int> <chr>                           
#1 a          i              10    20    30    40 a9e4cafdfbc88f17b7593dfd684eb2a1
#2 b          j              11    21    31    41 a67a5df8186972285bd7be59e6fdab38
#3 c          k              12    22    32    42 9c20bd87a50642631278b3e6d28ecf68
#4 d          l              13    23    33    43 3f4f373d1969dcf0c8f542023a258225
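
The hashes differ from one approach to the next because digest() by default serializes the R object it is given before hashing it, so a six-element character vector and a single pasted string are different inputs. If the goal is a hash of the string's characters themselves (for example, to compare against a hash computed outside R), one possibility is serialize = FALSE. A sketch under that assumption, where row_key is an illustrative name and the string is what unite() would produce for row 1 with its default "_" separator:

```r
library(digest)

row_key <- "a_i_10_20_30_40"

# Default behaviour: hash the serialized R object.
hash_serialized <- digest(row_key)

# serialize = FALSE: hash only the characters of the string.
hash_raw <- digest(row_key, algo = "md5", serialize = FALSE)

# The two results are not the same value.
print(hash_serialized != hash_raw)
```

Either form works as a row identifier inside R; the choice only matters when the hash has to be reproduced elsewhere.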

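Another option is pmap, where we concatenate the elements of each row into a single vector. This coerces the integers to character, because a vector can hold only one class.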

library(purrr)
df %>% 
     mutate(hash = pmap_chr(., ~ digest(c(...))))
# A tibble: 4 x 7
#  student_id student_id2 test1 test2 test3 test4 hash                            
#  <chr>      <chr>       <int> <int> <int> <int> <chr>                           
#1 a          i              10    20    30    40 f0fb4100907570ef9bda073b78dc44a6
#2 b          j              11    21    31    41 754b09e8d4d854aa5e40aa88d1edfc66
#3 c          k              12    22    32    42 5f3a699caff833e900fd956232cf61dd
#4 d          l              13    23    33    43 4d31c65284e5db36c37461126a9eb63c

The advantage here is that the column types in the result are left unchanged.
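
If coercing the integers to character before hashing is undesirable, a variation on the pmap approach is to hash each row as a list, which keeps every value in its original type. This is a sketch rather than part of the original answer, using a shortened version of the example data. Its hash values differ from those above because the object being hashed is different.

```r
library(dplyr)
library(purrr)
library(digest)

# Shortened version of the example data, for illustration.
df <- tibble(
    student_id = letters[1:4],
    test1 = 10:13,
    test2 = 20:23
)

# pmap_chr() passes one row's values as arguments;
# list(...) keeps each value in its own type instead of coercing with c().
hashed <- df %>%
    mutate(hash = pmap_chr(df, ~ digest(list(...))))

hashed
```

Because the list retains column names and types, two rows that print identically but differ in type (for example 10L versus "10") would be expected to hash differently.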
