
I have a data frame...

df <- tibble(
  id = 1:5, 
  family = c("a","a","b","b","c"), 
  twin = c(1,2,1,2,1), 
  datacol1 = 11:15, 
  datacol2 = 21:25
  )

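For each pair of twins (members of the same family), I need a second "datacol" column holding the other twin's data. This should only happen for matched twins, so row 5 (from family "c") should have NA in the duplicated columns.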

Ideally, the data would end up looking like this...

df <- tibble(
  id = 1:5, 
  family = c("a","a","b","b","c"), 
  twin = c(1,2,1,2,1), 
  datacol1 = 11:15,
  datacol1.b = c(12,11,14,13,NA),
  datacol2 = 21:25, 
  datacol2.b = c(22,21,24,23,NA)
  )

I've added a picture to help illustrate what I'm trying to achieve.

[image]

I'd like to be able to do this for all columns or for a selected set of columns, preferably with the tidyverse.


2 Answers


We can also use mutate_at

library(dplyr)
df %>% 
    group_by(family) %>%
    mutate_at(vars(starts_with('datacol')), list(`2` = 
           ~if(n() == 1) NA_integer_ else rev(.)))
# A tibble: 5 x 7
# Groups:   family [3]
#     id family  twin datacol1 datacol2 datacol1_2 datacol2_2
#  <int> <chr>  <dbl>    <int>    <int>      <int>      <int>
#1     1 a          1       11       21         12         22
#2     2 a          2       12       22         11         21
#3     3 b          1       13       23         14         24
#4     4 b          2       14       24         13         23
#5     5 c          1       15       25         NA         NA
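
In current dplyr the scoped verbs such as mutate_at are superseded by across(). A minimal sketch of the same idea with across(), assuming dplyr 1.0 or later and exactly two rows per complete twin pair (the `.b` suffix is taken from the question; `result` is just an illustrative name):

```r
library(dplyr)

# Example data from the question
df <- tibble(
  id = 1:5,
  family = c("a", "a", "b", "b", "c"),
  twin = c(1, 2, 1, 2, 1),
  datacol1 = 11:15,
  datacol2 = 21:25
)

result <- df %>%
  group_by(family) %>%
  mutate(across(
    starts_with("datacol"),
    # reverse the pair so each twin gets the co-twin's value;
    # families without exactly two members get NA
    ~ if (n() == 2) rev(.x) else NA_integer_,
    .names = "{.col}.b"
  )) %>%
  ungroup()

result
```

To restrict this to chosen columns, `starts_with("datacol")` could be swapped for `all_of(cols)` with a character vector of names. Note that `NA_integer_` assumes the data columns are integers, as in the example.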
answered 2020-03-04T17:15:57.817
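library(dplyr)

cols = c("datacol1", "datacol2")
df %>%
    group_by(family) %>%
    # swap the two values within each complete twin pair; NA otherwise
    mutate_at(vars(all_of(cols)), function(x){
        if (n() == 2){
            rev(x)
        } else {
            NA
        }
    }) %>%
    ungroup() %>%
    select(all_of(cols)) %>%
    # funs() is deprecated; a formula lambda does the same job
    rename_all(~ paste0(., ".b")) %>%
    cbind(df, .)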

Base R

cols = c("datacol1", "datacol2")
do.call(rbind, lapply(split(df, df$family), function(x){
    cbind(x, setNames(lapply(x[cols], function(y) {
        if (length(y) == 2) {
            rev(y)
        } else {
            NA
        }}),
        paste0(cols, ".b")))
}))
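
Both versions above rely on rev(), so they assume the two twins of a family sit in adjacent rows of the group and nothing else does. As a hedged alternative sketch (not part of the original answer), a self-join on `family` and a flipped `twin` index avoids depending on row order. It assumes `twin` is always coded 1 or 2; `partner` and `result` are illustrative names:

```r
library(dplyr)

# Example data from the question
df <- tibble(
  id = 1:5,
  family = c("a", "a", "b", "b", "c"),
  twin = c(1, 2, 1, 2, 1),
  datacol1 = 11:15,
  datacol2 = 21:25
)

cols <- c("datacol1", "datacol2")

# Build a lookup table of co-twin values: flip the twin index (1 <-> 2)
# so that each row is keyed by the twin it should be attached to
partner <- df %>%
  select(family, twin, all_of(cols)) %>%
  mutate(twin = 3 - twin) %>%
  rename_with(~ paste0(.x, ".b"), all_of(cols))

# Unmatched rows (for example the lone twin in family "c") get NA
result <- left_join(df, partner, by = c("family", "twin"))

result
```

The join costs an extra intermediate table, but it keeps the original row order and makes the matching rule explicit.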
answered 2020-03-04T16:52:34.963