r - In dplyr mutate, how to reference multiple similarly named variables

Question

I have a data.frame similar to this one:

library(tidyverse)
df <- data.frame(
  var_1_a = 1:100,
  var_1_b = 101:200,
  var_two_a = 5:104, 
  var_two_b = 1:100
)
head(df)
  var_1_a var_1_b var_two_a var_two_b
1       1     101         5         1
2       2     102         6         2
3       3     103         7         3
4       4     104         8         4
5       5     105         9         5
6       6     106        10         6

And I want to take the difference of each pair of similarly named variables (the _a column minus the _b column). Since there are only two pairs here, that's easy to do with something like:

df %>%
  mutate(var_1_new = var_1_a - var_1_b,
         var_two_new = var_two_a - var_two_b)

But in the real data I have about a hundred of these pairs. Is there an easier way to do this than typing them all out?

PS: If it makes it easier, I have a list of all the variable prefixes (e.g. mylist <- list("var_1", "var_two")).
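Since the question asks about mutate() specifically, here is a minimal sketch (not taken from either answer below) of one way to keep the work inside a single mutate() call. It uses the prefix list from the PS to build each `<prefix>_a - <prefix>_b` expression with rlang::call2() and rlang::sym(), then splices the named list into mutate() with `!!!`. It assumes every prefix has exactly one `_a` and one `_b` column; `prefixes` and `diff_exprs` are illustrative names.

```r
library(dplyr)

df <- data.frame(
  var_1_a = 1:100,
  var_1_b = 101:200,
  var_two_a = 5:104,
  var_two_b = 1:100
)
mylist <- list("var_1", "var_two")
prefixes <- unlist(mylist)

# For each prefix, build the unevaluated call `<prefix>_a - <prefix>_b`
diff_exprs <- lapply(prefixes, function(p) {
  rlang::call2("-", rlang::sym(paste0(p, "_a")), rlang::sym(paste0(p, "_b")))
})
# Name each expression after the new column it should create
names(diff_exprs) <- paste0(prefixes, "_new")

# Splice all named expressions into one mutate() call
result <- df %>% mutate(!!!diff_exprs)
head(result)
```

Building the calls explicitly from the `_a`/`_b` names means the result does not depend on the column order in the data frame.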

score 2 · Accepted Answer

You could use the code below. It assumes that each prefix always matches exactly two similarly named columns, with the _a column before the _b column.

mylist <- list("var_1", "var_two")
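# Positions of the columns whose names start with the prefix followed by "_"
get_similar_names <- function(x) grep(paste0("^", x, "_"), names(df))
# Subtract the matched columns in column order (e.g. var_1_a - var_1_b)
get_diff <- function(x) Reduce(`-`, subset(df, select = x))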

matches <- lapply(mylist, get_similar_names )
out <- lapply(matches, get_diff)
names(out) <- paste0(mylist,"_new")
out.df <- data.frame(out)

head(out.df)
  var_1_new var_two_new
1      -100           4
2      -100           4
3      -100           4
4      -100           4
5      -100           4
6      -100           4
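One caveat when matching prefixes with grep() on the real data: an unanchored pattern such as "var_1" matches anywhere in a name, so with a hundred prefixes it would also pick up columns like var_10_a or var_10_b (hypothetical names used here for illustration). Anchoring the pattern to the start of the name and requiring the following underscore avoids that. The sketch below compares the two patterns:

```r
# Hypothetical column names, as they might look with many prefixes
cols <- c("var_1_a", "var_1_b", "var_10_a", "var_10_b")

# Unanchored pattern: "var_1" also occurs inside "var_10_a" and "var_10_b"
loose <- grep("var_1", cols)

# Anchored pattern: the name must start with "var_1" followed by "_"
strict <- grep(paste0("^", "var_1", "_"), cols)

cols[loose]   # all four names
cols[strict]  # only "var_1_a" and "var_1_b"
```

If a prefix could contain regex metacharacters such as ".", those would also need escaping before being pasted into the pattern.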

score 1 · Accepted Answer

One way to do it, mostly in base R (stringr::word() is used to pull out the middle part of each name):

ind <- unique(stringr::word(names(df), 2, sep = '_'))
m1 <- sapply(ind, function(i) Reduce(`-`, (df[stringr::word(names(df), 2, sep = '_') %in% i])))

# which gives:
head(m1)
#        1 two
#[1,] -100   4
#[2,] -100   4
#[3,] -100   4
#[4,] -100   4
#[5,] -100   4
#[6,] -100   4

Then, to get your desired output:

final_df <- cbind(df, setNames(data.frame(m1), paste0('var_', ind, '_new')))
head(final_df)

#  var_1_a var_1_b var_two_a var_two_b var_1_new var_two_new
#1       1     101         5         1      -100           4
#2       2     102         6         2      -100           4
#3       3     103         7         3      -100           4
#4       4     104         8         4      -100           4
#5       5     105         9         5      -100           4
#6       6     106        10         6      -100           4
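stringr::word(names(df), 2, sep = '_') relies on every name having the form var_<id>_<suffix>, and the final step rebuilds the names by pasting 'var_' back on. A prefix with extra underscores would be split apart. A minimal base-R sketch of an alternative, assuming every column name ends in _a or _b, strips that suffix with sub() to recover the full prefix and then subtracts each pair by name:

```r
df <- data.frame(
  var_1_a = 1:100,
  var_1_b = 101:200,
  var_two_a = 5:104,
  var_two_b = 1:100
)

# Recover the full prefix by dropping the trailing "_a" / "_b" suffix
prefixes <- unique(sub("_[ab]$", "", names(df)))

# For each prefix, subtract the "_b" column from the "_a" column by name
diffs <- lapply(prefixes, function(p) df[[paste0(p, "_a")]] - df[[paste0(p, "_b")]])
names(diffs) <- paste0(prefixes, "_new")

# Append the new columns to the original data
final_df <- cbind(df, as.data.frame(diffs))
head(final_df)
```

Indexing the pair by name rather than by position keeps the subtraction correct even if the _a and _b columns are not adjacent or appear in a different order.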
