r - R: dplyr::mutate with an expression built from a combination of variables passed as strings

Question

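I want to write a function that adds a new variable to a data frame. The new variable is the concatenation of the values of a set of variables whose names are passed as an argument (a character vector). In base R, I would write something like this: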

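addConcatFields <- function(data, listOfVar) {
  # start from the first column, then append the remaining ones
  data$uniqueId <- data[[listOfVar[1]]]
  # listOfVar[-1] is empty when only one name is given, so the loop is skipped
  for (elt in listOfVar[-1]) {
    data$uniqueId <- paste(data$uniqueId, data[[elt]], sep = "_")
  }
  return(data)
}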

addConcatFields(iris,c('Petal.Width','Species'))

# gives:
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species   uniqueId
1          5.1         3.5          1.4         0.2  setosa 0.2_setosa
2          4.9         3.0          1.4         0.2  setosa 0.2_setosa
...

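My original goal was to do this with dplyr::mutate, but even after reading the programming vignette (http://127.0.0.1:31671/library/dplyr/doc/programming.html) I could not get there. Since I want to understand what I am missing, I would appreciate a solution that uses mutate.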

score 1 · Accepted Answer

The best way to approach this is quasiquotation. This article does a good job of explaining the basics:

https://dplyr.tidyverse.org/articles/programming.html

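Rather than storing the column names as strings, the best option is to store them as quosures (quoted expressions), so:

varlist <- rlang::quos(Petal.Width, Species)

This line gives you a list of 2 quosures: one referring to the Petal.Width column and one referring to the Species column.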

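You then use !!! to splice the list of quosures into a dplyr call (!!! rather than !! because you are splicing several arguments at once).

dplyr::select(iris, !!! varlist)

This returns the two requested columns, and the same !!! splice can be used inside other dplyr verbs such as mutate.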
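Since the question passes the column names as a character vector, one way to connect this to mutate is to turn the strings into symbols with rlang::syms() and splice them into paste(). The sketch below is only an illustration under that assumption, not the answer author's code; the function name add_concat_fields and its argument names are made up for the example.

```r
library(dplyr)

# Hypothetical helper: `vars` is a character vector of column names.
add_concat_fields <- function(data, vars) {
  cols <- rlang::syms(vars)  # strings -> list of symbols
  # !!! splices the symbols in as separate arguments to paste()
  mutate(data, uniqueId = paste(!!!cols, sep = "_"))
}

res <- add_concat_fields(iris, c("Petal.Width", "Species"))
head(res$uniqueId)
```

Using symbols rather than quosures is enough here because the columns are looked up in the data frame that mutate receives.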

score 0 · Accepted Answer

Take a look at the unite function from tidyr. It is part of the same tidyverse family of packages that includes dplyr.

library(tidyr)
unite(iris,uniqueID,c(Petal.Width,Species))
#    Sepal.Length Sepal.Width Petal.Length       uniqueID
#1            5.1         3.5          1.4     0.2_setosa
#2            4.9         3.0          1.4     0.2_setosa
#3            4.7         3.2          1.3     0.2_setosa
#4            4.6         3.1          1.5     0.2_setosa

If you do not want to lose the two columns that were united, just add remove = FALSE:

unite(iris,uniqueID,c(Petal.Width,Species),remove = FALSE)
#    Sepal.Length Sepal.Width Petal.Length       uniqueID Petal.Width    Species
#1            5.1         3.5          1.4     0.2_setosa         0.2     setosa
#2            4.9         3.0          1.4     0.2_setosa         0.2     setosa
#3            4.7         3.2          1.3     0.2_setosa         0.2     setosa
#4            4.6         3.1          1.5     0.2_setosa         0.2     setosa
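The calls above name the columns directly. If the names arrive as a character vector, as in the question, a possible variant is to wrap the vector in tidyselect::all_of(). This is a sketch under that assumption; the variable names vars and res are only illustrative.

```r
library(tidyr)

vars <- c("Petal.Width", "Species")  # column names held as strings

# all_of() tells tidyselect to treat `vars` as a vector of column names
res <- unite(iris, "uniqueId", tidyselect::all_of(vars), remove = FALSE)
head(res$uniqueId)
```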

score 0 · Accepted Answer

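OK, thinking about it a bit more, here is another solution.

Use the match function to convert the string names to column numbers.

Then use the column numbers like this (replace the numeric vector in the example with the result of match):

df <- tibble::as_tibble(df[c(3, 4, 7, 1, 9, 8, 5, 2, 6, 10)])

This also has the advantage that if match returns NA for any name that was not found, you can abort the function with an error.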
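Applied to the question's data, the idea might look like the following base R sketch. The error message and the variable names (vars, idx, unique_id) are illustrative assumptions, not part of the original answer.

```r
vars <- c("Petal.Width", "Species")
idx <- match(vars, names(iris))  # column positions, NA where a name is not found

if (anyNA(idx)) {
  stop("Unknown column(s): ", paste(vars[is.na(idx)], collapse = ", "))
}

# Paste the selected columns together element-wise;
# unname() keeps column names from being matched to paste()'s own arguments
unique_id <- do.call(paste, c(unname(as.list(iris[idx])), sep = "_"))
head(unique_id)
```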

score 0 · Accepted Answer

With data.table, I do something like this:

library(data.table)
iris <- data.table(iris)

iris[, uniqueId := do.call(function(...) paste(..., sep = "_"),.SD), .SDcols = c('Petal.Width','Species')]
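To match the question's function signature, the same expression could be wrapped as shown below. This is a sketch only; the name add_concat_fields_dt is hypothetical, and do.call(paste, c(.SD, sep = "_")) is used as a shorter equivalent of the anonymous function above.

```r
library(data.table)

# Hypothetical wrapper: `vars` is a character vector of column names.
add_concat_fields_dt <- function(data, vars) {
  dt <- as.data.table(data)  # converts a data.frame into a new data.table
  # .SD holds only the columns listed in .SDcols
  dt[, uniqueId := do.call(paste, c(.SD, sep = "_")), .SDcols = vars]
  dt[]
}

res <- add_concat_fields_dt(iris, c("Petal.Width", "Species"))
head(res$uniqueId)
```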

score 0 · Accepted Answer

Adding to the other answers, since you said you wanted to do this with dplyr's mutate.

Here is one way to do it with mutate, using paste:

library(dplyr)
iris %>% mutate(uniqueId= paste(Petal.Width, Species, sep = '_'))
# gives the following result:
     Sepal.Length Sepal.Width Petal.Length Petal.Width Species uniqueId
 1          5.1         3.5          1.4         0.2 setosa  0.2_setosa
 2          4.9         3            1.4         0.2 setosa  0.2_setosa
 3          4.7         3.2          1.3         0.2 setosa  0.2_setosa
 4          4.6         3.1          1.5         0.2 setosa  0.2_setosa
 5          5           3.6          1.4         0.2 setosa  0.2_setosa
 6          5.4         3.9          1.7         0.4 setosa  0.4_setosa
 7          4.6         3.4          1.4         0.3 setosa  0.3_setosa
 8          5           3.4          1.5         0.2 setosa  0.2_setosa
 9          4.4         2.9          1.4         0.2 setosa  0.2_setosa
10          4.9         3.1          1.5         0.1 setosa  0.1_setosa
...

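If your function is a custom function, you can vectorize it and then use it. For example, this gives the same result as above:

concat_fields<-function(var1, var2) {
  return (paste(var1, var2, sep = '_'))
}
v_concat_fields <- Vectorize(concat_fields)
iris %>% mutate(uniqueId = v_concat_fields(Petal.Width, Species))

The function passed to mutate is applied to the columns of the data frame, so its arguments are vectors, not data frames.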
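To see what "arguments are vectors" means, the function can be called directly on two small vectors, outside of mutate. Because paste() is already vectorized, the Vectorize() wrapper is not strictly required for this particular function. The input values below are made up for illustration.

```r
concat_fields <- function(var1, var2) {
  paste(var1, var2, sep = "_")
}

# Called on whole vectors, just as mutate would call it on whole columns
out <- concat_fields(c(0.2, 0.4), c("setosa", "virginica"))
print(out)
```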
