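I am writing a function that takes a data frame and spreads its factor variables into new dummy variables, because some machine learning algorithms cannot handle factors. To do this, I call spread() inside my cleaning function.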

However, when I try to pass in the name of the column to spread, it throws an error:

Error: Invalid column specification

Here is the code:

library(tidyr)
library(dplyr)    
library(C50) # this is one source for the churn data
data(churn)


f <- function(df, name)  {
  df$dummy <- c(1:nrow(df))       # create dummy variable with unique values

  df <- spread(df, key <- as.character(substitute(name)), "dummy", fill = 0 )
}

churnTrain = f(churnTrain, name = "state")
str(churnTrain)

Of course, if I replace key = as.character(substitute(name)) with key = "state", it works, but then the function is no longer reusable.

How can I pass a column name through to the inner function without getting this error?

2 Answers

library(tidyr)
library(dplyr)    
library(C50) # this is one source for the churn data
data(churn)


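f <- function(df, name)  {
  df$dummy <- seq_len(nrow(df))   # create dummy variable with unique values

  # spread_() is the standard-evaluation variant: it takes column names as strings.
  # Note: it has since been deprecated in tidyr.
  df <- spread_(df, key_col = name, value_col = "dummy", fill = 0)
  df                              # return the reshaped data frame visibly
}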

churnTrain = f(churnTrain, name = "state")
str(churnTrain)
Answered 2017-05-15T13:14:04.413

Do you need to use the tidyverse?

If not, you could try the older reshape2 package:


library(reshape2)
library(C50) # this is one source for the churn data
data(churn)

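f <- function(df1, name)  {
  df1$dummy <- seq_len(nrow(df1))  # create dummy variable with unique values
  # build the formula "dummy ~ <name>" from the string; name value.var explicitly
  # so dcast() does not have to guess it
  df1 <- dcast(df1, as.formula(paste0("dummy ~ ", name)), value.var = "dummy")
  df1                              # return the reshaped data frame visibly
}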

ct1 <- f(churnTrain, name = "state")

If you absolutely need to stay within the tidyverse, you can try following the tutorial at http://dplyr.tidyverse.org/articles/programming.html. Unfortunately, their examples did not work on my machine.
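
For the tidyverse route, here is a minimal sketch of one way it might look. It assumes tidyr 1.1 or later, where pivot_wider() is available and all_of() selects a column from a string held in a variable. The helper name spread_by_name and the toy data frame are made up for illustration, and this is not the approach used in either answer here.

```r
library(tidyr)

# Hypothetical helper: spread the column whose name is given as a string.
spread_by_name <- function(df, name) {
  df$dummy <- seq_len(nrow(df))  # unique row id, used as the cell value
  pivot_wider(
    df,
    names_from  = all_of(name),  # all_of() takes the column name from the string
    values_from = dummy,
    values_fill = 0              # cells with no match are filled with 0
  )
}

# Toy data standing in for churnTrain (an assumption; the other columns must
# identify each row uniquely, otherwise pivot_wider() warns about duplicates).
toy <- data.frame(
  state   = c("KS", "OH", "KS"),
  churn   = c("no", "yes", "no"),
  minutes = c(10, 20, 30)
)

wide <- spread_by_name(toy, "state")
print(wide)
```

As in the answers above, each new column holds the row id where the level matched and 0 elsewhere; if a strict 0/1 indicator is wanted, the cell value would need to be a constant 1 rather than the row id.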

Answered 2017-05-15T12:14:17.673