
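I have a data frame with numeric and ordered factor columns. There are many NA values, and NA has no level assigned to it. I want to change the NAs to "No Answer", but the levels of the factor columns do not include that level. This is how I started, but I don't know how to finish it in an elegant way: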

addNoAnswer = function(df) {
   factorOrNot = sapply(df, is.factor)
   levelsList = lapply(df[, factorOrNot], levels)
   levelsList = lapply(levelsList, function(x) c(x, "No Answer"))
   ...

Is there a way to apply the new levels to the factor columns directly, for example something like this:

df[, factorOrNot] = lapply(df[, factorOrNot], factor, levelsList)

Of course, this does not work correctly.

I want to keep the order of the existing levels and add the "No Answer" level in the last position.


5 Answers


The levels function supports assignment in the form levels(x) <- value, so adding another level is easy:

f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
str(f1)
 Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
levels(f1) <- c(levels(f1),"No Answer")
f1[is.na(f1)] <- "No Answer"
str(f1)
 Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
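
The order of the two steps matters: the level has to exist before the value is assigned. A minimal sketch of what goes wrong otherwise (the names f, bad and good are made up for this illustration):

```r
f <- factor(c("a", NA, "b"))

# Assigning a value that is not among the levels does not add it;
# R issues an "invalid factor level" warning and stores NA instead.
bad <- f
suppressWarnings(bad[is.na(bad)] <- "No Answer")
print(anyNA(bad))      # the NA is still there
print(levels(bad))     # levels are unchanged

# Extending the levels first makes the same assignment valid,
# and the new level ends up in the last position.
good <- f
levels(good) <- c(levels(good), "No Answer")
good[is.na(good)] <- "No Answer"
print(levels(good))
print(as.character(good))
```

The same two steps work for ordered factors, since assigning to levels() keeps the factor's class.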

You can then loop this over all the variables in a data.frame:

f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
f2 <- factor(c("c", NA, "b", NA, "b", NA, "c" ,"a", "d", "a", "b"))
f3 <- factor(c(NA, "b", NA, "b", NA, NA, "c", NA, "d" , "e", "a"))
df1 <- data.frame(f1,n1=1:11,f2,f3)

str(df1)
  'data.frame':   11 obs. of  4 variables:
  $ f1: Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
  $ n1: int  1 2 3 4 5 6 7 8 9 10 ...
  $ f2: Factor w/ 4 levels "a","b","c","d": 3 NA 2 NA 2 NA 3 1 4 1 ...
  $ f3: Factor w/ 5 levels "a","b","c","d",..: NA 2 NA 2 NA NA 3 NA 4 5 ...    

for(i in 1:ncol(df1)) if(is.factor(df1[,i])) levels(df1[,i]) <- c(levels(df1[,i]),"No Answer")
df1[is.na(df1)] <- "No Answer"

str(df1)
 'data.frame':   11 obs. of  4 variables:
  $ f1: Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
  $ n1: int  1 2 3 4 5 6 7 8 9 10 ...
  $ f2: Factor w/ 5 levels "a","b","c","d",..: 3 5 2 5 2 5 3 1 4 1 ...
  $ f3: Factor w/ 6 levels "a","b","c","d",..: 6 2 6 2 6 6 3 6 4 5 ...
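
One caveat with df1[is.na(df1)] <- "No Answer": it targets every NA in the data frame, so a numeric column that contains NA would receive the string as well and be coerced to character. A minimal sketch that restricts both steps to the factor columns, assuming that is what you want (the small data frame and the name is_fac are made up for this illustration):

```r
# Toy data: one factor column and one numeric column, both with an NA.
df1 <- data.frame(
  f1 = factor(c("a", NA, "b")),
  n1 = c(1, NA, 3)
)

# Logical vector marking the factor columns.
is_fac <- vapply(df1, is.factor, logical(1))

# Add the level and replace the NAs, but only inside factor columns.
df1[is_fac] <- lapply(df1[is_fac], function(x) {
  levels(x) <- c(levels(x), "No Answer")
  x[is.na(x)] <- "No Answer"
  x
})

str(df1)
```

Here n1 stays numeric and keeps its NA, while f1 gains "No Answer" as its last level.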
Answered 2017-02-07T20:58:08.497

You can define a function that adds the level to a factor and returns anything else unchanged:

addNoAnswer <- function(x){
  if(is.factor(x)) return(factor(x, levels=c(levels(x), "No Answer")))
  return(x)
}

Then you just lapply this function over your columns:

df <- as.data.frame(lapply(df, addNoAnswer))

That should return what you want.
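
Note that the function only adds the level; the NA values themselves still have to be replaced afterwards. Below is a small end-to-end sketch under a couple of assumptions: the example data frame is invented, and df[] <- lapply(...) is used instead of as.data.frame(...) so that the data frame keeps its column names and other attributes as they are.

```r
addNoAnswer <- function(x) {
  # Factors get the extra level appended; factor() keeps the
  # ordered/unordered status of x by default.
  if (is.factor(x)) return(factor(x, levels = c(levels(x), "No Answer")))
  x  # non-factor columns are returned untouched
}

# Invented example: a numeric column and an ordered factor column.
df <- data.frame(
  score = c(1.5, NA, 3),
  grade = factor(c("low", NA, "high"), levels = c("low", "high"), ordered = TRUE)
)

# Assigning into df[] replaces the columns but keeps the data frame itself.
df[] <- lapply(df, addNoAnswer)

# The level now exists, so the NAs in the factor can be filled in.
df$grade[is.na(df$grade)] <- "No Answer"

print(levels(df$grade))
print(is.ordered(df$grade))
```

The existing level order is kept and "No Answer" is last, which is what the question asks for.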

Answered 2014-04-26T21:54:41.213

I have a very simple answer that may not directly address your specific case, but it is a simple way to do this in general:

levels(df$column) <- c(levels(df$column), newFactorLevel)
Answered 2018-12-13T17:38:40.400

Since this question was last answered, this has become possible with fct_explicit_na() from the forcats package. Here is the example given in its documentation:

f1 <- factor(c("a", "a", NA, NA, "a", "b", NA, "c", "a", "c", "b"))
table(f1)

# f1
# a b c 
# 4 2 2 

f2 <- forcats::fct_explicit_na(f1)
table(f2)

# f2
#     a         b         c (Missing) 
#     4         2         2         3 

The default level name is (Missing), but this can be changed with the na_level argument.
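
For the question's case that could look roughly like the sketch below. It assumes the forcats package is installed, which is why the call is guarded. As far as I know, newer forcats releases deprecate fct_explicit_na() in favour of fct_na_value_to_level(), so the call is wrapped in suppressWarnings() here.

```r
f1 <- factor(c("a", "a", NA, NA, "a", "b", NA, "c", "a", "c", "b"))

# Only run the forcats part if the package is available.
if (requireNamespace("forcats", quietly = TRUE)) {
  # na_level sets the name of the level that replaces NA.
  # suppressWarnings() hides the deprecation notice that newer
  # forcats versions are expected to emit for this function.
  f2 <- suppressWarnings(
    forcats::fct_explicit_na(f1, na_level = "No Answer")
  )
  print(levels(f2))
  print(table(f2))
}
```

The new level is appended after the existing ones, so the original level order is preserved.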

Answered 2016-10-12T21:05:44.910

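Expanding on ilir's answer and its comments, you can check that the column is a factor and does not already contain the new level before adding it, which makes the function safe to re-run: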

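addLevel <- function(x, newlevel = NULL) {
  # Only touch factors, and only when a level was supplied that is not
  # already present (the NULL check avoids an error in if() when the
  # default newlevel is used).
  if (is.factor(x) && !is.null(newlevel) && !(newlevel %in% levels(x))) {
    return(factor(x, levels = c(levels(x), newlevel)))
  }
  x  # anything else is returned unchanged
}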

You can then apply it like this:

dataFrame$column <- addLevel(dataFrame$column, "newLevel")
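
To illustrate why the check matters, here is a small self-contained sketch. The helper add_level_once is a made-up stand-in for the function above so that the snippet runs on its own. In recent R versions, factor() refuses a levels vector that contains duplicates, so appending the same level twice without the check fails.

```r
# Hypothetical stand-in for addLevel(): appends new_level only if missing.
add_level_once <- function(x, new_level) {
  if (is.factor(x) && !(new_level %in% levels(x))) {
    x <- factor(x, levels = c(levels(x), new_level))
  }
  x
}

f <- factor(c("a", NA, "b"))
once  <- add_level_once(f, "No Answer")
twice <- add_level_once(once, "No Answer")  # second call is a no-op

print(levels(once))
print(identical(once, twice))

# Without the check, the level would be listed twice; try() keeps the
# script running if factor() rejects the duplicated level.
res <- try(factor(once, levels = c(levels(once), "No Answer")), silent = TRUE)
print(inherits(res, "try-error"))
```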
Answered 2018-01-26T01:07:14.880