r - Recoding missing data in a character field

Question

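Note: the title may be misleading. If you understand my question and can think of something more descriptive, please change it.

I've run into a strange situation where the survey responses are all characters rather than numbers. It seems R really doesn't like this. Suppose I asked the question: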

Q. In what area do you work? 
East
West
Central
North
South
None of the above

But the respondents only came from East, West and Central.

dat <- rep(c("East", "West", "Central"),100)

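Now, for presentation purposes, it's important that I include North, South and None of the above, even though nobody chose them. However, getting these into a factor is proving challenging.

Let's try: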

fac1 <- factor(dat, labels=c("East","West","Central","North","South","None of the above"))

Error in factor(dat, labels = c("East", "West", "Central", "North", "South",  : 
  invalid labels; length 6 should be 1 or 3

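Basically, what I'd like to do is combine these data with the missing categories, so that when I type something like summary(fac1), it shows 0 responses in each of those categories.

There must be an easier way to do this!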

score 3 · Accepted Answer

You're almost there. You need to use the levels argument instead:

fac1 <- factor(dat, levels=c("East","West","Central","North","South","None of the above"))
str(fac1)
 Factor w/ 6 levels "East","West",..: 1 2 3 1 2 3 1 2 3 1 ...

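The difference between levels and labels is this:

levels defines the factor levels present in your data.
labels lets you rename those factor levels in one go.

For example: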

fac2 <- factor(
  dat, 
  levels=c("East","West","Central","North","South","None of the above"),
  labels=c("E", "W", "C", "N", "S", "Other")
)
str(fac2)
Factor w/ 6 levels "E","W","C","N",..: 1 2 3 1 2 3 1 2 3 1 ...
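
One pitfall is worth a short sketch, assuming the same dat as in the question. When levels is omitted, factor() takes the sorted unique values as the levels, and labels is then matched to them by position. Passing labels in a different order therefore relabels the data silently rather than reordering it. The names default_levels, wrong and right below are only illustrative.

```r
dat <- rep(c("East", "West", "Central"), 100)

# Without `levels`, the levels default to the sorted unique values.
default_levels <- levels(factor(dat))
default_levels
# "Central" "East" "West"

# `labels` is matched to those levels by position, so this renames
# "Central" to "East", "East" to "West", and "West" to "Central".
wrong <- factor(dat, labels = c("East", "West", "Central"))
head(as.character(wrong), 3)
# "West" "Central" "East"

# Giving `levels` explicitly keeps each value attached to its own name.
right <- factor(dat, levels = c("East", "West", "Central"))
head(as.character(right), 3)
# "East" "West" "Central"
```

This is why it tends to be safer to set the order and the full set of categories with levels, and to add labels only when the displayed names really need to change.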

score 2 · Accepted Answer

Not an expert, but does this help?

fac1 <- factor(dat, levels = 
               c("East","West","Central","North","South","None of the above"))
summary(fac1)
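
For reference, here is a small sketch of what that call should report, assuming the dat from the question. The all_areas and counts names are illustrative, and the droplevels() step is an addition rather than part of the original answer; it shows one way to discard the empty categories later if they are no longer wanted.

```r
dat <- rep(c("East", "West", "Central"), 100)
all_areas <- c("East", "West", "Central", "North", "South", "None of the above")
fac1 <- factor(dat, levels = all_areas)

# Unused levels are kept and reported with a count of 0:
# East, West and Central each have 100; the other three have 0.
counts <- summary(fac1)
counts

# table() reports the same counts, including the empty categories.
table(fac1)

# droplevels() removes levels that have no observations.
nlevels(droplevels(fac1))
# 3
```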
