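Here is the problem: your vector is of mode character, so of course it is "not a number". The last element was interpreted as the string "NaN". Using is.nan only makes sense when the vector is numeric. If you want a missing value in a character vector (so that regression functions handle it properly), use NA_character_ (without any quotes).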
> tester1 <- c("2", "2", "3", "4", "2", "3", NA_character_)
> tester1
[1] "2" "2" "3" "4" "2" "3" NA
> is.na(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE
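To make the numeric-versus-character distinction concrete, here is a small sketch. The vectors num_vec and chr_vec are hypothetical names used only for illustration; they are not from the question.

```r
# Hypothetical example vectors (not from the original question)
num_vec <- c(2, 3, NaN, NA)        # numeric: NaN and NA stay distinct
chr_vec <- c("2", "3", NaN, NA)    # c() coerces every element to character

is.nan(num_vec)   # FALSE FALSE  TRUE FALSE
is.na(num_vec)    # FALSE FALSE  TRUE  TRUE

chr_vec           # "2"   "3"   "NaN" NA
is.na(chr_vec)    # FALSE FALSE FALSE  TRUE  (the string "NaN" is not missing)

# Turn the literal string "NaN" into a true missing value.
# which() drops the NA that the comparison yields for the already-missing element.
chr_vec[which(chr_vec == "NaN")] <- NA_character_
is.na(chr_vec)    # FALSE FALSE  TRUE  TRUE
```

After the replacement, is.na() reports both the converted element and the original NA as missing.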
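Neither "NA" nor "NaN" is a true missing value in a character vector. If for some reason a factor variable contained the value "NaN", you could simply use logical indexing: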
tester1[tester1 == "NaN"] = "NA"
# but that would not really be a missing value either
# and it might screw up a factor variable anyway.
tester1[tester1=="NaN"] <- "NA"
Warning message:
In `[<-.factor`(`*tmp*`, tester1 == "NaN", value = "NA") :
invalid factor level, NAs generated
##########
tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))
> tester1[tester1 =="NaN"] <- NA_character_
> tester1
[1] 2 2 3 4 2 3 <NA>
Levels: 2 3 4 NaN
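That last result may be surprising. A "NaN" level remains, but none of the elements is "NaN". Instead, the element that was "NaN" is now a true missing value, shown in print as <NA>.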
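If the leftover "NaN" level is unwanted, one possible follow-up (an addition here, not part of the steps above) is to call droplevels(), which removes levels that no element uses. A minimal sketch, rebuilding the same factor:

```r
# Rebuild the factor from the example and blank out the "NaN" element
tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))
tester1[tester1 == "NaN"] <- NA_character_

levels(tester1)                  # "2"   "3"   "4"   "NaN"

# droplevels() discards levels with no remaining elements
tester1 <- droplevels(tester1)
levels(tester1)                  # "2" "3" "4"
sum(is.na(tester1))              # 1
```

The missing element itself is untouched; only the unused level is removed.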