
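I have a data frame with several factor columns that contain `NaN`s, and I would like to convert them to `NA`s (the `NaN`s seem to be a problem when using a linear regression object to predict on new data).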

> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> tester1 
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"
> tester1[is.nan(tester1)] = NA
> tester1 
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"
> tester1[is.nan(tester1)] = "NA"
> tester1 
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"

3 Answers

21

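Here's the problem: your vector is of character mode, so of course every element is "not a number". The last element was interpreted as the string "NaN". Using `is.nan` only makes sense if the vector is numeric. If you want a missing value in a character vector (so that regression functions handle it properly), use `NA_character_` (without any quotes).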

> tester1 <- c("2", "2", "3", "4", "2", "3", NA_character_)
>  tester1
[1] "2" "2" "3" "4" "2" "3" NA 
>  is.na(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

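In a character vector, neither "NA" nor "NaN" is a real missing value. If for some reason a factor variable contained the value "NaN", you could just use logical indexing: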

tester1[tester1 == "NaN"] = "NA"  
# but that would not really be a missing value either 
# and it might screw up a factor variable anyway.

# If tester1 is a factor, "NA" is not one of its levels, so the assignment warns:
tester1[tester1=="NaN"] <- "NA"
Warning message:
In `[<-.factor`(`*tmp*`, tester1 == "NaN", value = "NA") :
invalid factor level, NAs generated
##########
tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))

> tester1[tester1 =="NaN"] <- NA_character_
> tester1
[1] 2    2    3    4    2    3    <NA>
Levels: 2 3 4 NaN

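The last result might be surprising. A "NaN" level remains, but none of the elements is "NaN". Instead, the element that was "NaN" is now a real missing value, shown in print as `<NA>`.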
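
The same idea can be applied to every factor column of a data frame, which is the situation described in the question. The following is only a minimal sketch: the data frame `df` and its column names are made up for illustration, and `droplevels()` is used to remove the leftover "NaN" level mentioned above.

```r
# Hypothetical data frame; the column names are illustrative only.
df <- data.frame(
  a = factor(c("2", "3", "NaN")),
  b = factor(c("x", "NaN", "y")),
  n = c(1, 2, 3)
)

# Find the factor columns.
is_fac <- vapply(df, is.factor, logical(1))

# In each factor column, turn "NaN" into a real missing value,
# then drop the now-unused "NaN" level.
df[is_fac] <- lapply(df[is_fac], function(f) {
  f[which(f == "NaN")] <- NA
  droplevels(f)
})

levels(df$a)
is.na(df$a)
```

Whether the unused level should be dropped depends on how the columns are used later; if the levels have to match those of the data the model was fitted on, it may be preferable to keep them as they are.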

answered 2012-02-27T22:17:50.917
8

You can't have `NaN` in a character vector, and a character vector is what you have here:

> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> is.nan(tester1)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> tester1
[1] "2"   "2"   "3"   "4"   "2"   "3"   "NaN"

Note how R treats it as a character string.

You can create a `NaN` in a numeric vector:

> tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
> as.numeric(tester1)
[1]   2   2   3   4   2   3 NaN
> is.nan(as.numeric(tester1))
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Then, of course, R can convert the `NaN` to `NA` with code like yours:

> foo <- as.numeric(tester1)
> foo[is.nan(foo)] <- NA
> foo
[1]  2  2  3  4  2  3 NA
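
Since the question mentions factor columns, one caveat may be worth illustrating: calling `as.numeric()` directly on a factor returns the internal integer codes of the levels, not the values they display. A small sketch, assuming the factor's labels are all number-like strings (the vector `f` is made up for illustration):

```r
# Hypothetical factor whose labels look like numbers.
f <- factor(c("2", "2", "3", "4", "NaN"))

# Integer codes of the levels, not the displayed values.
codes <- as.numeric(f)

# Go through character to recover the displayed values.
x <- as.numeric(as.character(f))

# Now is.nan() is meaningful, and the NaN can be replaced.
x[is.nan(x)] <- NA
x
```
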
answered 2012-02-27T22:21:36.283
7

Edit:

Gavin Simpson reminded me in the comments that, in your situation, there is a much easier way to convert what is really the string "NaN" to "NA":

tester1 <- gsub("NaN", "NA", tester1)
tester1
# [1] "2"  "2"  "3"  "4"  "2"  "3"  "NA"
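
Note that the result above is the string "NA", which R does not treat as missing. The sketch below contrasts it with a real `NA`; the variable names `as_string` and `real_na` are illustrative only, and the anchored pattern `^NaN$` is an optional precaution so that only whole-string matches are replaced.

```r
tester1 <- c("2", "3", NaN)

# Replaces the text, but the result is still an ordinary string.
as_string <- gsub("^NaN$", "NA", tester1)
is.na(as_string)

# Assigning NA (no quotes) produces a real missing value.
real_na <- tester1
real_na[real_na == "NaN"] <- NA
is.na(real_na)
```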

Solution:

To detect which elements of a character vector are `NaN`, you need to convert the vector to a numeric vector:

tester1[is.nan(as.numeric(tester1))] <- "NA"
tester1
# [1] "2"  "2"  "3"  "4"  "2"  "3"  "NA"

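Explanation:

There are a couple of reasons this doesn't work as you expected.

First, although `NaN` stands for "Not a Number", it does have class "numeric", and it only makes sense inside a numeric vector.

Second, when it is included in a character vector, the symbol `NaN` is silently converted to the character string "NaN". When you then test it for nan-ness, the character string returns FALSE: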

class(NaN)
# [1] "numeric"
c("1", NaN)
# [1] "1"   "NaN"
is.nan(c("1", NaN))
# [1] FALSE FALSE
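
As a short complement, here is how `is.nan()` and `is.na()` differ on a numeric vector that holds both kinds of value (the vector `x` is made up for illustration):

```r
x <- c(1, NaN, NA)

nan_flags <- is.nan(x)  # TRUE only for NaN
na_flags  <- is.na(x)   # TRUE for both NaN and NA

# Mixing NaN into a character vector coerces it to a string.
mixed_type <- typeof(c("1", NaN))
```

Here `nan_flags` is `FALSE TRUE FALSE`, `na_flags` is `FALSE TRUE TRUE`, and `mixed_type` is `"character"`.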
answered 2012-02-27T22:12:25.713