r - Why does a logical test on a data frame column produce a row of NAs?

Question

I have a data frame of the following form:

> df
stationid    station      gear sample     lat    lon       date depth
1     25679          CORBOX150    UE4 53.9015 7.8617 15.07.1987    19
2     25681 UE9 Kern CORCRB050    UE9 54.0167 7.3982 15.07.1987    33
3        NA                           54.0167 7.3982 15.07.1987    33

A logical test on stationid gives me, besides the correct first row, an annoying row full of NAs:

> df[df$stationid=="25679",]
stationid station      gear sample     lat    lon       date depth
1      25679         CORBOX150    UE4 53.9015 7.8617 15.07.1987    19
NA        NA    <NA>      <NA>   <NA>      NA     NA       <NA>    NA

Why is this?

I suspect something is messed up somewhere in row 3 of df.

Here is the data:

df<-structure(list(stationid = c(25679L, 25681L, NA), station = structure(c(2L, 
3L, 1L), .Label = c("", " ", "UE9 Kern"), class = "factor"), 
gear = structure(c(2L, 3L, 1L), .Label = c("", "CORBOX150", 
"CORCRB050"), class = "factor"), sample = structure(c(2L, 
3L, 1L), .Label = c("", "UE4", "UE9"), class = "factor"), 
lat = c(53.9015, 54.0167, 54.0167), lon = c(7.8617, 7.3982, 
7.3982), date = structure(c(1L, 1L, 1L), .Label = "15.07.1987", class = "factor"), 
depth = c(19L, 33L, 33L)), .Names = c("stationid", "station", 
"gear", "sample", "lat", "lon", "date", "depth"), class = "data.frame", row.names = c(NA, 
-3L))
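To see where the extra row comes from, here is a minimal sketch. It assumes a hypothetical, simplified copy of the data (`df_small`, keeping only `stationid` and `depth`); the point is that the comparison itself returns a three-state logical vector, and the NA entry in that index is what produces the row of NAs.

```r
# Hypothetical, simplified stand-in for the question's df:
# only the stationid and depth columns are kept.
df_small <- data.frame(stationid = c(25679L, 25681L, NA),
                       depth     = c(19L, 33L, 33L))

# Comparing NA with anything gives NA, so the index is TRUE / FALSE / NA.
mask <- df_small$stationid == 25679
print(mask)
# [1]  TRUE FALSE    NA

# Using that vector as a row index: TRUE keeps row 1, FALSE drops row 2,
# and NA asks for an "unknown" row, which R returns filled with NAs
# (its row name becomes "NA").
res <- df_small[mask, ]
print(res)
```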

score 2 · Accepted Answer

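Any comparison with NA gives NA as its result (see http://cran.r-project.org/doc/manuals/R-intro.html#Missing-values), and indexing rows with a logical vector that contains NA returns a row of NAs for each NA entry. You can use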

df[df$stationid==25679 & !is.na(df$stationid),]

or (as suggested in the comments above)

df[which(df$stationid==25679),]

or

subset(df,stationid==25679)

(subset sometimes has the unwanted side effect of dropping rows where the condition is NA, but in this case that is exactly what you want)
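A sketch comparing the three suggestions side by side, again on a hypothetical simplified `df_small` rather than the full data. Each one ends up keeping only the matching row, but for different reasons: the explicit `!is.na()` turns the NA into FALSE, `which()` returns only the positions that are TRUE, and `subset()` treats an NA condition as FALSE.

```r
# Hypothetical, simplified stand-in for the question's df.
df_small <- data.frame(stationid = c(25679L, 25681L, NA),
                       depth     = c(19L, 33L, 33L))

# 1. Combine the comparison with an explicit NA check.
res_and <- df_small[df_small$stationid == 25679 & !is.na(df_small$stationid), ]

# 2. which() keeps only the indices that are TRUE; NA entries are dropped.
res_which <- df_small[which(df_small$stationid == 25679), ]

# 3. subset() evaluates the condition inside the data frame and
#    treats NA results as FALSE.
res_subset <- subset(df_small, stationid == 25679)

# Each result contains just the first row, with no NA row.
print(res_and)
print(res_which)
print(res_subset)
```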

score 1 · Accepted Answer


Another solution is df[df$stationid==25679 & !is.na(df$stationid),]. It is longer, but more explicit.
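Why the explicit version works comes down to R's three-valued logic for `&`: `NA & FALSE` is FALSE, while `NA & TRUE` stays NA. A small sketch, using the question's `stationid` values as a plain vector `x` (a name introduced here for illustration):

```r
# The stationid column from the question, as a plain vector.
x <- c(25679L, 25681L, NA)

eq     <- x == 25679   # TRUE FALSE NA
not_na <- !is.na(x)    # TRUE TRUE FALSE

# NA & FALSE is FALSE, so the NA check cancels the NA from the comparison.
mask <- eq & not_na
print(mask)
# [1]  TRUE FALSE FALSE

# TRUE & NA would stay NA; the NA check has to be FALSE for missing entries.
print(TRUE & NA)
# [1] NA
```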

answered 2012-08-24T15:24:06.967
