r - How to remove rows based on a sentinel value in multiple columns?

Question

Given data like this:

C1<-c(3,-999.000,4,4,5)
C2<-c(3,7,3,4,5)
C3<-c(5,4,3,6,-999.000)
DF<-data.frame(ID=c("A","B","C","D","E"),C1=C1,C2=C2,C3=C3)

How can I remove the rows that contain -999.000 in any column?

I know this works when I name each column:

DF2<-DF[!(DF$C1==-999.000 | DF$C2==-999.000 | DF$C3==-999.000),]

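But I would like to avoid referencing each column. I thought there would be a simple way to refer to all columns of a given data frame, i.e.: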

DF3<-DF[!(DF[,]==-999.000),]

or

DF3<-DF[!(DF[,(2:4)]==-999.000),]

but obviously these do not work.

Out of curiosity, bonus points if you can tell me why I need the last comma before the closing square bracket, as in:

==-999.000),]

score 6 · Accepted Answer

The following may work:

DF[!apply(DF==-999,1,sum),]

Or, if a single row can contain more than one -999:

DF[!(apply(DF==-999,1,sum)>0),]

or

DF[!apply(DF==-999,1,any),]
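As a rough check of the three variants above, here is a minimal sketch. The extra row "F" with two -999 values is hypothetical and not part of the question's data, and the comparison is restricted to the numeric columns on the assumption that ID never holds the sentinel. In this sketch all three variants keep the same rows, because `!` applied to a count turns 0 into TRUE and any nonzero count into FALSE.

```r
# Question's data plus a hypothetical row "F" that holds two -999 values.
DF <- data.frame(
  ID = c("A", "B", "C", "D", "E", "F"),
  C1 = c(3, -999, 4, 4, 5, -999),
  C2 = c(3, 7, 3, 4, 5, -999),
  C3 = c(5, 4, 3, 6, -999, 1)
)

# Logical matrix: TRUE wherever a numeric cell equals the sentinel.
# Assumption: ID never holds -999, so it is left out of the comparison.
flags <- DF[, c("C1", "C2", "C3")] == -999

by_sum   <- DF[!apply(flags, 1, sum), ]        # !0 is TRUE, !nonzero is FALSE
by_count <- DF[!(apply(flags, 1, sum) > 0), ]  # explicit "count > 0" test
by_any   <- DF[!apply(flags, 1, any), ]        # TRUE if any cell in the row matches

print(by_any)
```

The `any` form states the intent most directly; the `sum` forms are useful if you also want to know how many sentinel values each row holds.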

score 5 · Accepted Answer

To address your "bonus" question, if we go to the documentation for ?Extract.data.frame we will find:

Data frames can be indexed in several modes. When [ and [[ are used with a single index (x[i] or x[[i]]), they index the data frame as if it were a list. In this usage a drop argument is ignored, with a warning.

and also:

When [ and [[ are used with two indices (x[i, j] and x[[i, j]]) they act like indexing a matrix: [[ can only be used to select one element. Note that for each selected column, xj say, typically (if it is not matrix-like), the resulting column will be xj[i], and hence rely on the corresponding [ method, see the examples section.

So you need the comma to ensure that R knows you are referring to a row, not a column.
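To make the two indexing modes from the documentation concrete, here is a small sketch on a cut-down, three-row version of the data (the object name `keep` is only illustrative). The same logical vector selects columns when used as a single index and rows when followed by a comma.

```r
# A three-row, three-column data frame, so one logical vector fits both slots.
DF <- data.frame(
  ID = c("A", "B", "C"),
  C1 = c(3, -999, 4),
  C2 = c(3, 7, 3)
)

keep <- c(TRUE, FALSE, TRUE)

# Single index: list-style, so this selects columns 1 and 3 (ID and C2).
cols_selected <- DF[keep]

# Two indices with an empty column slot: matrix-style, so this selects rows 1 and 3.
rows_selected <- DF[keep, ]

print(names(cols_selected))
print(rows_selected)
```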

score 5 · Accepted Answer

Based on your code, I assume you want to remove every row that contains -999.

DF2 <- DF[rowSums(DF == -999) == 0, ]

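As for your bonus question: a data frame is a list of vectors, all of the same length. If we think of the vectors as columns, a data frame can be viewed as a matrix whose columns may have different types (numeric, character, etc.). R lets you refer to the elements of a data frame the same way you refer to the elements of a matrix: by row and column indices. So DF[i, j] refers to the i-th element of the j-th vector of DF, which can be thought of as the i-th row of the j-th column. Therefore, if you want to keep only some rows of the data frame and all of its columns, you can use the matrix-like notation DF[row.indices, ].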
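A short sketch of the shapes involved, using the question's data (the intermediate names `cell_flags` and `keep` are only illustrative). The comparison yields one logical per cell, while the row slot of `DF[rows, ]` expects one logical per row. This seems to be why passing the comparison straight into the row slot, as in the question, does not give the intended result, and why `rowSums` is used to collapse it first.

```r
DF <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  C1 = c(3, -999, 4, 4, 5),
  C2 = c(3, 7, 3, 4, 5),
  C3 = c(5, 4, 3, 6, -999)
)

# One logical per cell: a 5 x 3 matrix, not a vector of 5 values.
cell_flags <- DF[, 2:4] == -999
print(dim(cell_flags))

# Collapse to one value per row: the number of -999 cells in that row.
keep <- rowSums(cell_flags) == 0
print(length(keep) == nrow(DF))

# Now the row slot gets exactly one logical per row.
DF2 <- DF[keep, ]
print(DF2)
```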

score 2 · Accepted Answer

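I am not sure whether your goal is to remove every row that contains at least one NA (with -999 standing in for NA). If that is what you are after, this is one possible answer: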

DF[DF==-999] <- NA
na.omit(DF)
   ID C1 C2 C3
1  A  3  3  5
3  C  4  3  3
4  D  4  4  6
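Note that the assignment above overwrites DF itself. If the original values should be kept, a possible variant is to recode a copy and filter with `complete.cases()`; this is a sketch, and the names `DF_na` and `DF_clean` are only illustrative.

```r
DF <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  C1 = c(3, -999, 4, 4, 5),
  C2 = c(3, 7, 3, 4, 5),
  C3 = c(5, 4, 3, 6, -999)
)

# Work on a copy so the original data frame keeps its -999 values.
DF_na <- DF
DF_na[DF_na == -999] <- NA

# complete.cases() returns TRUE for rows without any NA.
DF_clean <- DF_na[complete.cases(DF_na), ]
print(DF_clean)
```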
