
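I'm trying to get a data frame (just.samples.with.shoulder.values, say) that contains only the samples with non-NA values in the Shoulders column. I tried to do this with the complete.cases function, but I suspect I'm getting the syntax wrong below: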

data <- structure(list(Sample = 1:14, Head = c(1L, 0L, NA, 1L, 1L, 1L, 
0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L), Shoulders = c(13L, 14L, NA, 
18L, 10L, 24L, 53L, NA, 86L, 9L, 65L, 87L, 54L, 36L), Knees = c(1L, 
1L, NA, 1L, 1L, 2L, 3L, 2L, 1L, NA, 2L, 3L, 4L, 3L), Toes = c(324L, 
5L, NA, NA, 5L, 67L, 785L, 42562L, 554L, 456L, 7L, NA, 54L, NA
)), .Names = c("Sample", "Head", "Shoulders", "Knees", "Toes"
), class = "data.frame", row.names = c(NA, -14L))

just.samples.with.shoulder.values <- data[complete.cases(data[,"Shoulders"])]
print(just.samples.with.shoulder.values)

I'd also be interested to know whether another route (using subset(), say) would be a wiser idea. Thanks so much for your help!


3 Answers


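You can also use complete.cases, which returns a logical vector that lets you subset the rows of the data by Shoulders. Note the comma after the index, which makes it select rows rather than columns: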

data[complete.cases(data$Shoulders), ] 
#    Sample Head Shoulders Knees Toes
#  1      1    1        13     1  324
#  2      2    0        14     1    5
#  4      4    1        18     1   NA
#  5      5    1        10     1    5
#  6      6    1        24     2   67
#  7      7    0        53     3  785
#  9      9    1        86     1  554
# 10     10    1         9    NA  456
# 11     11    1        65     2    7
# 12     12    1        87     3   NA
# 13     13    0        54     4   54
# 14     14    1        36     3   NA
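
As a minimal sketch of why the comma matters, and of the subset() route the question asks about: the small data frame below is a made-up stand-in for the question's data, not the original values. With the comma, the logical vector picks rows; without it, a data frame treats a single index as a column selection, which is why the attempt in the question does not do what was intended.

```r
# Hypothetical stand-in for the question's data (values are illustrative only)
data <- data.frame(Sample = 1:5,
                   Shoulders = c(13L, NA, 18L, NA, 10L),
                   Toes = c(324L, 5L, NA, 7L, 5L))

# One TRUE/FALSE per row: TRUE where Shoulders is not NA
keep <- complete.cases(data$Shoulders)

# The comma makes `keep` a row index; all columns are retained
by_index <- data[keep, ]

# Same filter expressed with subset(); column names are evaluated inside data
by_subset <- subset(data, !is.na(Shoulders))

print(by_index)
print(by_subset)
```

Both forms keep rows whose Toes value is NA, since only Shoulders is tested. subset() reads well interactively; the bracket form is generally the safer choice inside functions.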
answered 2014-03-02T00:12:55.907

You can try using is.na:

data[!is.na(data["Shoulders"]),]
   Sample Head Shoulders Knees Toes
1       1    1        13     1  324
2       2    0        14     1    5
4       4    1        18     1   NA
5       5    1        10     1    5
6       6    1        24     2   67
7       7    0        53     3  785
9       9    1        86     1  554
10     10    1         9    NA  456
11     11    1        65     2    7
12     12    1        87     3   NA
13     13    0        54     4   54
14     14    1        36     3   NA
answered 2012-09-12T17:48:43.647

There is a subtle difference between using is.na and complete.cases. is.na removes actual NA values, whereas the objective here is only to control for a single variable, not to deal with missing values (NAs) in general, since those could be legitimate data points.
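
One way to make the distinction concrete, as a sketch on a made-up data frame (not the question's data): applied to a single column, complete.cases and !is.na give the same logical vector, so the difference only shows up when complete.cases is given the whole data frame, where an NA in any column drops the row.

```r
# Hypothetical stand-in data (values are illustrative only)
data <- data.frame(Sample = 1:5,
                   Shoulders = c(13L, NA, 18L, NA, 10L),
                   Toes = c(324L, 5L, NA, 7L, 5L))

one_column  <- complete.cases(data$Shoulders)  # tests Shoulders only
all_columns <- complete.cases(data)            # tests every column

# On a single vector the two functions agree
print(identical(one_column, !is.na(data$Shoulders)))

# Filtering on Shoulders keeps the row whose Toes value is NA ...
print(data[one_column, ])

# ... while filtering on all columns drops it as well
print(data[all_columns, ])
```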

answered 2018-12-13T21:04:53.993